If you are thinking about using HBase you will likely want to understand HBase backup options. I know we did, so let us share what we found. Please let us know what we missed and what you use for HBase backup!
Export
You could export your tables using the Export (org.apache.hadoop.hbase.mapreduce.Export) MapReduce job that will export the table data into a Sequence File on HDFS. This was implemented in HBASE-1684 if you want to check out the patch or comments there. This tool works on one table at a time, so if you need to backup multiple tables, run this on each table. The exported data can then be imported back into HBase by the Import tool.
Copy Table
If you have another HBase cluster that you want to treat as a backup cluster, you can use the handy CopyTable tool to copy a table at a time.
Distcp
You could use Hadoop’s distcp command to copy the whole /hbase directory from one HDFS cluster to the other. However, this can leave your data in an inconsistent state, so it should be avoided. See http://search-hadoop.com/m/wkMgSjVLDb
At this point we should point out that all of the above backup methods are per-table. Moreover, they don’t work or create a snapshot of the table. Export and CopyTable are atomic only at the row level. Furthermore, if you have multiple tables whose tables depend on each other, if they are being modified while you are exporting or copying them, you will end up with inconsistent data – the data in those tables will not be in sync. See http://search-hadoop.com/m/Q4bU81G116p.
Backup from Mozilla
Because of the above mentioned issues with distcp when running it over a cluster whose data is being modified while distcp is running, developers at Mozilla came up with their own Backup tool. They’ve described the tool and its use in the popular Migrating HBase in the Trenches post.
Cluster Replication
HBase has a relatively new and not yet widely used whole cluster replication mechanism. The backup cluster does not have to be identical to the master cluster, which means that the backup cluster could be much less powerful and thus cheaper, while still having enough storage to serve as backup.
Table Snapshot
Ah, the infamous HBASE-50! This issue saw some great work during GSoC 2010, but it looks like it was never integrated into HBase. It is unclear whether the contributor simply ran out of steam or time or whether it became apparent that table snapshots are too difficult to implement or simply not doable because of highly distributed nature of HBase. The JIRA issue does contain patches you can look at, and the author has a now inactive hbase-snapshot repository up on Github.
HDFS Replication
You could also simply crank up the replication factor to the level that makes you feel safe and call that a backup. This may not guard against data corruption, but it does guard against certain partial hardware failures.
Since so many people seem to be asking about HBase backup options, I hope this serves as a good point-in-time snapshot, a summary of all HBase backup options that are currently on the table. With time, this will be added to the HBase Book.
Are there other HBase backup options we should have included?
What you use for HBase backup?
原文:
http://blog.sematext.com/2011/03/11/hbase-backup-options/
另有一篇博文也可以参考一下:
http://blog.csdn.net/jingling_zy/article/details/7554676
分享到:
相关推荐
在IT行业中,尤其是在大数据处理领域,HBase是一个广泛使用的分布式、高性能、列式存储的NoSQL数据库。HBase是建立在Hadoop文件系统(HDFS)之上,为处理大规模数据提供了一个高效的数据存储解决方案。而Spring Data...
搭建pinpoint需要的hbase初始化脚本hbase-create.hbase
14.7. HBase Backup 14.8. Capacity Planning 15. 创建和开发 HBase 15.1. HBase 仓库 15.2. IDEs 15.3. 创建 HBase 12-5-30 HBase 官方文档 3/81 abloz.com/hbase/book.htm 15.4. Publishing a new version of ...
Options: deploy.sh build single 构建并启动一个hbase单实例 deploy.sh start single 启动hbase实例 deploy.sh stop single 停止hbase实例 deploy.sh check single 检测hbase实例状态 deploy.sh connect ...
HBase是一种分布式、基于列族的NoSQL数据库,由Apache软件基金会开发并维护,是Hadoop生态系统中的重要组件。这份“HBase官方文档中文版”提供了全面深入的HBase知识,帮助用户理解和掌握如何在大数据场景下有效地...
### HBase权威指南知识点概述 #### 一、引言与背景 - **大数据时代的来临**:随着互联网技术的发展,人类社会产生了前所未为的数据量。这些数据不仅数量巨大,而且种类繁多,传统的数据库系统难以应对这样的挑战。 ...
### HBase 配置内置 ZooKeeper 的详细步骤与解析 #### 一、配置背景与目的 在 HBase 的部署环境中,ZooKeeper 起着非常重要的作用,它主要用于协调集群中的各个节点,并且管理 HBase 的元数据。通常情况下,HBase ...
HBase是一种分布式、基于列族的NoSQL数据库,它在大数据领域中扮演着重要的角色,尤其是在需要实时查询大规模数据集时。HBase以其高吞吐量、低延迟和水平扩展能力而闻名,常用于存储非结构化和半结构化数据。在HBase...
### HBase开启审计日志详解 #### 一、概述 HBase是一款分布式列式存储系统,基于Google的Bigtable论文实现。它具有高可靠性、高性能、面向列、可伸缩的特点,非常适合处理海量数据。在大数据领域,HBase被广泛用于...
HBase(hbase-2.4.9-bin.tar.gz)是一个分布式的、面向列的开源数据库,该技术来源于 Fay Chang 所撰写的Google论文“Bigtable:一个结构化数据的分布式存储系统”。就像Bigtable利用了Google文件系统(File System...
"基于SpringBoot集成HBase过程解析" SpringBoot集成HBase是当前大数据处理和存储解决方案中的一种常见组合。HBase是基于Hadoop的分布式、可扩展的NoSQL数据库,能够存储大量的结构化和非结构化数据。SpringBoot则...
### HBase学习利器:HBase实战 #### 一、HBase简介与背景 HBase是Apache Hadoop生态系统中的一个分布式、可扩展的列族数据库,它提供了类似Bigtable的能力,能够在大规模数据集上进行随机读写操作。HBase是基于...
在Windows上安装HBase 本文将指导您如何在Windows平台上安装HBase,包括配置详解。安装完成后,您将能够配置集群。 一、前提条件 在安装HBase前,需要安装Cygwin和Hadoop。这两个软件的安装不在本文的讨论范围内...
### HBase 安装与使用知识点详解 #### 概述 HBase 是一款构建于 Hadoop 之上的分布式、可扩展的大规模数据存储系统。它提供了类似 Google BigTable 的功能特性,非常适合处理海量数据和高并发读写需求的应用场景。...
HBase是Apache Hadoop生态系统中的一个分布式、版本化、列族式存储系统,设计用于处理大规模数据集。这个“hbase-2.4.17-bin”安装包提供了HBase的最新稳定版本2.4.17,适用于大数据处理和分析场景。下面将详细介绍...
HBase,全称为Hadoop Distributed File System上的基础结构(HBase on Hadoop Distributed File System),是一种分布式的、面向列的开源数据库,它构建在Apache Hadoop文件系统(HDFS)之上,提供高可靠性、高性能...
《HBase资源合集》包含了四本重量级的书籍,分别是《HBase企业应用开发实战》、《HBase权威指南》、《HBase实战》以及《HBase应用架构》。这些书籍深入浅出地探讨了HBase在大数据环境中的应用与开发,是学习和掌握...
在本文中,我们将深入探讨HBase的安装过程及其在CDH环境中的集成。HBase是Apache Hadoop生态系统中的一个核心组件,它是一个分布式、版本化的、支持列族的NoSQL数据库,特别适合处理大规模的数据存储。CDH(Cloudera...
class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://nameservice1/hbase/data/default/RASTER/92ceb2d86662ad6d959f4cc384229e0f/i, class org.apache.hadoop.hbase.backup....