HBase Backup Options

chenchao051


If you are thinking about using HBase you will likely want to understand HBase backup options. I know we did, so let us share what we found. Please let us know what we missed and what you use for HBase backup!

Export

You can export a table with the Export MapReduce job (org.apache.hadoop.hbase.mapreduce.Export), which writes the table data to a SequenceFile on HDFS. It was implemented in HBASE-1684, if you want to check out the patch or the comments there. The tool works on one table at a time, so if you need to back up multiple tables, run it once per table. The exported data can then be loaded back into HBase with the Import tool.
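Since Export handles one table per run, backing up several tables means one invocation per table. Here is a minimal sketch in Python that only builds and prints the command lines. The table names and the backup directory are hypothetical, and the argument order (table name, then output directory) is our understanding of the tools' usage, so check it against the usage message printed by your HBase version.

```python
import shlex

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Import"


def export_command(table, backup_root):
    """Build the argv that exports one table to <backup_root>/<table> on HDFS."""
    return ["hbase", EXPORT_CLASS, table, f"{backup_root}/{table}"]


def import_command(table, backup_root):
    """Build the argv that imports <backup_root>/<table> back into the table."""
    return ["hbase", IMPORT_CLASS, table, f"{backup_root}/{table}"]


# Hypothetical table names and backup location, for illustration only.
tables = ["users", "events"]
backup_root = "/backups/2011-03-11"

for table in tables:
    # One Export run per table; the commands are printed, not executed.
    print(shlex.join(export_command(table, backup_root)))
```

Keep in mind that Import writes into an existing table, so the target table has to be created before the data is restored.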

Copy Table

If you have another HBase cluster that you want to treat as a backup cluster, you can use the handy CopyTable tool to copy one table at a time to it.
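As a rough sketch of what a CopyTable run against a backup cluster looks like, the Python snippet below assembles the command line. It assumes the `--peer.adr` option takes the backup cluster's ZooKeeper quorum, client port, and parent znode joined by colons, and that `--new.name` renames the table on the destination. The host names and table name are hypothetical, so verify the options against the usage message that CopyTable prints on your version.

```python
import shlex

COPYTABLE_CLASS = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def copytable_command(table, peer_quorum, peer_port=2181,
                      peer_znode="/hbase", new_name=None):
    """Build the argv that copies one table to the cluster behind peer_quorum."""
    args = [
        "hbase",
        COPYTABLE_CLASS,
        # Assumed format: <zk quorum>:<zk client port>:<zk parent znode>
        f"--peer.adr={peer_quorum}:{peer_port}:{peer_znode}",
    ]
    if new_name is not None:
        args.append(f"--new.name={new_name}")
    args.append(table)  # the source table name goes last
    return args


# Hypothetical ZooKeeper hosts of the backup cluster and a hypothetical table.
cmd = copytable_command("users", "zk1,zk2,zk3")
print(shlex.join(cmd))
```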

Distcp

You could use Hadoop’s distcp command to copy the whole /hbase directory from one HDFS cluster to the other. However, this can leave your data in an inconsistent state, so it should be avoided. See http://search-hadoop.com/m/wkMgSjVLDb

At this point we should point out that Export and CopyTable work on one table at a time. Moreover, none of the methods above works on or creates a snapshot of the table. Export and CopyTable are atomic only at the row level. Furthermore, if you have multiple tables whose data depend on each other and they are being modified while you are exporting or copying them, you will end up with inconsistent data: the data in those tables will not be in sync. See http://search-hadoop.com/m/Q4bU81G116p.

Backup from Mozilla

Because of the above-mentioned issues with running distcp over a cluster whose data is being modified during the copy, developers at Mozilla came up with their own Backup tool. They have described the tool and its use in the popular "Migrating HBase in the Trenches" post.

Cluster Replication

HBase has a relatively new and not yet widely used whole-cluster replication mechanism. The backup cluster does not have to be identical to the master cluster, which means that the backup cluster can be much less powerful, and thus cheaper, while still having enough storage to serve as a backup.

Table Snapshot

Ah, the infamous HBASE-50! This issue saw some great work during GSoC 2010, but it looks like it was never integrated into HBase. It is unclear whether the contributor simply ran out of steam or time, or whether it became apparent that table snapshots are too difficult to implement, or simply not doable, because of the highly distributed nature of HBase. The JIRA issue does contain patches you can look at, and the author has a now inactive hbase-snapshot repository up on GitHub.

HDFS Replication

You could also simply crank up the replication factor to the level that makes you feel safe and call that a backup. This may not guard against data corruption, but it does guard against certain partial hardware failures.
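For illustration, raising the replication factor of files that already exist under the HBase root directory can be done with the `hadoop fs -setrep` shell command. The Python sketch below only builds and prints that command. The `/hbase` path and the factor of 5 are assumptions chosen for the example, and files written later still get whatever `dfs.replication` value the writing client is configured with.

```python
import shlex


def setrep_command(path, replication, recursive=True):
    """Build the argv that sets the HDFS replication factor for a path."""
    args = ["hadoop", "fs", "-setrep"]
    if recursive:
        args.append("-R")  # apply to everything under the directory
    args += [str(replication), path]
    return args


# Assumed HBase root directory and an example replication factor.
print(shlex.join(setrep_command("/hbase", 5)))
```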

Since so many people seem to be asking about HBase backup options, I hope this serves as a good point-in-time snapshot, a summary of all HBase backup options that are currently on the table. With time, this will be added to the HBase Book.

Are there other HBase backup options we should have included?

What do you use for HBase backup?

Original article: http://blog.sematext.com/2011/03/11/hbase-backup-options/
Another blog post that is also worth a look: http://blog.csdn.net/jingling_zy/article/details/7554676

2012-08-23 15:24