| serial no | solution | level | precondition | run on | flow | advantages | shortcomings | use cases |
|---|---|---|---|---|---|---|---|---|
| 1 | direct client API | log | - | | transfer data via both clusters | | | |
| 2 | export/import | log | | src, then target | run MR to generate HDFS sequence files; transfer the files; import with MR | supports time-range filter | | |
| 3 | copy table | stream | | src | case 1: copy data (memstore + HFiles) directly to the other cluster (only if cluster-to-cluster access is enabled); case 2: (if cluster-to-cluster access is NOT enabled) same as export, but the last step uses hdfs put to upload the files | | | |
| 4 | replication | WAL | | | sync WAL with the new cluster | | | |
| 5 | bulkload | | | | | | | |
| 6 | snapshot | file | flush before snapshotting if online | src, then target | create snapshot; clone to new table; restore from new table [cluster-internal] | | | |
| 7 | distcp | file | | src | flush memstore; distcp files between the two clusters | | cannot copy data for a specified date range, but can serve as the final step to transfer files generated by other solutions; stop HBase before distcp | |
Now I want to back up last month's data from a table to another cluster, but the two clusters cannot connect to each other (so no cross-cluster MR). I came up with the following steps:
1. Extract the subset of the table data (last month: 2014-06-01 to 2014-06-30)
hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=1000 -Dmapred.map.tasks.speculative.execution=false --starttime=1401552000000 --endtime=1404057600000 --new.name=new-tableX tableX
Then you MUST flush this table, because some data still lives in the memstores and the next step operates directly at the file level:
echo "flush 'new-tableX' "|hbase shell
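The --starttime and --endtime values are epoch milliseconds. A minimal sketch of how they can be derived is below, assuming GNU date and assuming the dates are meant in UTC+8 (the values in the command above match 2014-06-01 00:00 and 2014-06-30 00:00 in that zone). The helper name to_millis is made up for illustration. Because the upper bound of a scan time range is exclusive, an end time of 2014-06-30 00:00 leaves out data written on June 30; the sketch uses 2014-07-01 00:00 to cover the whole month.

```shell
#!/usr/bin/env bash
# Convert a wall-clock time to epoch milliseconds (requires GNU date).
# ASSUMPTION: dates are interpreted in UTC+8; the POSIX TZ string "CST-8"
# means "8 hours east of UTC".
to_millis() {
  local seconds
  seconds=$(TZ="CST-8" date -d "$1" +%s) || return 1
  echo "${seconds}000"
}

START=$(to_millis "2014-06-01 00:00:00")   # inclusive lower bound
END=$(to_millis "2014-07-01 00:00:00")     # exclusive upper bound, so June 30 is included

echo "--starttime=${START} --endtime=${END}"
```

Adjust the TZ string if the cell timestamps in your table were written against a different time zone.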
2. Download the table's HFiles from HDFS
hadoop fs -get /hbase/new-tableX new-tableX
(of course you can run this command on multiple nodes in parallel by splitting the directories into subtasks)
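One way to split the download into subtasks is to deal the region directories out round-robin, one bucket per node. The sketch below only does the partitioning; partition_dirs is a hypothetical helper, not an HBase or Hadoop tool.

```shell
#!/usr/bin/env bash
# Split a list of directories into N round-robin buckets, one line per worker.
# Usage: partition_dirs N dir1 dir2 ...
partition_dirs() {
  local workers=$1; shift
  local -a buckets=()
  local i=0 dir b
  for dir in "$@"; do
    buckets[i % workers]+="${dir} "
    i=$((i + 1))
  done
  for b in "${buckets[@]}"; do
    echo "${b% }"          # strip the trailing space
  done
}

# Hypothetical usage (commented out so the sketch runs without Hadoop):
#   hadoop fs -ls /hbase/new-tableX | awk '/^d/ {print $NF}' > region_dirs.txt
#   partition_dirs 3 $(cat region_dirs.txt)
# Each output line is the set of directories one node should fetch, e.g.
#   hadoop fs -get <dirs on that line> new-tableX/
```

Plain files directly under the table directory (such as the table descriptor) are not directories and would need to be copied separately.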
3. Transfer these files to the other cluster in parallel
a. scp parts of the files to local nodes A, B, C...
b. from each of those nodes, scp its part to a peer node in the other cluster
(this spreads the load so that neither side is limited by the network bandwidth of a single node)
4. Now import the data into HDFS
hadoop fs -put part-files /hbase
(create the directory first if it does not exist)
5. Load these HFiles into META and assign the regions
hbase hbck -fixMeta
then
hbase hbck -fixAssignments
(run the second command once more to judge whether the table is readable or not)
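The "run it once more" advice can be wrapped in a small retry helper. This is a sketch under the assumption that hbck exits non-zero while inconsistencies remain, which should be verified for your HBase version; retry is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Run a command up to MAX times, stopping at the first success.
# Usage: retry MAX command [args...]
retry() {
  local max=$1; shift
  local attempt
  for ((attempt = 1; attempt <= max; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max} failed: $*" >&2
  done
  return 1
}

# Hypothetical usage on the target cluster (commented out so the sketch
# itself does not need an HBase installation):
#   hbase hbck -fixMeta
#   retry 3 hbase hbck -fixAssignments
#   echo "count 'new-tableX', INTERVAL => 100000" | hbase shell
```

The final count is just a cheap readability check; any scan of the table would do.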
6. Rename the new table to the original table name [optional]
hbase shell> disable 'tableName'
hbase shell> snapshot 'tableName', 'tableSnapshot'
hbase shell> clone_snapshot 'tableSnapshot', 'newTableName'
hbase shell> delete_snapshot 'tableSnapshot'
hbase shell> drop 'tableName'
The snapshot utility is supported in version 0.94.6 and later; if you run an older version, you can also patch it in.
Some optimizations for step 1:
- limit the number of attempts per map task
-Dmapred.map.max.attempts=2
- set the tolerated map failure ratio (given as an integer percentage)
-Dmapred.max.map.failures.percent=5
- disable WAL (HLog) writes (this may require modifying Import.Importer in Import.java)
- decrease the block replication factor
-Ddfs.replication=2 or -Ddfs.replication=1
- increase the client write buffer
-Dhbase.client.write.buffer=10485760
- presplit the new table when creating it in step 1
{NUMREGIONS => [1], SPLITALGO => 'HexStringSplit'}
[1] hbase - how many regions fit a table when presplitting or while it keeps running
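Putting the flags together, the step 1 command might be assembled as in the sketch below. It only prints the command (a dry run). The function name build_copytable_cmd is made up for illustration, each flag is written with a single -D prefix, and the failure percentage is written as the integer 5 on the assumption that this property is parsed as a whole-number percentage.

```shell
#!/usr/bin/env bash
# Build the CopyTable command line from step 1 plus the tuning flags.
# Dry run only: the command is printed, not executed.
# Usage: build_copytable_cmd SRC_TABLE NEW_TABLE START_MILLIS END_MILLIS
build_copytable_cmd() {
  local src=$1 dst=$2 start=$3 end=$4
  local -a opts=(
    -Dhbase.client.scanner.caching=1000
    -Dmapred.map.tasks.speculative.execution=false
    -Dmapred.map.max.attempts=2
    -Dmapred.max.map.failures.percent=5   # ASSUMPTION: integer percentage
    -Ddfs.replication=2
    -Dhbase.client.write.buffer=10485760
  )
  echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable ${opts[*]}" \
       "--starttime=${start} --endtime=${end} --new.name=${dst} ${src}"
}

build_copytable_cmd tableX new-tableX 1401552000000 1404057600000
```

Printing first makes it easy to review the flags before running the job for real.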
ref:
CDH:introduction-to-apache-hbase-snapshots
jira: snapshot of table (design documents attached)
Copying part of an HBase table for testing (some tools invoke Java classes from the shell)