HBase MapReduce Examples

genius_bai


Blog category: Distributed data storage

Tags: HBase, MapReduce, Hadoop, JRuby, Lucene

HBase MapReduce example

http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description

http://wiki.apache.org/hadoop/Hbase/MapReduce (Deprecated)

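Approach that requires restarting Hadoop

The configuration must be changed on every machine.

1: Edit $HADOOP_HOME/conf/hadoop-env.sh and add the HBase libraries to the classpath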

export HBASE_HOME=/home/iic/hbase-0.20.3

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME/hbase-0.20.3-test.jar:$HBASE_HOME/conf:${HBASE_HOME}/lib/zookeeper-3.3.0.jar

Approach that does not require restarting Hadoop (package the dependency libraries into the lib/ directory of the job jar, and call job.setJarByClass(XXX.class); in the code)

Another possibility, if for example you do not have access to hadoop-env.sh or are unable to restart the Hadoop cluster, is to bundle the HBase jars into the MapReduce job jar: add the HBase jar and its dependencies under the job jar's lib/ directory, and put the HBase conf files in the job jar's top-level directory.

Test (this run failed with the exception java.lang.OutOfMemoryError: Java heap space):

bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 4

HBase MapReduce 2

This example takes the value of the column contents in the table mrtest, reverses it, and saves the result in the column text.

bin/hbase shell

create 'mrtest', 'contents', 'text'

put 'mrtest', '1', 'contents:', 'content'

put 'mrtest', '1', 'text:', 'text'

get 'mrtest', '1'

The class com.test.hadoop.hbase.HBaseTest generates 1,000,000 test records.

/home/iic/hadoop-0.20.2/bin/hadoop jar examples/examples_1.jar examples.TestTableMapReduce
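The reversal step of this example can be illustrated without a cluster. The following is a minimal sketch in plain Java, assuming the value is reversed byte by byte; the class and method names are hypothetical and are not taken from examples.TestTableMapReduce.

```java
import java.nio.charset.StandardCharsets;

class ReverseValueSketch {
    // Returns a new array holding the bytes of `value` in reverse order.
    static byte[] reverse(byte[] value) {
        byte[] reversed = new byte[value.length];
        for (int i = 0; i < value.length; i++) {
            reversed[i] = value[value.length - 1 - i];
        }
        return reversed;
    }

    // Convenience wrapper for ASCII text. Byte-wise reversal would
    // corrupt multi-byte UTF-8 characters, so this is for illustration only.
    static String reverseText(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(reverse(bytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value stored by: put 'mrtest', '1', 'contents:', 'content'
        System.out.println(reverseText("content")); // prints tnetnoc
    }
}
```

In a real job, a transformation like this would run inside the map step for each row, and the result would be written to the text column.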

Examples bundled with HBase

hbase-0.20.3\src\test

Count the total number of rows in a table (org.apache.hadoop.hbase.mapreduce.RowCounter)

bin/hadoop jar /home/iic/hbase-0.20.3/hbase-0.20.3.jar rowcounter scores grade

Result

10/04/12 17:08:05 INFO mapred.JobClient: ROWS=2

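Build a Lucene index over an HBase column (examples.TestTableIndex)

This indexes the column contents of the table mrtest. It uses lucene-core-2.2.0.jar, which must be on the classpath: add lucene-core-2.2.0.jar to the lib directory inside examples.zip, and make sure the code calls job.setJarByClass(TestTableIndex.class); otherwise the Lucene classes are not found.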

bin/hadoop fs -rmr testindex

bin/hadoop jar examples.zip examples.TestTableIndex

Generate HBase-ready HFiles from a file first, then load them into HBase, which speeds up the import

examples.TestHFileOutputFormat

The input data is generated automatically by the example. Each key is a ten-digit number left-padded with zeros, for example "0000000001".
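The zero padding matters because HBase orders row keys lexicographically by their bytes, so unpadded numbers would not sort numerically. The sketch below shows one way such keys could be produced; the class and method names are hypothetical, and the actual generator in examples.TestHFileOutputFormat may differ.

```java
class PaddedKeySketch {
    // Formats a non-negative number as a ten-digit, zero-padded key.
    static String paddedKey(long n) {
        return String.format("%010d", n);
    }

    public static void main(String[] args) {
        System.out.println(paddedKey(1)); // prints 0000000001

        // Unpadded, "10" sorts before "9" when compared as strings.
        System.out.println("10".compareTo("9") < 0); // prints true

        // Padded, the string order matches the numeric order.
        System.out.println(paddedKey(10).compareTo(paddedKey(9)) > 0); // prints true
    }
}
```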

Output data directory: /user/iic/hbase-hfile-test

bin/hadoop fs -rmr hbase-hfile-test

bin/hadoop jar examples.zip examples.TestHFileOutputFormat

Load the generated data into HBase (JRuby must be installed first)

export PATH=$PATH:/home/iic/jruby-1.4.0/bin/

echo $PATH

vi bin/loadtable.rb

require '/home/iic/hbase-0.20.3/hbase-0.20.3.jar'
require '/home/iic/hadoop-0.20.2/hadoop-0.20.2-core.jar'
require '/home/iic/hadoop-0.20.2/lib/log4j-1.2.15.jar'
require '/home/iic/hadoop-0.20.2/lib/commons-logging-1.0.4.jar'
require '/home/iic/hbase-0.20.3/lib/zookeeper-3.3.0.jar'
require '/home/iic/hbase-0.20.3/lib/commons-cli-2.0-SNAPSHOT.jar'

$CLASSPATH << '/home/iic/hbase-0.20.3/conf'

Delete the table "hbase-test" before loading.

jruby bin/loadtable.rb hbase-test /user/iic/hbase-hfile-test

To see its usage, run the script without arguments (it runs on JRuby):

bin/hbase org.jruby.Main bin/loadtable.rb

Usage: loadtable.rb TABLENAME HFILEOUTPUTFORMAT_OUTPUT_DIR

Note: with this approach, a few problems must be solved.

1: Your MapReduce job must ensure a total ordering among all keys. By default, MapReduce distributes keys among reducers using a Partitioner that hashes on the map task output key: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks

With the default hash Partitioner, if the keys are 0 to 4 and there are 2 reduce tasks, reducer 0 would get keys 0, 2 and 4 whereas reducer 1 would get keys 1 and 3 (in order).

The start keys and end keys of the generated blocks would then be out of order.

The non-negative hash of a key can be printed with:

System.out.println((new ImmutableBytesWritable("0".getBytes()).hashCode() & Integer.MAX_VALUE));

So you need to implement your own Partitioner. The keys need to be ordered so that reducer 0 gets keys 0-2 and reducer 1 gets keys 3-4 (see TotalOrderPartitioner in Hadoop for more on what this means).
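The difference between the two assignments can be reproduced in plain Java. This is a minimal sketch, not Hadoop code: Integer keys stand in for the real row keys (an Integer's hashCode is its own value, which is not true of ImmutableBytesWritable), and rangePartition is a hypothetical, simplified stand-in for what a total-order partitioner achieves.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    // Same formula as the default hash partitioning quoted above.
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical range partitioner for integer keys 0..numKeys-1:
    // each reducer receives one contiguous slice of the key space.
    static int rangePartition(int key, int numKeys, int numReduceTasks) {
        int sliceSize = (numKeys + numReduceTasks - 1) / numReduceTasks; // ceiling division
        return key / sliceSize;
    }

    // Lists, per reducer, the keys it would receive.
    static List<List<Integer>> assign(int numKeys, int numReduceTasks, boolean useRange) {
        List<List<Integer>> perReducer = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            perReducer.add(new ArrayList<>());
        }
        for (int key = 0; key < numKeys; key++) {
            int reducer = useRange
                    ? rangePartition(key, numKeys, numReduceTasks)
                    : hashPartition(Integer.valueOf(key), numReduceTasks);
            perReducer.get(reducer).add(key);
        }
        return perReducer;
    }

    public static void main(String[] args) {
        System.out.println("hash:  " + assign(5, 2, false)); // hash:  [[0, 2, 4], [1, 3]]
        System.out.println("range: " + assign(5, 2, true));  // range: [[0, 1, 2], [3, 4]]
    }
}
```

With range partitioning, every key seen by reducer 0 is smaller than every key seen by reducer 1, so the files written by the reducers cover non-overlapping, ordered key ranges.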

Verify the number of imported rows

bin/hadoop jar /home/iic/hbase-0.20.3/hbase-0.20.3.jar rowcounter hbase-test info

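HFile generation example 2:

This approach only suits the initial bulk import of a large data set, because bin/loadtable.rb replaces all the files every time.

For subsequent data operations, you can use a map over the text file that inserts directly into the HBase table.

Alternatively, make sure the newly added keys do not conflict with the existing ones, and add new blocks following the logic of bin/loadtable.rb.

Generate test_1kw.log with 10 million records, where each line looks like: 0,content0,1271222976817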

/home/bmb/jdk1.6.0_16/bin/java -cp examples.zip examples.CreateLogFile test_1kw.log 10000000
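The source of examples.CreateLogFile is not shown here, so the following is only a sketch of how a line in that format could be built and split again. It assumes the three fields are a numeric key, the string "content" followed by the key, and a timestamp in milliseconds; the class and method names are hypothetical.

```java
class LogLineSketch {
    // Builds one line in the assumed format: key,content<key>,timestampMillis
    static String formatLine(long key, long timestampMillis) {
        return key + ",content" + key + "," + timestampMillis;
    }

    // Splits a line back into its three comma-separated fields.
    static String[] parseLine(String line) {
        return line.split(",", 3);
    }

    public static void main(String[] args) {
        String line = formatLine(0, 1271222976817L);
        System.out.println(line); // prints 0,content0,1271222976817

        String[] fields = parseLine(line);
        System.out.println(fields[1]); // prints content0
    }
}
```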

bin/hadoop fs -put test_1kw.log hadoop-performance-test

Use only 1 reduce task, which avoids the total key ordering problem

bin/hadoop jar examples.zip examples.TestCreateHFileMR hadoop-performance-test hadoop-hbase-hfile-test 1

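Generating the HBase HFiles took only about 2 minutes 23 seconds, far faster than generating 10 million rows of HBase data in the performance test.

10/04/15 14:22:59--10/04/15 14:25:22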
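The elapsed time and a rough throughput figure follow from the two timestamps. The sketch below does that arithmetic, assuming all 10,000,000 generated lines were processed within that window; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;

class ThroughputSketch {
    // Whole seconds between two times on the same day.
    static long elapsedSeconds(LocalTime start, LocalTime end) {
        return Duration.between(start, end).getSeconds();
    }

    // Average rows per second, rounded down.
    static long rowsPerSecond(long rows, long seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        long seconds = elapsedSeconds(LocalTime.of(14, 22, 59), LocalTime.of(14, 25, 22));
        System.out.println(seconds);                             // prints 143
        System.out.println(rowsPerSecond(10_000_000L, seconds)); // prints 69930
    }
}
```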

Import into HBase

jruby bin/loadtable.rb hbase-test2 hadoop-hbase-hfile-test

Verify the number of imported rows

bin/hadoop jar /home/iic/hbase-0.20.3/hbase-0.20.3.jar rowcounter hbase-test2 info

Other HBase MapReduce examples
http://www.hadoop.org.cn/mapreduce/hbase-mapreduce/

http://www.spicylogic.com/allenday/blog/2008/08/28/hbase-bulk-load-import-example/


2010-04-12 15:40

Comment 1: egoegmdslls, 2012-07-18

Hi, where do the examples in this post come from, and how can I download them? Thanks!
