MapReduce: a single Mapper that reads one HBase table and writes another

小网客

Blog categories: DataMining, MapReduce

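Scenario:

When working with HBase from MapReduce, a reduce phase is sometimes unnecessary: the Mapper alone reads table A, processes each row, and writes the result to table B. In other words, A is the input table and B is the output table. The requirement here: table A has a column E:E, and every row whose 'E:E' value is numeric must be saved to table B.

Tables:

create 'A','E'

create 'B','E'

Option 1: open table B directly inside the mapper and, if the value is numeric, Put it straight into B. This is fairly simple, so its pros and cons are not discussed here.

Option 2: call TableMapReduceUtil.initTableMapperJob with table A as the input, set outputValueClass to Put, and set the output table through TableOutputFormat.OUTPUT_TABLE. No reduce is needed.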
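
Option 1 (opening table B directly in the mapper) has no code in this post. A minimal sketch of what it might look like follows, assuming the older HTable-based client API that the rest of the code here targets and the Commons Lang 2.x package for NumberUtils. The class name DirectWriteMapper is hypothetical.

```java
import java.io.IOException;

import org.apache.commons.lang.math.NumberUtils;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

// Hypothetical class name; assumes the older HTable-based client API.
public class DirectWriteMapper extends TableMapper<NullWritable, NullWritable> {

    private static final byte[] FAMILY = Bytes.toBytes("E");
    private static final byte[] QUALIFIER = Bytes.toBytes("E");

    private HTable outputTable;

    @Override
    protected void setup(Context context) throws IOException {
        // One handle to table B per map task, reused for every input row.
        outputTable = new HTable(context.getConfiguration(), "B");
    }

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException {
        String val = Bytes.toString(value.getValue(FAMILY, QUALIFIER));
        if (!NumberUtils.isDigits(val)) {
            return; // missing or non-numeric cell: nothing to copy
        }
        Put put = new Put(key.get());
        put.add(FAMILY, QUALIFIER, Bytes.toBytes(val));
        // Written straight to B; nothing goes through context.write().
        outputTable.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (outputTable != null) {
            outputTable.close(); // closing also flushes any buffered Puts
        }
    }
}
```

Because this mapper emits nothing through the framework, the job that runs it would presumably declare NullWritable for both output classes and use an output format that discards records, such as NullOutputFormat, instead of TableOutputFormat.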

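Implementation:

Job:

private static void runJob() {
	String outputTableName = "B";
	String inputTableName = "A";
	Configuration conf = HBaseConfiguration.create();
	// XXX is a placeholder: substitute the addresses of your own cluster.
	conf.set("hbase.master", XXX);
	conf.set("hbase.zookeeper.quorum", XXX);
	conf.set("hbase.cluster.distributed", "true");
	// Tell TableOutputFormat which table the emitted Puts go to.
	conf.set(TableOutputFormat.OUTPUT_TABLE, outputTableName);
	try {
		Scan scan = new Scan();
		Job job = new Job(conf, "DataFormat Task");

		job.setJarByClass(DataFormatTask.class);
		// Table A is the input; the mapper emits (NullWritable, Put) pairs.
		TableMapReduceUtil.initTableMapperJob(inputTableName, scan,
				DataFormatMapper.class, NullWritable.class, Put.class, job);

		job.setOutputFormatClass(TableOutputFormat.class);
		// Map-only job: with zero reducers the mapper output is written to table B.
		job.setNumReduceTasks(0);
		// waitForCompletion returns false when the job fails, so check it.
		if (!job.waitForCompletion(true)) {
			throw new IllegalStateException("Job did not complete successfully");
		}

	} catch (Throwable e) {
		throw new RuntimeException("Run DataFormatTask error! ", e);
	} finally {
		HConnectionManager.deleteConnection(conf, true);
	}

}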

Main:

public static void main(String[] args) {
	runJob();
}

DataFormatMapper:

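protected void map(ImmutableBytesWritable key, Result value,
	Context context) throws IOException,
	InterruptedException {
	// Note: these two lines log every input row at INFO level.
	LOGGER.info("key:" + Bytes.toString(key.get()));
	LOGGER.info("row:" + Bytes.toString(value.getRow()));
	// Read column E:E; val is null when the row has no such cell.
	String val = Bytes.toString(value.getValue(Bytes.toBytes("E"), Bytes.toBytes("E")));
	// Skip rows whose value is missing or not made up entirely of digits.
	if (!NumberUtils.isDigits(val)) {
		return;
	}
	// Reuse the row key from table A and copy the value into E:E of table B.
	Put put = new Put(key.get());
	put.add(Bytes.toBytes("E"), Bytes.toBytes("E"), Bytes.toBytes(val));
	// TableOutputFormat ignores the key, so NullWritable is sufficient.
	context.write(NullWritable.get(), put);
}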
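
Which rows reach table B depends entirely on the NumberUtils.isDigits check in the mapper. The plain-Java sketch below mirrors that rule as I understand it from Commons Lang (null and empty strings are rejected, and every character must be a digit), so it can be tried without an HBase cluster. The class name DigitFilterSketch and the sample values are made up for illustration.

```java
class DigitFilterSketch {

    /** Returns true only for non-empty strings made up entirely of digit characters. */
    public static boolean isDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical E:E cell values, including a missing cell (null).
        String[] samples = {"12345", "007", "12.5", "-3", "1e3", " 42", "", null};
        for (String s : samples) {
            String decision = isDigits(s) ? "copied to B" : "skipped";
            System.out.println("[" + s + "] -> " + decision);
        }
    }
}
```

Under this rule only "12345" and "007" would be copied. Signed, decimal, and whitespace-padded values are skipped, so if those should count as numbers in your data, a different check would be needed in the mapper.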

2013-03-07 14:48