MapReduce Is Awesome (1): An MR Word Count Example

EclipseEye

Blog category: Hadoop/MapReduce


package cmd;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConfiguredTest extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
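		// Expects two arguments: the input path and the output path.
		String inputPath = args[0];
		String outputPath = args[1];

		// Use the Configuration injected by ToolRunner so that generic
		// options (-D, -conf, ...) given on the command line take effect.
		Job job = Job.getInstance(getConf(), ConfiguredTest.class.getName());
		job.setJarByClass(ConfiguredTest.class);
		// 1.1 Input
		FileInputFormat.setInputPaths(job, new Path(inputPath));
		job.setInputFormatClass(TextInputFormat.class);
		// 1.2 Mapper
		job.setMapperClass(MyMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);
		// 1.3 Partitioning
		job.setPartitionerClass(HashPartitioner.class);
		job.setNumReduceTasks(1);
		// 1.4 Sorting and grouping: the defaults (the key type's own
		// comparator) are used here. Custom comparators could be set with
		// job.setSortComparatorClass(...) and
		// job.setGroupingComparatorClass(...).
		// 1.5 Combiner (local reduce on the map side)
		job.setCombinerClass(MyReducer.class);

		// 2.1 Shuffle: the output of the mappers is transferred over the
		// network to the reducer responsible for each partition
		// 2.2 Reducer
		job.setReducerClass(MyReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);
		// 2.3 Output
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path(outputPath));

		// Report job failure through the exit code instead of always returning 0.
		return job.waitForCompletion(true) ? 0 : 1;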
	}

	public static void main(String[] args) throws Exception {
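		// A freshly constructed Configured has no Configuration yet, so pass
		// a new one explicitly; ToolRunner parses the generic Hadoop options
		// and hands the Configuration to the tool before calling run().
		int exitCode = ToolRunner.run(new Configuration(), new ConfiguredTest(), args);
		System.exit(exitCode);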
	}

	public static class MyMapper extends
			Mapper<LongWritable, Text, Text, LongWritable> {
		// Input key: byte offset of the line; input value: the line itself.
		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws java.io.IOException, InterruptedException {

			// Words are assumed to be separated by tabs.
			String[] split = value.toString().split("\t");
			for (String str : split) {
				if (str.isEmpty()) {
					continue; // skip empty tokens, e.g. from blank lines
				}
				// Emit (word, 1) for every word on the line.
				context.write(new Text(str), new LongWritable(1));
			}
		}
	}

	public static class MyReducer extends
			Reducer<Text, LongWritable, Text, LongWritable> {
		// Also used as the combiner, which is safe because summing is
		// associative and commutative.
		@Override
		protected void reduce(Text key, Iterable<LongWritable> values,
				Context context)
				throws java.io.IOException, InterruptedException {

			// Sum all partial counts received for this word.
			long num = 0;
			for (LongWritable longWritable : values) {
				num += longWritable.get();
			}
			context.write(key, new LongWritable(num));
		}
	}

}
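
To make the numbered steps in the job setup easier to follow, here is a minimal sketch that imitates the same map, partition, shuffle and reduce flow in plain Java, with no Hadoop dependency. The class `LocalWordCount` and its methods are hypothetical names introduced only for this illustration. The `partition` method uses the same formula as `HashPartitioner`, but on `String.hashCode()` rather than `Text.hashCode()`, so the actual partition numbers on a cluster may differ. The `TreeMap` only approximates Hadoop's sort by key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the word count job; not part of Hadoop.
public class LocalWordCount {

	// Map phase: emit one (word, 1) pair per tab-separated token, like MyMapper.
	public static List<Map.Entry<String, Long>> map(String line) {
		List<Map.Entry<String, Long>> pairs = new ArrayList<>();
		for (String word : line.split("\t")) {
			if (!word.isEmpty()) {
				pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
			}
		}
		return pairs;
	}

	// Partitioning: same formula as HashPartitioner, applied to a String key.
	public static int partition(String key, int numReduceTasks) {
		return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
	}

	// Reduce phase: sum all counts collected for one key, like MyReducer.
	public static long reduce(List<Long> values) {
		long sum = 0;
		for (long v : values) {
			sum += v;
		}
		return sum;
	}

	// Runs map, then groups values by key (the shuffle), then reduces each group.
	public static Map<String, Long> countWords(List<String> lines) {
		// TreeMap keeps keys sorted, standing in for the sort step.
		Map<String, List<Long>> grouped = new TreeMap<>();
		for (String line : lines) {
			for (Map.Entry<String, Long> pair : map(line)) {
				grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
						.add(pair.getValue());
			}
		}
		Map<String, Long> counts = new TreeMap<>();
		for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
			counts.put(group.getKey(), reduce(group.getValue()));
		}
		return counts;
	}

	public static void main(String[] args) {
		List<String> lines = Arrays.asList("hello\tworld", "hello\thadoop");
		System.out.println(countWords(lines)); // {hadoop=1, hello=2, world=1}
	}
}
```

With `setNumReduceTasks(1)`, as in the job above, every key lands in partition 0, so a single reducer sees all the words.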


2015-03-11 00:44