Writing WordCount for Hadoop

username2

Blog category:

Hadoop study notes

1 Hadoop's WordCount is like the "Hello World" you write when first learning to program: it is the starting point for writing MapReduce programs.

The program can be understood by following its comments:

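import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * @ClassName: WordCount2
 * @Description: To run: 1. package as a jar; 2. upload it to the Hadoop server; 3. run it with the hadoop command (the input and output arguments are directories, and the output directory must not exist yet)
 * 	e.g.: bin/hadoop jar  WordCount1.jar   /user/root/word_count_in/ /user/root/word_count_out
 * 	(this form assumes the jar's manifest names the main class; otherwise pass the class name right after the jar)
 * @author: root
 * @date 2014-11-24 16:52:30
 */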
public class WordCount2 {

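	/**
	 * Mapper<Object, Text, Text, IntWritable>: the four type parameters are the mapper's input key type,
	 * input value type, output key type, and output value type.
	 * The mapper's output key/value types are the reducer's input key/value types. With the default
	 * TextInputFormat, the mapper's input key is the byte offset of the line within the file
	 * (a LongWritable, not a line number) and the value is the text of that line.
	 * The framework groups the mapper output by key and passes each key, together with the list of
	 * all its values, to the reducer.
	 * 1 The input types are determined by TextInputFormat; there are other input format classes, e.g. see http://username2.iteye.com/blog/2159836
	 * map: reads each line of the input and splits it on the default StringTokenizer delimiters " \t\n\r\f"
	 */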
	public static class TokenizerMapper extends
			Mapper<Object, Text, Text, IntWritable> {

		private final static IntWritable one = new IntWritable(1);
		private Text word = new Text();

		public void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			StringTokenizer itr = new StringTokenizer(value.toString());
			while (itr.hasMoreTokens()) {
				word.set(itr.nextToken());
				context.write(word, one);
			}
		}
	}

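	/**
	 * IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>.
	 * [By default a job runs a single reduce task no matter how many map tasks there are, so it acts
	 * as the final aggregation; the number can be changed with job.setNumReduceTasks]
	 * The key here is the word emitted by the Mapper [reduce is called once per distinct key, with all of that key's values].
	 * When the loop finishes, the sum written to the context is the final count for that word.
	 */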
	public static class IntSumReducer extends
			Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();

		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
		 
			result.set(sum);
			context.write(key, result);
		}
	}
	/**
	 * @Title: main
	 * Command to run:
	 *   command   jar   input    output
	 *   bin/hadoop jar  WordCount1.jar   /user/root/word_count_in/ /user/root/word_count_out
	 * @throws Exception if the job cannot be configured or run
	 */
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf, args)
				.getRemainingArgs();
		// Exactly two arguments are required: the input path and the output path
		if (otherArgs.length != 2) {
			System.err.println("Usage: wordcount <in> <out>");
			System.exit(2);
		}
		// Build the MapReduce task, also called a MapReduce job.
		// Job.getInstance replaces the deprecated new Job(conf, name) constructor in Hadoop 2.x.
		Job job = Job.getInstance(conf, "word count");
		job.setJarByClass(WordCount2.class);// main class, used to locate the jar
		job.setInputFormatClass(TextInputFormat.class);// format used to read the input files
		job.setOutputFormatClass(TextOutputFormat.class);// format used to write the output files
		
		job.setMapperClass(TokenizerMapper.class);// mapper
		job.setCombinerClass(IntSumReducer.class);// combiner: pre-aggregates map output locally, so less data is shuffled
		job.setReducerClass(IntSumReducer.class);// reducer
		
		job.setOutputKeyClass(Text.class);// key class of the job output
		job.setOutputValueClass(IntWritable.class);// value class of the job output
		
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));// input path
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));// output path
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);// wait for the job to finish, then exit
	}
}
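
To make the data flow described in the comments easier to follow, here is a minimal sketch that mimics the three stages (map, group by key, reduce) in plain Java, with no Hadoop dependency. The class WordCountSketch and its methods are hypothetical names used only for illustration. The sketch runs in a single process and sorts keys with a TreeMap; it is not how the framework is implemented, only an approximation of what the mapper and reducer above see.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper class: uses no Hadoop APIs, only mimics the data flow.
final class WordCountSketch {

    // "Map" stage: emit one (word, 1) pair per token, splitting on the default
    // StringTokenizer delimiters " \t\n\r\f", as TokenizerMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle" stage: collect all values that share a key into one list.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" stage: called once per distinct key, sums that key's values,
    // as IntSumReducer does.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Chains the three stages over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : groupByKey(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

Because the reduce step is a plain sum, applying it to partial groups first and then summing the partial results gives the same totals, which is why the job above can reuse IntSumReducer as its combiner.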


2014-11-25 09:22