A few things to note when getting started with MapReduce

jackyrong


Blog category: Java-related

In MapReduce, suppose the input file contains the following lines:

I love beijing
I love china
beijing is the capital of china
The counting process is shown in the figure below:

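Note the leftmost column in the figure. k1, the key passed to the mapper, is the byte offset at which each line starts in the file, not a line number or a word number. The first line starts at offset 0. Because "I love beijing" is 14 bytes followed by a newline, the second line, "I love china", starts at offset 15.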
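To make the offsets concrete, here is a small plain-Java sketch (no Hadoop needed) that computes where each line of the sample input starts. It assumes UTF-8 text with \n line endings and only imitates the k1 values that the default TextInputFormat hands to map(); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    // Returns the byte offset at which each line starts (an imitation of k1, for illustration only).
    static List<Long> lineStartOffsets(String fileContent) {
        byte[] bytes = fileContent.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        long lineStart = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add(lineStart); // the line that just ended started here
                lineStart = i + 1;      // the next line starts right after the newline
            }
        }
        if (lineStart < bytes.length) {
            offsets.add(lineStart);     // last line without a trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "I love beijing\nI love china\nbeijing is the capital of china\n";
        System.out.println(lineStartOffsets(input)); // [0, 15, 28]
    }
}
```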
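The mapper then splits each line into words and emits (k2, v2) = (word, 1) for every word.
After the shuffle, k3 is the key under which all occurrences of the same word are grouped, and v3 is the collection of that word's v2 values. For the word "I", v3 is (1, 1), and the reducer adds these up to get the word's frequency.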
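The grouping step can be imitated in memory. The sketch below is a hypothetical, Hadoop-free illustration: it splits each line the same way the mapper does, emits (word, 1), and collects the values per word in a TreeMap, which stands in for the sorting and grouping that the shuffle performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Map every line to (word, 1) pairs and group the 1s by word.
    // The TreeMap stands in for the shuffle, which sorts and groups map output by key.
    static Map<String, List<Long>> mapAndGroup(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) { // same split as WordCountMapper
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L); // (k2, v2) = (word, 1)
            }
        }
        return grouped; // each entry is one (k3, v3)
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "I love beijing",
                "I love china",
                "beijing is the capital of china");
        Map<String, List<Long>> grouped = mapAndGroup(lines);
        System.out.println(grouped.get("I"));    // [1, 1]
        System.out.println(grouped.get("love")); // [1, 1]
    }
}
```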

Writing the mapper:

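import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		/*
		 * key: the input key k1, i.e. the byte offset of this line in the file
		 * value: one line of data, e.g. "I love beijing"
		 * context: the Map context, used to emit (k2, v2)
		 */
		String data = value.toString();
		// Split the line into words
		String[] words = data.split(" ");

		// Emit (word, 1) for every word
		for (String w : words) {
			context.write(new Text(w), new LongWritable(1));
		}
	}

}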
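One thing to watch in this mapper: data.split(" ") splits on exactly one space character. If a line has consecutive or leading spaces, or is empty, the result contains empty strings, and the job will count "" as a word. The hypothetical helpers below contrast the two behaviours; trimming and splitting on "\\s+" is one common alternative, not the only one.

```java
import java.util.Arrays;

public class SplitPitfall {
    // Same call as WordCountMapper: splits on exactly one space character.
    static String[] naiveSplit(String line) {
        return line.split(" ");
    }

    // Hypothetical alternative: trim, return no words for a blank line, split on runs of whitespace.
    static String[] whitespaceSplit(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "I  love beijing"; // two spaces after "I"
        System.out.println(Arrays.toString(naiveSplit(line)));      // [I, , love, beijing]
        System.out.println(Arrays.toString(whitespaceSplit(line))); // [I, love, beijing]
        System.out.println(naiveSplit("").length);                  // 1 (a single empty string)
        System.out.println(whitespaceSplit("").length);             // 0
    }
}
```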

Writing the reducer:


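import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

	@Override
	protected void reduce(Text k3, Iterable<LongWritable> v3, Context context)
			throws IOException, InterruptedException {
		// v3: a collection whose elements are the v2 values emitted for this k3
		long total = 0;
		for (LongWritable l : v3) {
			total = total + l.get();
		}

		// Emit (word, total count)
		context.write(k3, new LongWritable(total));
	}

}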
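Stripped of the Hadoop types, reduce() is just a sum over v3. This minimal sketch, with a hypothetical class name, uses plain Long values instead of LongWritable and applies the same loop to the v3 that the word "I" receives in the sample input.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Same logic as WordCountReducer.reduce(), with Long instead of LongWritable:
    // add up every v2 value collected in v3.
    static long sum(Iterable<Long> v3) {
        long total = 0;
        for (long v : v3) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> v3ForI = Arrays.asList(1L, 1L); // v3 for k3 = "I" in the sample input
        System.out.println("I -> " + sum(v3ForI)); // I -> 2
    }
}
```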

The driver (main program):

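import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMain {

	public static void main(String[] args) throws Exception {
		// A job = map + reduce
		Configuration conf = new Configuration();

		// Create the job
		Job job = Job.getInstance(conf);
		// Specify the job's entry point (Hadoop uses this class to locate the jar)
		job.setJarByClass(WordCountMain.class);

		// The job's mapper and its output (k2, v2) types
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);

		// The job's reducer and the job's final output types
		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);

		// Input and output: args[0] is the input path, args[1] the output directory
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Submit the job, wait for it to finish, and report success through the exit code
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}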
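When checking results, it helps to know what the job should produce for the sample input. The sketch below is an in-memory stand-in for the whole job (map, shuffle, reduce) with no Hadoop involved, and its names are hypothetical. It formats each result as word, tab, count, which is how the default TextOutputFormat writes a key and value. Keys are sorted with String ordering, which matches Text's byte ordering only for ASCII input such as this sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    // In-memory imitation of the whole job: returns the output lines as "word<TAB>count".
    static List<String> run(List<String> lines) {
        // TreeMap keeps keys sorted, standing in for the shuffle's sort by key
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // map emits (word, 1); merge adds it to the running total, like reduce()
                counts.merge(word, 1L, Long::sum);
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "I love beijing",
                "I love china",
                "beijing is the capital of china");
        for (String line : run(input)) {
            System.out.println(line);
        }
    }
}
```

With the sample input, the uppercase "I" sorts ahead of all the lowercase words, so "I" with a count of 2 is the first output line, followed by beijing, capital, china, is, love, of and the.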


2018-05-06 08:59