A First Try at Hadoop Map/Reduce

sunqi

Blog category: hadoop

With the Hadoop cluster environment installed (see the post on installing Hadoop for details), the natural next step is to run a Map/Reduce job.

package com.hadoop;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValue {

	static class MaxValueMapper extends
			Mapper<LongWritable, Text, Text, IntWritable> {

		public void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {

			String line = value.toString();
			// Extract the key and the value to aggregate from each line
			String thekey = line.substring(0, 4);
			int theValue = Integer.parseInt(line.substring(5, 8));
			context.write(new Text(thekey), new IntWritable(theValue));
		}
	}

	static class MaxValueReducer extends
			Reducer<Text, IntWritable, Text, IntWritable> {

		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {

			int maxValue = Integer.MIN_VALUE;
			// Find the maximum value for this key
			for (IntWritable value : values) {
				maxValue = Math.max(maxValue, value.get());
			}
			context.write(key, new IntWritable(maxValue));
		}
	}

	public static void main(String[] args) throws Exception {
		if (args.length != 2) {
			System.err.println("Usage: MaxValue <input path> <output path>");
			System.exit(-1);
		}

		Job job = new Job();
		job.setJarByClass(MaxValue.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.setMapperClass(MaxValueMapper.class);
		job.setReducerClass(MaxValueReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
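
To see what the job computes without a cluster, the same map, group, and reduce steps can be sketched in plain Java. This is only a local illustration under stated assumptions, not Hadoop code: the class LocalMaxValue and its methods are hypothetical names, and grouping into a TreeMap stands in for the shuffle that Hadoop performs between the two phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the MaxValue job; no Hadoop classes involved.
class LocalMaxValue {

    // Map phase: parse one fixed-width line ("YYYY NNN") into a (key, value) pair,
    // using the same offsets as MaxValueMapper.
    static Map.Entry<String, Integer> map(String line) {
        String key = line.substring(0, 4);
        int value = Integer.parseInt(line.substring(5, 8));
        return Map.entry(key, value);
    }

    // Reduce phase: the maximum of all values seen for one key, as in MaxValueReducer.
    static int reduce(Iterable<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Runs map on every line, groups the pairs by key (a stand-in for the shuffle),
    // then reduces each group to a single value.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Four lines in the "YYYY NNN" format used in this post.
        List<String> lines = List.of("2000 111", "2012 333", "2012 444", "2000 222");
        run(lines).forEach((year, max) -> System.out.println(year + "\t" + max));
    }
}
```

For these four lines the sketch yields 222 for key 2000 and 444 for key 2012. Note that the fixed offsets only work when every key is exactly 4 characters and every value exactly 3 digits.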

The code is simple. Package it as a jar; the name is arbitrary, here first.jar.

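Next, write a program that randomly generates a batch of data to upload to Hadoop. To keep the parsing simple, the generated lines look like this: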

2000 111

2012 333

2012 444

2000 222

A large batch of lines like these is saved as temp1.txt.
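
The generator program itself is not shown in this post, so the following is only a guess at what such a program could look like. The class name TempDataGenerator, the year range 2010 to 2011, the value range 100 to 998, and the fixed seed are all assumptions chosen for illustration; the only constraint taken from the job is the fixed-width "YYYY NNN" layout that the mapper's substring offsets rely on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical data generator; names and value ranges are assumptions.
class TempDataGenerator {

    // Builds `count` lines of the form "YYYY NNN": a 4-digit key, one space,
    // and a value that always has exactly 3 digits so the fixed offsets hold.
    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int year = 2010 + random.nextInt(2);    // 2010 or 2011 (assumed range)
            int value = 100 + random.nextInt(899);  // 100..998, leaves 999 free for a hand-added maximum
            lines.add(year + " " + value);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Output file defaults to temp1.txt; a different path can be passed as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "temp1.txt");
        Files.write(out, generate(1000, 42L));
    }
}
```

Each generated line is 9 characters plus a line terminator, so 1000 lines written with a one-byte newline come to 10000 bytes.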

Upload it:

hadoop dfs -put temp1.txt /user/hadoop/input/

List the files in HDFS:

hadoop dfs -ls

Found 2 items

drwxr-xr-x - hadoop supergroup 0 2012-04-06 13:40 /user/hadoop/input

drwxr-xr-x - hadoop supergroup 0 2012-04-06 11:30 /user/hadoop/output

hadoop dfs -ls followed by a path lists the files in that HDFS directory:

hadoop dfs -ls /user/hadoop/input

Found 2 items

-rw-r--r-- 2 hadoop supergroup 2043 2012-02-29 18:18 /user/hadoop/input/slaves.sh

-rw-r--r-- 2 hadoop supergroup 10000 2012-04-06 13:39 /user/hadoop/input/temp1.txt

The uploaded temp1.txt file is there.

Then run the job:

hadoop jar first.jar com.hadoop.MaxValue /user/hadoop/input/temp1.txt output1

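first.jar: the name of the jar

com.hadoop.MaxValue: the class name

/user/hadoop/input/temp1.txt: the input path passed to the main function

output1: the output path passed to the main function

The console then shows: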

12/04/06 13:41:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/04/06 13:41:43 INFO input.FileInputFormat: Total input paths to process : 1

12/04/06 13:41:44 INFO mapred.JobClient: Running job: job_201203121856_0005

12/04/06 13:41:45 INFO mapred.JobClient: map 0% reduce 0%

12/04/06 13:41:58 INFO mapred.JobClient: map 100% reduce 0%

12/04/06 13:42:10 INFO mapred.JobClient: map 100% reduce 100%

12/04/06 13:42:15 INFO mapred.JobClient: Job complete: job_201203121856_0005

12/04/06 13:42:15 INFO mapred.JobClient: Counters: 25

12/04/06 13:42:15 INFO mapred.JobClient: Job Counters

12/04/06 13:42:15 INFO mapred.JobClient: Launched reduce tasks=1

12/04/06 13:42:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12195

12/04/06 13:42:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/06 13:42:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

12/04/06 13:42:15 INFO mapred.JobClient: Launched map tasks=1

12/04/06 13:42:15 INFO mapred.JobClient: Data-local map tasks=1

12/04/06 13:42:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10085

12/04/06 13:42:15 INFO mapred.JobClient: File Output Format Counters

12/04/06 13:42:15 INFO mapred.JobClient: Bytes Written=27

12/04/06 13:42:15 INFO mapred.JobClient: FileSystemCounters

12/04/06 13:42:15 INFO mapred.JobClient: FILE_BYTES_READ=11006

12/04/06 13:42:15 INFO mapred.JobClient: HDFS_BYTES_READ=10111

12/04/06 13:42:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=63937

12/04/06 13:42:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27

12/04/06 13:42:15 INFO mapred.JobClient: File Input Format Counters

12/04/06 13:42:15 INFO mapred.JobClient: Bytes Read=10000

12/04/06 13:42:15 INFO mapred.JobClient: Map-Reduce Framework

12/04/06 13:42:15 INFO mapred.JobClient: Reduce input groups=3

12/04/06 13:42:15 INFO mapred.JobClient: Map output materialized bytes=11006

12/04/06 13:42:15 INFO mapred.JobClient: Combine output records=0

12/04/06 13:42:15 INFO mapred.JobClient: Map input records=1000

12/04/06 13:42:15 INFO mapred.JobClient: Reduce shuffle bytes=11006

12/04/06 13:42:15 INFO mapred.JobClient: Reduce output records=3

12/04/06 13:42:15 INFO mapred.JobClient: Spilled Records=2000

12/04/06 13:42:15 INFO mapred.JobClient: Map output bytes=9000

12/04/06 13:42:15 INFO mapred.JobClient: Combine input records=0

12/04/06 13:42:15 INFO mapred.JobClient: Map output records=1000

12/04/06 13:42:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=111

12/04/06 13:42:15 INFO mapred.JobClient: Reduce input records=1000
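
The counters Combine input records=0 and Combine output records=0 show that no combiner ran, so all 1000 map output records went to the reducer. Because taking a maximum is commutative and associative, the reducer's logic could in principle also serve as a combiner (in Hadoop this would be wired up with job.setCombinerClass(MaxValueReducer.class), which this job does not do). The plain-Java sketch below only illustrates why the result would not change; CombinerSketch, its methods, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of why max is safe to use as a combiner.
class CombinerSketch {

    // Same logic as the reducer: the largest value in a list.
    static int max(List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Without a combiner: every value for the key reaches the reducer.
    static int reduceAll(List<List<Integer>> mapOutputs) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            all.addAll(output);
        }
        return max(all);
    }

    // With a combiner: each map task first collapses its own values to one
    // partial maximum, and the reducer only sees one value per map task.
    static int reduceCombined(List<List<Integer>> mapOutputs) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            partials.add(max(output));
        }
        return max(partials);
    }

    public static void main(String[] args) {
        // Hypothetical values for one key, split across two map tasks.
        List<List<Integer>> mapOutputs = List.of(List.of(111, 444, 222), List.of(333, 129));
        System.out.println(reduceAll(mapOutputs) + " " + reduceCombined(mapOutputs));
    }
}
```

Both paths give the same maximum, while the combined path hands the reducer two values instead of five. An aggregate such as a mean would not survive this kind of pre-aggregation unchanged, which is why the trick depends on the operation.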

The job's progress can also be followed in the web console at http://node1:50030/jobtracker.jsp.

Once the job has finished, list the output directory:

hadoop dfs -ls /user/hadoop/output1

Found 3 items

-rw-r--r-- 2 hadoop supergroup 0 2012-04-06 13:42 /user/hadoop/output1/_SUCCESS

drwxr-xr-x - hadoop supergroup 0 2012-04-06 13:41 /user/hadoop/output1/_logs

-rw-r--r-- 2 hadoop supergroup 27 2012-04-06 13:42 /user/hadoop/output1/part-r-00000

The result file has been generated. View the final result:

hadoop dfs -cat /user/hadoop/output1/part-r-00000

2009 999

2010 129

2011 177

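The raw data was generated by a program, and a line with key 2009 and the maximum value 999 was then added by hand, so the result can be checked for correctness.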
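
Beyond eyeballing one hand-added line, the whole result could be checked by recomputing the maxima from a local copy of the input and comparing them with the job output. The sketch below assumes both files have already been read into lists of lines; ResultCheck, its methods, and the sample input lines are hypothetical. Splitting on whitespace accepts either a tab or a space between key and value in the output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical checker comparing job output against a local recomputation.
class ResultCheck {

    // Expected maxima computed directly from raw input lines ("YYYY NNN").
    static Map<String, Integer> expectedMax(List<String> inputLines) {
        Map<String, Integer> expected = new TreeMap<>();
        for (String line : inputLines) {
            String[] parts = line.trim().split("\\s+");
            // merge keeps the larger of the stored value and the new one
            expected.merge(parts[0], Integer.parseInt(parts[1]), Integer::max);
        }
        return expected;
    }

    // Parses job output lines, one "key value" pair per line.
    static Map<String, Integer> parseOutput(List<String> outputLines) {
        Map<String, Integer> actual = new TreeMap<>();
        for (String line : outputLines) {
            String[] parts = line.trim().split("\\s+");
            actual.put(parts[0], Integer.parseInt(parts[1]));
        }
        return actual;
    }

    // True when the job output has exactly the expected keys and maxima.
    static boolean matches(List<String> inputLines, List<String> outputLines) {
        return expectedMax(inputLines).equals(parseOutput(outputLines));
    }

    public static void main(String[] args) {
        // Hypothetical input lines chosen to agree with the result shown in this post.
        List<String> input = List.of("2010 129", "2010 101", "2011 177", "2011 150", "2009 999");
        List<String> output = List.of("2009\t999", "2010\t129", "2011\t177");
        System.out.println(matches(input, output));
    }
}
```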

The official Hadoop examples are also worth a look, at http://hadoop.apache.org/common/docs/r0.20.2/cn/mapred_tutorial.html

2012-04-06 14:04