Homework - Running Hadoop WordCount Examples

 

Hadoop workshop homework.

 

Since I am an IntelliJ IDEA guy now (I switched from Eclipse to IntelliJ IDEA several months ago, because IntelliJ IDEA is by now much better than Eclipse) and IntelliJ currently doesn't have any Hadoop plugin, I package the output into a jar file, then copy the jar (cdh4-examples.jar) to the Hadoop cluster with scp.

 

[root@n1 hadoop-examples]# scp gsun@192.168.1.102:~/prog/hadoop/cdh4-examples/cdh4-examples.jar .
Password:
cdh4-examples.jar 

 

 

 

First example: WordCount01. The code is as follows:

 

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount01 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // Reusable Writable instances, re-set for each record to avoid per-record allocation.
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (token, 1) for every whitespace-delimited token in the line.
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // Sum all partial counts for a word into its final total.
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount01.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // map-side local aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount01(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

 

 

Output: 

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount01 Shakespeare.txt output
13/07/12 23:02:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:02:57 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:02:59 INFO mapred.JobClient: Running job: job_201307122107_0006
13/07/12 23:03:00 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:03:32 INFO mapred.JobClient:  map 26% reduce 0%
13/07/12 23:03:36 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:03:49 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:03:56 INFO mapred.JobClient: Job complete: job_201307122107_0006
13/07/12 23:03:56 INFO mapred.JobClient: Counters: 32
13/07/12 23:03:56 INFO mapred.JobClient:   File System Counters
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes read=2151353
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes written=2933308
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:03:56 INFO mapred.JobClient:   Job Counters 
13/07/12 23:03:56 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=32808
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=9469
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:03:56 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:03:56 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:03:56 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:03:56 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:03:56 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:03:56 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce shuffle bytes=470288
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:03:56 INFO mapred.JobClient:     CPU time spent (ms)=9970
13/07/12 23:03:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=288149504
13/07/12 23:03:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3245924352
13/07/12 23:03:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=142344192
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:02:53
Task end: 2013-07-12 23:03:56
Time elapsed: 1.046 minutes.

 

Note that in this example we used a combiner class, which here is the same as the reducer class. (They don't have to be identical in general, but a combiner's input and output types must match the map output types, and its logic must be safe to apply repeatedly; for word count the sum is associative and commutative, so the reducer can be reused.) The combiner acts as a local, map-side aggregator that reduces the amount of data copied from the mappers to the reducers. We can see the difference by comparing WordCount01 with WordCount02 below, from which the combiner is removed.
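
A quick read of the WordCount01 counters above shows how much the combiner saves: the mappers emitted Map output records=1612019 (word, 1) pairs, but the reducer received only Reduce input records=65448, one per distinct word, so roughly 96% of the intermediate records were aggregated away before the shuffle. (Combine input records=1852684 exceeds the map output count because the combiner runs once per spill and may run again while merging spill files, so some records pass through it more than once.)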

 

The second example:

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount02 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount02.class);
        job.setMapperClass(TokenizerMapper.class);

        // No combiner this time: every (word, 1) pair is shuffled to the reducer.
        // job.setCombinerClass(IntSumReducer.class);

        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount02(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

Output:

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount02 Shakespeare.txt output
13/07/12 23:16:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:16:24 INFO mapred.JobClient: Running job: job_201307122107_0007
13/07/12 23:16:25 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:16:42 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:17:03 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 23:17:06 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:17:12 INFO mapred.JobClient: Job complete: job_201307122107_0007
13/07/12 23:17:12 INFO mapred.JobClient: Counters: 32
13/07/12 23:17:12 INFO mapred.JobClient:   File System Counters
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes read=3871328
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes written=5564882
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:17:12 INFO mapred.JobClient:   Job Counters 
13/07/12 23:17:12 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=20788
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=20552
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:17:12 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:17:12 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:17:12 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:17:12 INFO mapred.JobClient:     Combine input records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Combine output records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce shuffle bytes=1382689
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Spilled Records=4836057
13/07/12 23:17:12 INFO mapred.JobClient:     CPU time spent (ms)=11730
13/07/12 23:17:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=275365888
13/07/12 23:17:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2282356736
13/07/12 23:17:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=91426816
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:16:18
Task end: 2013-07-12 23:17:12
Time elapsed: 0.8969 minutes.

 

Note the difference between WordCount01 and WordCount02: with the combiner removed in WordCount02, the number of reduce input records increased, as the job output shows:

Reduce input records=1612019 

 

In WordCount01, which ran with the combiner:

Reduce input records=65448
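
The shuffle cost tells the same story: Reduce shuffle bytes grew from 470288 with the combiner to 1382689 without it, roughly a threefold increase in data copied across the network, and Spilled Records grew from 371561 to 4836057.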

 

 

The third example:

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount03 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount03.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Two reduce tasks: output is split into part-r-00000 and part-r-00001.
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount03(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

Output:

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount03 Shakespeare.txt output
13/07/12 23:18:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:18:53 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:18:55 INFO mapred.JobClient: Running job: job_201307122107_0008
13/07/12 23:18:56 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:19:16 INFO mapred.JobClient:  map 70% reduce 0%
13/07/12 23:19:19 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:19:33 INFO mapred.JobClient:  map 100% reduce 50%
13/07/12 23:19:34 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:19:39 INFO mapred.JobClient: Job complete: job_201307122107_0008
13/07/12 23:19:39 INFO mapred.JobClient: Counters: 32
13/07/12 23:19:39 INFO mapred.JobClient:   File System Counters
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes read=3042226
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes written=3277040
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 23:19:39 INFO mapred.JobClient:   Job Counters 
13/07/12 23:19:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Launched reduce tasks=2
13/07/12 23:19:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=25064
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=21033
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:19:39 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:19:39 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:19:39 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:19:39 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:19:39 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:19:39 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce shuffle bytes=503734
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:19:39 INFO mapred.JobClient:     CPU time spent (ms)=13840
13/07/12 23:19:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=395960320
13/07/12 23:19:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3918856192
13/07/12 23:19:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=162201600
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:18:50
Task end: 2013-07-12 23:19:39
Time elapsed: 0.80191666 minutes.

In the third example, the number of reduce tasks was changed to 2, so two reduce output files are produced:

-rw-r--r--   3 root supergroup     353110 2013-07-12 23:19 output/part-r-00000
-rw-r--r--   3 root supergroup     353933 2013-07-12 23:19 output/part-r-00001

In WordCount01 and WordCount02, only one reduce output file was generated. The reason is that a job's number of reduce tasks defaults to 1 (the mapred.reduce.tasks property); mapred.tasktracker.reduce.tasks.maximum, which was set to 1 on my cluster, only limits how many reduce tasks a single TaskTracker can run simultaneously. If you specify the number of reduce tasks yourself, either with Job.setNumReduceTasks() as above or by setting mapred.reduce.tasks on the org.apache.hadoop.conf.Configuration, the default is overridden.
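
Which reduce task, and hence which part-r-NNNNN file, a given word ends up in is decided by the job's partitioner, which by default is HashPartitioner. Since the post doesn't show this, here is a minimal sketch of the default behavior (my own illustration, using the stock Hadoop HashPartitioner; the PartitionDemo class name is made up):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionDemo {
    public static void main(String[] args) {
        // HashPartitioner assigns a key to reducer
        // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
        // so with two reducers each word deterministically lands in
        // part-r-00000 or part-r-00001.
        HashPartitioner<Text, IntWritable> partitioner =
                new HashPartitioner<Text, IntWritable>();
        int numReduceTasks = 2;
        for (String w : new String[]{"the", "quick", "brown", "fox"}) {
            int p = partitioner.getPartition(new Text(w), new IntWritable(1), numReduceTasks);
            System.out.println(w + " -> part-r-0000" + p);
        }
    }
}

Because every mapper applies the same hash function, all occurrences of a word are routed to the same reduce task, which is why the two output files partition the words disjointly.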

 

 
