Homework - Running Hadoop WordCount Examples

 

Hadoop workshop homework.

 

Since I am an IntelliJ IDEA guy now (I switched from Eclipse to IntelliJ IDEA several months ago, because IntelliJ IDEA is by now much better than Eclipse) and IntelliJ currently doesn't have any Hadoop plugin, I package the output into a jar file, then copy the jar (cdh4-examples.jar) to the Hadoop cluster with scp.

 

[root@n1 hadoop-examples]# scp gsun@192.168.1.102:~/prog/hadoop/cdh4-examples/cdh4-examples.jar .
Password:
cdh4-examples.jar 

 

 

 

First example: WordCount01. The code is as follows:

 

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount01 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // Reusable Writable instances, re-set for each record to avoid per-record allocation.
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (token, 1) for every whitespace-delimited token in the line.
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // Sum all partial counts for a word into its final total.
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount01.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // map-side local aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount01(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

 

 

Output: 

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount01 Shakespeare.txt output
13/07/12 23:02:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:02:57 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:02:59 INFO mapred.JobClient: Running job: job_201307122107_0006
13/07/12 23:03:00 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:03:32 INFO mapred.JobClient:  map 26% reduce 0%
13/07/12 23:03:36 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:03:49 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:03:56 INFO mapred.JobClient: Job complete: job_201307122107_0006
13/07/12 23:03:56 INFO mapred.JobClient: Counters: 32
13/07/12 23:03:56 INFO mapred.JobClient:   File System Counters
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes read=2151353
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes written=2933308
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:03:56 INFO mapred.JobClient:   Job Counters 
13/07/12 23:03:56 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=32808
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=9469
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:03:56 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:03:56 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:03:56 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:03:56 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:03:56 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:03:56 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce shuffle bytes=470288
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:03:56 INFO mapred.JobClient:     CPU time spent (ms)=9970
13/07/12 23:03:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=288149504
13/07/12 23:03:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3245924352
13/07/12 23:03:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=142344192
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:02:53
Task end: 2013-07-12 23:03:56
Time elapsed: 1.046 minutes.

 

Note that in this example we used a combiner class, which here is the same as the reducer class. (They don't have to be identical in general, but a combiner's input and output types must match the map output types, and its logic must be safe to apply repeatedly; for word count the sum is associative and commutative, so the reducer can be reused.) The combiner acts as a local, map-side aggregator that reduces the amount of data copied from the mappers to the reducers. We can see the difference by comparing WordCount01 with WordCount02 below, from which the combiner is removed.
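
A quick read of the WordCount01 counters above shows how much the combiner saves: the mappers emitted Map output records=1612019 (word, 1) pairs, but the reducer received only Reduce input records=65448, one per distinct word, so roughly 96% of the intermediate records were aggregated away before the shuffle. (Combine input records=1852684 exceeds the map output count because the combiner runs once per spill and may run again while merging spill files, so some records pass through it more than once.)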

 

The second example:

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount02 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount02.class);
        job.setMapperClass(TokenizerMapper.class);

        // No combiner this time: every (word, 1) pair is shuffled to the reducer.
        // job.setCombinerClass(IntSumReducer.class);

        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount02(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

Output:

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount02 Shakespeare.txt output
13/07/12 23:16:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:16:24 INFO mapred.JobClient: Running job: job_201307122107_0007
13/07/12 23:16:25 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:16:42 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:17:03 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 23:17:06 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:17:12 INFO mapred.JobClient: Job complete: job_201307122107_0007
13/07/12 23:17:12 INFO mapred.JobClient: Counters: 32
13/07/12 23:17:12 INFO mapred.JobClient:   File System Counters
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes read=3871328
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes written=5564882
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:17:12 INFO mapred.JobClient:   Job Counters 
13/07/12 23:17:12 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=20788
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=20552
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:17:12 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:17:12 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:17:12 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:17:12 INFO mapred.JobClient:     Combine input records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Combine output records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce shuffle bytes=1382689
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Spilled Records=4836057
13/07/12 23:17:12 INFO mapred.JobClient:     CPU time spent (ms)=11730
13/07/12 23:17:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=275365888
13/07/12 23:17:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2282356736
13/07/12 23:17:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=91426816
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:16:18
Task end: 2013-07-12 23:17:12
Time elapsed: 0.8969 minutes.

 

Note the difference between WordCount01 and WordCount02: with the combiner removed in WordCount02, the number of reduce input records increased, as the job output shows:

Reduce input records=1612019 

 

In WordCount01, which ran with the combiner:

Reduce input records=65448
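
The shuffle cost tells the same story: Reduce shuffle bytes grew from 470288 with the combiner to 1382689 without it, roughly a threefold increase in data copied across the network, and Spilled Records grew from 371561 to 4836057.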

 

 

The third example:

package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount03 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount03.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Two reduce tasks: output is split into part-r-00000 and part-r-00001.
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));

        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount03(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");

        System.exit(res);
    }

}

Output:

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount03 Shakespeare.txt output
13/07/12 23:18:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:18:53 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:18:55 INFO mapred.JobClient: Running job: job_201307122107_0008
13/07/12 23:18:56 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:19:16 INFO mapred.JobClient:  map 70% reduce 0%
13/07/12 23:19:19 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:19:33 INFO mapred.JobClient:  map 100% reduce 50%
13/07/12 23:19:34 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:19:39 INFO mapred.JobClient: Job complete: job_201307122107_0008
13/07/12 23:19:39 INFO mapred.JobClient: Counters: 32
13/07/12 23:19:39 INFO mapred.JobClient:   File System Counters
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes read=3042226
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes written=3277040
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 23:19:39 INFO mapred.JobClient:   Job Counters 
13/07/12 23:19:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Launched reduce tasks=2
13/07/12 23:19:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=25064
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=21033
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:19:39 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:19:39 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:19:39 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:19:39 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:19:39 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:19:39 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce shuffle bytes=503734
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:19:39 INFO mapred.JobClient:     CPU time spent (ms)=13840
13/07/12 23:19:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=395960320
13/07/12 23:19:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3918856192
13/07/12 23:19:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=162201600
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:18:50
Task end: 2013-07-12 23:19:39
Time elapsed: 0.80191666 minutes.

In the third example, the number of reduce tasks was changed to 2, so two reduce output files are produced:

-rw-r--r--   3 root supergroup     353110 2013-07-12 23:19 output/part-r-00000
-rw-r--r--   3 root supergroup     353933 2013-07-12 23:19 output/part-r-00001

In WordCount01 and WordCount02, only one reduce output file was generated. The reason is that a job's number of reduce tasks defaults to 1 (the mapred.reduce.tasks property); mapred.tasktracker.reduce.tasks.maximum, which was set to 1 on my cluster, only limits how many reduce tasks a single TaskTracker can run simultaneously. If you specify the number of reduce tasks yourself, either with Job.setNumReduceTasks() as above or by setting mapred.reduce.tasks on the org.apache.hadoop.conf.Configuration, the default is overridden.
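
Which reduce task, and hence which part-r-NNNNN file, a given word ends up in is decided by the job's partitioner, which by default is HashPartitioner. Since the post doesn't show this, here is a minimal sketch of the default behavior (my own illustration, using the stock Hadoop HashPartitioner; the PartitionDemo class name is made up):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionDemo {
    public static void main(String[] args) {
        // HashPartitioner assigns a key to reducer
        // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
        // so with two reducers each word deterministically lands in
        // part-r-00000 or part-r-00001.
        HashPartitioner<Text, IntWritable> partitioner =
                new HashPartitioner<Text, IntWritable>();
        int numReduceTasks = 2;
        for (String w : new String[]{"the", "quick", "brown", "fox"}) {
            int p = partitioner.getPartition(new Text(w), new IntWritable(1), numReduceTasks);
            System.out.println(w + " -> part-r-0000" + p);
        }
    }
}

Because every mapper applies the same hash function, all occurrences of a word are routed to the same reduce task, which is why the two output files partition the words disjointly.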

 

 
