MapReduce Advanced Programming: Custom InputFormat, A Deeper Look

chenwq

Blog categories: MapReduce, Hadoop

0. This post follows on from the previous one, "MapReduce Advanced Programming: Custom InputFormat".

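1. Environment setup: for the development environment used here, refer directly to the earlier post on configuring an Eclipse-based Hadoop application development environment.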

2. The Mapper and Reducer type parameters explained

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Identity mapper: writes each (Text, Point3D) record, e.g. an object name and its
// coordinates, through unchanged. Point3D is the custom value type from the previous post.
public class Point3DMapper extends Mapper<Text, Point3D, Text, Point3D> {
	@Override
	protected void map(Text key, Point3D value, Context context) throws IOException, InterruptedException {
		context.write(key, value);
	}
}
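Point3D is the custom value type from the previous post and is not reproduced in this one. For readers arriving here directly, the following is a minimal sketch of what such a type typically looks like, assuming three float coordinates; the field names x, y, z and the toString() format are hypothetical. A real Hadoop value type would declare implements org.apache.hadoop.io.Writable, whose two methods are write(DataOutput) and readFields(DataInput); this sketch only mirrors those two methods so that it compiles without Hadoop on the classpath.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of a Point3D value type. A real Hadoop version would add
// "implements org.apache.hadoop.io.Writable"; it is left out here so the class
// compiles without Hadoop on the classpath.
class Point3D {
    float x, y, z;

    // Hadoop instantiates Writables by reflection, so a no-arg constructor is needed.
    Point3D() {
        this(0.0f, 0.0f, 0.0f);
    }

    Point3D(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Serialize the three coordinates in a fixed order (4 bytes each).
    public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
    }

    // Deserialize in exactly the order used by write(). Instances may be reused,
    // so every field is overwritten.
    public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
    }

    @Override
    public String toString() {
        return x + ", " + y + ", " + z;
    }
}
```

The essential design constraint is symmetry: readFields() must read the fields in the same order write() wrote them, and the no-arg constructor lets the framework create an empty instance before filling it with readFields().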

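Stepping into the source code of the new-API Mapper (org.apache.hadoop.mapreduce.Mapper), we find the following (excerpt):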

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context 
    extends MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RecordReader<KEYIN,VALUEIN> reader,
                   RecordWriter<KEYOUT,VALUEOUT> writer,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   InputSplit split) throws IOException, InterruptedException {
      super(conf, taskid, reader, writer, committer, reporter, split);
    }
  }
  
  /**
   * Called once at the beginning of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Called once for each key/value pair in the input split. Most applications
   * should override this, but the default is the identity function.
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  // ... cleanup(Context) and run(Context) omitted ...
}

That is, in Point3DMapper the four type parameters are bound as follows:

/**
 * Text    -> KEYIN
 * Point3D -> VALUEIN
 * Text    -> KEYOUT
 * Point3D -> VALUEOUT
 */
Mapper<Text, Point3D, Text, Point3D>

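In the body of map(), the Text key (e.g. ball) is written out as KEYOUT and the Point3D value (e.g. .5, 12.7, 9.0) as VALUEOUT:

// Text    -> ball, etc.          -> KEYOUT
// Point3D -> .5, 12.7, 9.0, etc. -> VALUEOUT
context.write(key, value);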
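The way the four type parameters are bound can be reproduced without Hadoop. The following is a hedged toy model of the Mapper excerpt above, not Hadoop's API: MiniMapper, its Context interface, ListContext, IdentityPointMapper and DistanceMapper are all hypothetical names, with String standing in for Text and float[] for Point3D. As in Hadoop, the default map() is an identity function built on an unchecked cast, so it only makes sense when the input and output types coincide; a mapper whose VALUEOUT differs from its VALUEIN has to override map().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the new-API Mapper: four type parameters and a default identity map().
class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    // Stand-in for Mapper.Context; only write() is modeled.
    interface Context<K, V> {
        void write(K key, V value);
    }

    // Mirrors Hadoop's default: an unchecked cast that is only safe when
    // KEYIN == KEYOUT and VALUEIN == VALUEOUT.
    @SuppressWarnings("unchecked")
    protected void map(KEYIN key, VALUEIN value, Context<KEYOUT, VALUEOUT> context) {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }
}

// Analogue of Mapper<Text, Point3D, Text, Point3D>: input and output types are
// identical, so the inherited identity map() is enough.
class IdentityPointMapper extends MiniMapper<String, float[], String, float[]> {
}

// Hypothetical mapper whose VALUEOUT (Double) differs from VALUEIN (float[]):
// it emits each point's distance from the origin, so it must override map().
class DistanceMapper extends MiniMapper<String, float[], String, Double> {
    @Override
    protected void map(String key, float[] p, Context<String, Double> context) {
        context.write(key, Math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]));
    }
}

// Records every (key, value) pair written through the context, in order.
class ListContext<K, V> implements MiniMapper.Context<K, V> {
    final List<K> keys = new ArrayList<>();
    final List<V> values = new ArrayList<>();

    @Override
    public void write(K key, V value) {
        keys.add(key);
        values.add(value);
    }
}
```

IdentityPointMapper behaves like Point3DMapper, which also just writes its input through unchanged, while DistanceMapper shows that KEYOUT and VALUEOUT are free to differ from KEYIN and VALUEIN once map() is overridden.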

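Likewise, looking at the Reducer source code, we can see that the Mapper's KEYOUT and VALUEOUT types must match the Reducer's KEYIN and VALUEIN types: whatever the map side writes is exactly what the reduce side reads.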

public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  public class Context 
    extends ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RawKeyValueIterator input, 
                   Counter inputKeyCounter,
                   Counter inputValueCounter,
                   RecordWriter<KEYOUT,VALUEOUT> output,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   RawComparator<KEYIN> comparator,
                   Class<KEYIN> keyClass,
                   Class<VALUEIN> valueClass
                   ) throws IOException, InterruptedException {
      super(conf, taskid, input, inputKeyCounter, inputValueCounter,
            output, committer, reporter, 
            comparator, keyClass, valueClass);
    }
  }

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  // ... cleanup(Context) and run(Context) omitted ...
}
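The rule that the Mapper's output types must equal the Reducer's input types can be made visible with another hedged toy model. MiniJob.runJob, MapFn and ReduceFn below are hypothetical, single-process stand-ins for the framework, not Hadoop classes: runJob applies the map function to every input record, groups the map output by key (a simplified "shuffle"), and calls the reduce function once per key with all of that key's values, as the identity reduce() in the excerpt expects. Because both function types share the type variables K2 and V2, the Java compiler rejects any pairing whose types do not line up. In a real Hadoop job the classes are registered via Job.setMapperClass and Job.setReducerClass, so a mismatch only shows up at runtime; and when the map output types differ from the job's final output types, they have to be declared with Job.setMapOutputKeyClass and Job.setMapOutputValueClass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Toy stand-ins for map() and reduce(); "emit" plays the role of context.write().
interface MapFn<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
}

interface ReduceFn<K2, V2, K3, V3> {
    void reduce(K2 key, Iterable<V2> values, BiConsumer<K3, V3> emit);
}

final class MiniJob {
    // K2/V2 appear in both parameter types, so the mapper's output types must be
    // exactly the reducer's input types or the call does not compile.
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> runJob(
            List<Map.Entry<K1, V1>> input,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, K3, V3> reducer) {
        // Simplified shuffle: group map output by key; TreeMap keeps keys sorted.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            mapper.map(record.getKey(), record.getValue(),
                    (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Call reduce() once per key with all values collected for that key.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue(),
                    (k, v) -> output.add(Map.entry(k, v)));
        }
        return output;
    }

    // Analogue of the default identity reduce(): write (key, value) once per value.
    static <K, V> ReduceFn<K, V, K, V> identityReducer() {
        return (key, values, emit) -> {
            for (V value : values) {
                emit.accept(key, value);
            }
        };
    }
}
```

For example, pairing a MapFn<String, String, String, Integer> with a ReduceFn<String, Double, String, Double> fails to compile, because no single V2 can be both Integer and Double; that is the toy counterpart of the Mapper/Reducer type-matching rule above.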

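The point of this post is that the documentation is sometimes unclear, and the best remedy is to read the source code. As the saying goes, "In front of the source code, there are no secrets."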

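For a detailed walkthrough of how MapReduce executes, I recommend "Map Reduce – the Free Lunch is not over?": its examples are classic and its author is first-rate. I'm confident you will get something out of it.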

2012-03-10 10:38