西口西

What's MapReduce?

 

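MapReduce is made up of two words, Map and Reduce, which name the two key phases of large-scale data processing: the Map phase splits a job into many tasks that run in parallel, and the Reduce phase gathers the partial results back together. The whole process can be simplified into the five parts shown in the figure below.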


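On the Mapper side, to reduce the work left for the final Reduce, the map function is followed by a combine step, often called a local Reduce. It works like Reduce, aggregating the Map output (the intermediate key-value pairs). There is also a step called shuffle, named after shuffling cards, which splits the map output by key into R partitions, one for each of the R Reducers. This is usually done with a hash function: Hash(key) mod R.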
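The combine and partitioning steps described above can be sketched in a few lines of Python. This is only an illustration, not how any particular framework implements them: the function names `combine`, `partition`, and `shuffle_local` are made up for this example, the combiner assumes the Reduce operation is a sum, and CRC32 stands in for whatever hash function a real system would use.

```python
import zlib
from collections import defaultdict


def combine(pairs):
    """Local Reduce: sum the values of identical keys in one Mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def partition(key, num_reducers):
    """Pick a Reducer index as Hash(key) mod R (CRC32 is an assumed, stable hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_local(pairs, num_reducers):
    """Combine one Mapper's output, then split it into R partitions by key."""
    partitions = [dict() for _ in range(num_reducers)]
    for key, value in combine(pairs).items():
        partitions[partition(key, num_reducers)][key] = value
    return partitions


# Hypothetical output of a single Mapper
mapper_output = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
parts = shuffle_local(mapper_output, num_reducers=2)
```

Because the hash depends only on the key, every Mapper sends a given key to the same Reducer, which is what lets each Reducer see all values for the keys it owns.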

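Each Mapper writes its output to separate partitions on its local disk (once the whole MapReduce job has finished, the intermediate data the Mappers left on local disk can simply be deleted). Each Reducer fetches from the Mappers the intermediate results for its own share of the keys, runs Reduce, and produces the final result. Each Reducer produces one output, so R Reducers yield R results. If the results need further processing, they become the input to the next MapReduce job.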
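As a rough sketch of the Reducer side, the snippet below simulates R = 2 Reducers, each pulling its own partition from every Mapper, grouping values by key, and producing one output. The data layout (a list of partitions per Mapper, held in memory instead of on local disk) and the name `run_reducer` are assumptions made for illustration.

```python
from collections import defaultdict


def run_reducer(reducer_index, mapper_partitions, reduce_fn):
    """Fetch partition `reducer_index` from every Mapper, group by key, then reduce."""
    grouped = defaultdict(list)
    for partitions in mapper_partitions:
        for key, value in partitions[reducer_index]:
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


# Hypothetical intermediate data: 2 Mappers, each holding R = 2 partitions
mapper_partitions = [
    [[("a", 2), ("c", 1)], [("b", 1)]],          # Mapper 0: partition 0, partition 1
    [[("a", 1)], [("b", 3), ("d", 1)]],          # Mapper 1: partition 0, partition 1
]

R = 2
# One output per Reducer, so R Reducers give R results
outputs = [run_reducer(r, mapper_partitions, lambda key, values: sum(values))
           for r in range(R)]
# outputs[0] == {"a": 3, "c": 1}
# outputs[1] == {"b": 4, "d": 1}
```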

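Throughout the process, data exists as key-value pairs, <Key, Value>, so solving a problem means first deciding what the Key and Value are and then implementing Map and Reduce.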
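Word counting is a common way to illustrate this. In the minimal single-process sketch below, the Key is a word and the Value is a count; `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and the in-memory grouping stands in for the distributed steps a real cluster would perform.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: emit a <word, 1> pair for every word in the document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: add up all counts collected for one word."""
    return word, sum(counts)


def map_reduce(documents):
    """Run Map on every document, group values by key, then run Reduce per key."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


documents = {"d1": "map reduce map", "d2": "reduce shuffle"}
result = map_reduce(documents)
# result == {"map": 2, "reduce": 2, "shuffle": 1}
```

Choosing a different Key changes the question being answered: keying on document ID instead of word, for example, would yield word totals per document.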


 
