What's MapReduce? - ITeye Blog


西口西


What's MapReduce?

Blog category: Cloud computing

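MapReduce is named after its two words, Map and Reduce, which are the two key phases of large-scale data processing. The Map phase splits a job into many tasks that run in parallel, and the Reduce phase gathers the partial results back together. The whole data processing flow is simplified into the five parts shown in the figure below.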
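The two phases can be sketched on a single machine. This is only an illustration, not how a real cluster schedules work: the function names (`split_into_chunks`, `map_task`, `reduce_task`), the sum-of-squares problem, and the use of a thread pool in place of separate machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(data, num_chunks):
    """Split the input into at most num_chunks roughly equal pieces."""
    size = (len(data) + num_chunks - 1) // num_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_task(chunk):
    """Map phase: each task works on its own chunk independently."""
    return sum(x * x for x in chunk)


def reduce_task(left, right):
    """Reduce phase: merge two partial results into one."""
    return left + right


numbers = list(range(1, 11))
chunks = split_into_chunks(numbers, 3)

# Threads stand in for the parallel workers of a real cluster.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(map_task, chunks))

total = reduce(reduce_task, partial_results)
print(partial_results, total)
```

Each map task sees only its own chunk, which is what allows the tasks to run in parallel; the reduce step needs nothing but the partial results.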

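On the Mapper side, to reduce the work left for the final Reduce, the map function is followed by a combine step, also called a local Reduce. It works like Reduce and aggregates the Map output (the intermediate key-value pairs). There is also a step called shuffle, named after shuffling cards, which splits the map output by key into R partitions, one for each of the R Reducers. This step usually uses a hash function: Hash(key) mod R.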
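A minimal sketch of the combine step and the Hash(key) mod R partitioning follows, using word counting as the example problem. The helper names (`map_words`, `combine`, `partition`, `shuffle`) and the value `R = 3` are assumptions for illustration. `zlib.crc32` is used as the hash only because Python's built-in `hash()` for strings changes between runs; a real framework may well use a different hash function.

```python
import zlib
from collections import Counter

R = 3  # number of Reducers (illustrative value)


def map_words(line):
    """Map: emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combine (local Reduce): sum the values per key on the Mapper side."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, num_reducers):
    """Hash(key) mod R, with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Split the pairs into one bucket per Reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


mapped = map_words("a b a c b a")   # six pairs
combined = combine(mapped)          # three pairs after the local Reduce
buckets = shuffle(combined, R)
print(combined)
print(buckets)
```

The combine step shrinks six intermediate pairs to three before anything leaves the Mapper, and because the partition depends only on the key, every pair with the same key lands in the same bucket.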

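Each Mapper writes its output to separate partitions on its local disk (once the whole MapReduce job has finished, the intermediate data the Mappers left on local disk can simply be deleted). Each Reducer fetches from the Mappers the intermediate results for its own range of keys, runs Reduce, and produces the final result. One Reducer produces one output, so R Reducers produce R results. If the results need further processing, they become the input to the next MapReduce job.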
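The Reducer side can be sketched as follows, with in-memory lists standing in for the partitions on each Mapper's local disk. The names `run_reducer` and `sum_values`, the nested-list layout, and the sample pairs are assumptions made for the example.

```python
from collections import defaultdict


def sum_values(key, values):
    """Reduce function used here: add up all values for one key."""
    return sum(values)


def run_reducer(r, mapper_outputs, reduce_fn):
    """Reducer r fetches partition r from every Mapper, groups by key, reduces."""
    grouped = defaultdict(list)
    for partitions in mapper_outputs:
        for key, value in partitions[r]:
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


R = 2
# mapper_outputs[m][r] is what Mapper m wrote for Reducer r (made-up data).
mapper_outputs = [
    [[("a", 2)], [("b", 1)]],
    [[("a", 1)], [("b", 3), ("d", 1)]],
]

results = [run_reducer(r, mapper_outputs, sum_values) for r in range(R)]
print(results)
```

The list `results` has exactly R entries, one per Reducer, and any of them could be fed as input records to a following job.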

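Throughout the process, data takes the form of key-value pairs, <Key,Value>. Solving a problem therefore starts with deciding what the Key and Value are, and then implementing Map and Reduce.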
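As an example of this design order, consider finding the highest temperature per year. Choosing Key = year and Value = temperature comes first, and Map and Reduce follow from that choice. The record format (`"year,temperature"`), the sample data, and the small in-memory driver `run_mapreduce` are all assumptions made for this sketch.

```python
from collections import defaultdict


def map_fn(record):
    """Map: parse one record and emit (Key=year, Value=temperature)."""
    year, temperature = record.split(",")
    yield year, int(temperature)


def reduce_fn(key, values):
    """Reduce: keep the highest temperature seen for this year."""
    return key, max(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return [reduce_fn(key, grouped[key]) for key in sorted(grouped)]


records = ["1949,11", "1950,22", "1949,7", "1950,-3"]  # made-up sample data
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

Changing the problem (for example, to the average temperature per year) would keep the same Key and Value and change only `reduce_fn`.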

[Figure: the five parts of the MapReduce data processing flow]


2015-07-14 23:16
Category: Open-source software


