MapReduce: Simplified Data Processing on Large Clusters (Abstract Translation)

cruiser_31


Categories: Lessons from Elsewhere, Parallel Processing, Clusters

Tags: parallel processing, clusters, mapreduce

The original abstract reads:

MapReduce: Simplified Data Processing on Large Clusters

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

The translation follows:

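MapReduce: Simplified Data Processing on Large Clusters
MapReduce is a programming model, together with an associated implementation, for processing and generating large data sets. The user specifies a map function that turns a key/value pair into a set of intermediate key/value pairs, and a reduce function that merges all intermediate values sharing the same intermediate key. Many real-world tasks fit this model, as the paper shows.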
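To make the map/reduce contract concrete, here is a minimal single-process sketch of word counting. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative assumptions and do not come from Google's implementation; the grouping step simply stands in for whatever the real system does to bring values with the same intermediate key together.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: take one (document name, text) pair, emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share one key."""
    return sum(values)


def run_mapreduce(inputs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Hypothetical driver: apply map, group by intermediate key, apply reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in groups.items()}


docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(docs)
print(counts["the"], counts["fox"])
```

The user only writes `map_fn` and `reduce_fn`; everything in `run_mapreduce` is the part a real run-time system would take over and distribute.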
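Programs written in this functional style are automatically parallelized and run on a large cluster of commodity machines. The run-time system handles the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This lets programmers with no experience in parallel and distributed systems easily use the resources of a large distributed system. Our MapReduce implementation runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation (application) processes many terabytes of data on thousands of machines (a cluster). Programmers find the system easy to use: hundreds of MapReduce programs have been implemented, and more than a thousand MapReduce jobs run on Google's clusters every day.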

Reference:
http://www.cs.toronto.edu/~demke/2227S.12/Papers/mapreduce-osdi04.pdf
