MapReduce: Simplified Data Processing on Large Clusters
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper.
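
As a minimal sketch of the model described above, the following Python snippet runs a word count in a single process. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative assumptions, not the paper's interface, and the in-memory grouping step only stands in for work that a real implementation would spread across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in the input value."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share the same intermediate key."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)  # collect values per intermediate key
    return {ikey: reducer(ikey, ivalues) for ikey, ivalues in groups.items()}


# Example input: document name -> document contents (made-up data).
docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog the end"}
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
```

The user supplies only the two functions; everything inside `run_mapreduce` is the part that the model delegates to the run-time system.
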
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
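
The responsibilities listed above (partitioning, scheduling, failure handling, and moving intermediate data) can be sketched in one process, with a thread pool standing in for the cluster. This is an illustration under stated assumptions, not the implementation the paper describes: the helper names, the choice of `R = 2` reduce partitions, the CRC32-based partition function, and the retry policy are all hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

R = 2  # number of reduce partitions (assumed value for the example)


def partition(key: str, r: int) -> int:
    """Assign an intermediate key to a reduce partition (assumed: CRC32 mod r)."""
    return zlib.crc32(key.encode("utf-8")) % r


def map_task(split):
    """Run the word-count map over one input split, bucketing output by partition."""
    buckets = [defaultdict(list) for _ in range(R)]
    for line in split:
        for word in line.lower().split():
            buckets[partition(word, R)][word].append(1)
    return buckets


def reduce_task(bucket_parts):
    """Merge the buckets that every map task produced for one partition."""
    merged = defaultdict(int)
    for part in bucket_parts:
        for word, ones in part.items():
            merged[word] += sum(ones)
    return dict(merged)


def run_with_retry(task, arg, attempts=3):
    """Re-execute a failed task, a stand-in for handling machine failures."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except Exception:
            if attempt == attempts - 1:
                raise


def run_job(lines, num_splits=3, workers=4):
    # Partition the input data into splits, one per map task.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Schedule map tasks on the worker pool.
        map_outputs = list(pool.map(lambda s: run_with_retry(map_task, s), splits))
        # "Shuffle": hand each reduce task its partition from every map output.
        per_partition = [[out[p] for out in map_outputs] for p in range(R)]
        reduced = list(pool.map(lambda b: run_with_retry(reduce_task, b), per_partition))
    result = {}
    for part in reduced:
        result.update(part)  # partitions hold disjoint keys, so nothing is overwritten
    return result


lines = ["the quick brown fox", "the lazy dog", "the end"]
totals = run_job(lines)
```

Because every occurrence of a key maps to the same partition, each reduce task sees all values for its keys, and the final merge never has to combine counts across partitions.
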
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
The text above is from the English original of MapReduce: Simplified Data Processing on Large Clusters, one of Google's three landmark papers. My translation is available at https://blog.csdn.net/m0_37809890/article/details/87830686
The paper MapReduce: Simplified Data Processing on Large Clusters was written by Google researchers Jeffrey Dean and Sanjay Ghemawat and introduces a distributed computing model called MapReduce. Before MapReduce, Google and other companies...
In IT, particularly in big data processing and distributed computing, Google's three landmark papers are "MapReduce: Simplified Data Processing on Large Clusters" (OSDI 2004), "The Google File System" (SOSP 2003), and "Bigtable...