tenght

MapReduce Model

 

• Programmers specify two functions:

map (k, v) → <k’, v’>*

reduce (k’, [v’]) → <k’, v’>*

All values with the same key are sent to the same reducer
• The execution framework handles everything else…
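To make the contract concrete, the model can be simulated in a single process. This is a minimal Python sketch, assuming word count as the example job. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and a plain dictionary stands in for everything the framework would do across machines. The reducer is called once per distinct key and receives the list of all values emitted for that key.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple


# Hypothetical word-count mapper: key = document id, value = document text.
def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    for word in text.split():
        yield word, 1


# Hypothetical reducer: one call per key, with every value emitted for that key.
def reduce_fn(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    yield word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    # Map phase: apply the mapper to every input (k, v) pair and
    # group intermediate values by key ("same key -> same reducer").
    groups = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in mapper(k, v):
            groups[k2].append(v2)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


docs = [("d1", "a rose is a rose"), ("d2", "a daisy")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('a', 3), ('daisy', 1), ('is', 1), ('rose', 2)]
```

Only `map_fn` and `reduce_fn` are job-specific. `run_mapreduce` plays the role of "everything else", which in a real system is distributed across many machines.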

What’s “everything else”?

MapReduce “Runtime”

• Handles scheduling
Assigns workers to map and reduce tasks
• Handles “data distribution”
Moves processes to data
• Handles synchronization
Gathers, sorts, and shuffles intermediate data
• Handles errors and faults
Detects worker failures and restarts their tasks
• Everything happens on top of a distributed FS (later)

• Programmers specify two functions:

map (k, v) → <k’, v’>*

reduce (k’, [v’]) → <k’, v’>*

All values with the same key are reduced together
• The execution framework handles everything else…
• Not quite… usually, programmers also specify:

partition (k’, number of partitions) → partition for k’

Often a simple hash of the key, e.g., hash(k’) mod n, where n is the number of partitions
Divides up the key space for parallel reduce operations

combine (k’, v’) → <k’, v’>*

Mini-reducers that run in memory after the map phase
Used as an optimization to reduce network traffic
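The two optional functions are easiest to see together on the output of a single map task. This is a minimal Python sketch, assuming word count. It uses `zlib.crc32` as the hash, which is an illustrative choice and not something the slide specifies. A stable hash is needed because Python's built-in `hash()` on strings is salted per process, so two workers could otherwise send the same key to different partitions. The names `partition`, `combine`, and `map_side` are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_partitions: int) -> int:
    # hash(k') mod n with a hash that is stable across processes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def combine(pairs):
    # Mini-reducer over one map task's output: sum counts per word locally,
    # so fewer (word, count) pairs have to cross the network.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def map_side(text: str, num_partitions: int):
    # Word-count map output for one input split, then combined, then partitioned.
    emitted = [(word, 1) for word in text.split()]
    combined = combine(emitted)
    buckets = [[] for _ in range(num_partitions)]
    for word, count in combined:
        buckets[partition(word, num_partitions)].append((word, count))
    return emitted, combined, buckets


emitted, combined, buckets = map_side("a rose is a rose is a rose", 3)
print(len(emitted), len(combined))  # 8 pairs before combining, 3 after
```

Each bucket would be sent to one reducer. Because every occurrence of a word hashes to the same bucket, all of that word's partial counts meet at the same reducer. Summing is safe to apply early because addition is associative and commutative. A reducer that computes a mean, by contrast, could not simply be reused as its own combiner. Hadoop's default `HashPartitioner` follows the same hash-mod-n idea.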



