We used Apache Hadoop to compete in Jim Gray's sort benchmark. Jim Gray's
sort benchmark consists of a set of related benchmarks, each with its own
rules. All of the sort benchmarks measure the time to sort different numbers
of 100-byte records. The first 10 bytes of each record are the key and the
remaining 90 bytes are the value. The minute sort must finish end to end in
less than a minute. The Gray sort must sort more than 100 terabytes and must
run for at least an hour. The best times we observed were:
Bytes | Nodes | Maps | Reduces | Replication | Time
500,000,000,000 | 1406 | 8000 | 2600 | 1 | 59 seconds
1,000,000,000,000 | 1460 | 8000 | 2700 | 1 | 62 seconds
100,000,000,000,000 | 3452 | 190,000 | 10,000 | 2 | 173 minutes
1,000,000,000,000,000 | 3658 | 80,000 | 20,000 | 2 | 975 minutes
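For readers who want to see the record layout in code, here is a minimal sketch (not the benchmark code we ran) of splitting one 100-byte record into its 10-byte key and 90-byte value using Hadoop's BytesWritable; the class and method names are just illustrative:

```java
// Minimal sketch: split one 100-byte sort record into its 10-byte key
// and 90-byte value using Hadoop's BytesWritable.
import org.apache.hadoop.io.BytesWritable;

public class SortRecord {
  public static final int RECORD_LEN = 100; // total record size in bytes
  public static final int KEY_LEN = 10;     // first 10 bytes are the key

  // Copies the key and value portions of a single record into writables.
  public static void split(byte[] record, BytesWritable key, BytesWritable value) {
    if (record.length != RECORD_LEN) {
      throw new IllegalArgumentException("expected a 100 byte record");
    }
    key.set(record, 0, KEY_LEN);                      // bytes 0..9
    value.set(record, KEY_LEN, RECORD_LEN - KEY_LEN); // bytes 10..99
  }
}
```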
Within the rules for the 2009 Gray sort, our 500 GB sort set a new
record for the minute sort, and the 100 TB sort set a new record of
0.578 TB/minute. The 1 PB sort ran after the 2009 deadline, but
improved the speed to 1.03 TB/minute. The 62-second terabyte sort would
have set a new record, but the terabyte benchmark that we won last year
has been retired. (Clearly the minute sort and terabyte sort are
rapidly converging, so it is not a loss.) One piece of trivia is
that only the petabyte dataset had any duplicate keys (40 of them).
We ran our benchmarks on Yahoo's Hammer cluster. Hammer's hardware
is very similar to the hardware that we used in last year's terabyte
sort. The hardware and operating system details are:
- approximately 3800 nodes (in such a large cluster, nodes are always down)
- 2 quad-core Xeons @ 2.5 GHz per node
- 4 SATA disks per node
- 8 GB RAM per node (upgraded to 16 GB before the petabyte sort)
- 1 gigabit ethernet on each node
- 40 nodes per rack
- 8 gigabit ethernet uplinks from each rack to the core
- Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
- Sun Java JDK (1.6.0_05-b13 and 1.6.0_13-b03) (32 and 64 bit)
We hit a JVM bug that caused a core dump in 1.6.0_05-b13 on the
larger sorts (100 TB and 1 PB) and switched over to the later JVM, which
resolved the issue. For the larger sorts, we used 64-bit JVMs for the
Name Node and Job Tracker.
Because the smaller sorts needed lower latency and a faster network,
we used only part of the cluster for those runs. In particular, instead
of our normal 5:1 oversubscription between racks, we limited each rack
to 16 nodes for a 2:1 oversubscription. The smaller runs can also use an
output replication of 1: because they take only minutes and run on
smaller clusters, the likelihood of a node failing during the run is
fairly low. On the larger runs, failure is expected, and thus a
replication of 2 is required. HDFS protects against data loss during a
rack failure by writing the second replica on a different rack, and thus
writing the second replica is relatively slow.
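For illustration, the single-copy output used on the short runs corresponds to the standard HDFS replication setting, which can be requested per job. Here is a minimal sketch under that assumption; the class name is a placeholder, not our actual driver:

```java
// Sketch: request single-copy output for a short-lived job (0.20-era API).
// The class name is a placeholder; only the property key is standard Hadoop.
import org.apache.hadoop.mapred.JobConf;

public class ShortSortConfig {
  public static JobConf configure(Class<?> driverClass) {
    JobConf conf = new JobConf(driverClass);
    // Small, fast runs: a single output replica is acceptable because the
    // chance of losing a node during a run of a few minutes is low.
    conf.setInt("dfs.replication", 1);
    return conf;
  }
}
```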
Below are the timelines for the jobs counting from the job
submission at the Job Tracker. The diagrams show the number of tasks
running at each point in time. While maps only have a single phase, the
reduces have three: shuffle, merge, and reduce.
The shuffle is the transfer of the data from the maps. Merge doesn't
happen in these benchmarks, because none of the reduces need multiple
levels of merges. Finally, the reduce phase is where the final merge
and the write to HDFS happen. I've also included a category named waste
that represents task attempts that were running but ended up either
failing or being killed (often as speculatively executed task
attempts).
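As a point of reference for that waste category: speculative execution is what launches those duplicate attempts, and it can be toggled per job with the standard 0.20-era properties. A minimal sketch follows (the class name is a placeholder, and this is not a suggestion that we disabled speculation for these runs):

```java
// Sketch: toggling speculative execution for a job (0.20-era property names).
// The class name is a placeholder; only the property keys are standard Hadoop.
import org.apache.hadoop.mapred.JobConf;

public class SpeculationSettings {
  public static void setSpeculation(JobConf conf, boolean enabled) {
    // Speculative attempts re-run slow tasks on other nodes; the losers are
    // killed and show up as "waste" in the timelines above.
    conf.setBoolean("mapred.map.tasks.speculative.execution", enabled);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", enabled);
  }
}
```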
If you compare this year's charts to last year's, you'll notice that
tasks are launching much faster now. Last year we only launched one
task per heartbeat, so it took 40 seconds to get all of the tasks
launched. Now, Hadoop will fill up a Task Tracker in a single
heartbeat. Reducing that job launch overhead is very important for
getting runs under a minute.
As with last year, we ran with significantly larger tasks than the
defaults for Hadoop. Even with the new more aggressive shuffle,
minimizing the number of transfers (maps * reduces) is very important
to the performance of the job. Notice that in the petabyte sort, each
map is processing 15 GB instead of the default 128 MB and each reduce
is handling 50 GB. When we ran the petabyte sort with more typical values
(1.5 GB per map), it took 40 hours to finish. Therefore, to increase
throughput, it makes sense to consider increasing the default block
size, which translates into the default map size, to at least 1 GB.
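To make "larger tasks" concrete in configuration terms, here is a hedged sketch of how a job of that era could ask for much bigger map inputs. The class name and the particular sizes are illustrative assumptions; only the property keys are standard Hadoop:

```java
// Sketch: asking for much larger map inputs than the default block-sized
// split. Placeholder class; property keys are the 0.20-era Hadoop names.
import org.apache.hadoop.mapred.JobConf;

public class BigSplitConfig {
  private static final long GB = 1024L * 1024L * 1024L;

  public static void useLargeMaps(JobConf conf) {
    // Give each map a much larger slice of the input, cutting the number
    // of map*reduce shuffle transfers.
    conf.setLong("mapred.min.split.size", 2 * GB);
    // A larger block size yields larger default splits, but only for files
    // created with this configuration.
    conf.setLong("dfs.block.size", 1 * GB);
  }
}
```

Raising the minimum split size only changes how existing input is carved up for maps; the block-size setting affects files written from then on.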
We used a branch of trunk with some modifications that will be
pushed back into trunk. The primary ones are that we reimplemented the
shuffle to reuse connections, reduced latencies, and made timeouts
configurable. More details, including the changes we made to Hadoop,
are available in our report on the results.
-- Owen O'Malley and Arun Murthy
Posted at May 11, 2009 3:00 PM