Homework - Benchmarking Hadoop Cluster - George's dev dream port

sunwinner

Homework - Benchmarking Hadoop Cluster

Category: Hadoop

In this blog post I introduce some of the benchmarking and testing tools that ship with the Apache Hadoop distribution: TeraSort, NNBench and MRBench. These are popular choices for benchmarking a Hadoop cluster.

Before we start, here is the cluster on which the tests will run:

Three VMware virtual machines (nodes) running on OS X Mountain Lion
Node1: 2 processors, 2GB memory, used as NameNode as well as DataNode
Node2: 1 processor, 1GB memory, used as Secondary NameNode as well as DataNode
Node3: 1 processor, 1GB memory, used as DataNode

Now let's start the benchmark tests.

TeraSort benchmark test

A full TeraSort benchmark run consists of the following three steps:

Generating the input data via TeraGen.
Running the actual TeraSort on the input data.
Validating the sorted output data via TeraValidate.
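The three steps above can be chained in a small wrapper script. What follows is only a sketch under my own assumptions: the names `run`, `terasort_pipeline`, `RUN`, `ROWS` and `EXAMPLES_JAR` are made up for illustration, and by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of the three TeraSort steps. Commands are only printed
# unless RUN=1 is set in the environment.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-examples.jar}"  # jar name used in this post
ROWS="${ROWS:-1000}"                                 # number of rows for TeraGen

# Print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# $1 = HDFS base directory (hypothetical parameter for this sketch).
# Each step runs only if the previous one succeeded.
terasort_pipeline() {
  run hadoop jar "$EXAMPLES_JAR" teragen "$ROWS" "$1/terasort-input" &&
  run hadoop jar "$EXAMPLES_JAR" terasort "$1/terasort-input" "$1/terasort-output" &&
  run hadoop jar "$EXAMPLES_JAR" teravalidate "$1/terasort-output" "$1/terasort-validate"
}

terasort_pipeline /user/root
```

Setting RUN=1 would execute the commands for real; keeping the dry run as the default makes it easy to review the paths first.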

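Now let's generate the input data. The first argument to teragen is the number of rows to generate (each row is 100 bytes), so 1000 rows give about 100 KB of input: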

[root@n1 lib]# hadoop jar hadoop-examples.jar teragen 1000 /user/root/terasort-input
13/07/12 21:37:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Generating 1000 using 2 maps with step of 500
13/07/12 21:37:09 INFO mapred.JobClient: Running job: job_201307122107_0001
13/07/12 21:37:10 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:37:35 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 21:38:28 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:39:03 INFO mapred.JobClient: Job complete: job_201307122107_0001
13/07/12 21:39:05 INFO mapred.JobClient: Counters: 24
13/07/12 21:39:06 INFO mapred.JobClient:   File System Counters
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes written=309768
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes read=164
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:39:06 INFO mapred.JobClient:   Job Counters 
13/07/12 21:39:06 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=93872
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:39:06 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Input split bytes=164
13/07/12 21:39:06 INFO mapred.JobClient:     Spilled Records=0
13/07/12 21:39:06 INFO mapred.JobClient:     CPU time spent (ms)=1360
13/07/12 21:39:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=178167808
13/07/12 21:39:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2249502720
13/07/12 21:39:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=48758784
13/07/12 21:39:06 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:39:06 INFO mapred.JobClient:     BYTES_READ=1000

Check the generated data:

[root@n1 lib]# hadoop fs -ls ./terasort-input
Found 4 items
-rw-r--r--   3 root supergroup          0 2013-07-12 21:38 terasort-input/_SUCCESS
drwxr-xr-x   - root supergroup          0 2013-07-12 21:37 terasort-input/_logs
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:37 terasort-input/part-00000
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:38 terasort-input/part-00001
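The sizes in this listing follow from simple arithmetic: 1000 rows split over 2 maps is 500 rows per map, and each part file is 50000 bytes. Below is a small sketch of that calculation, assuming every TeraGen row is exactly 100 bytes; the function names are my own.

```shell
# Assumption: every TeraGen row is exactly 100 bytes.
ROW_BYTES=100

# Bytes produced for a given number of rows.
teragen_bytes() {
  echo $(( $1 * ROW_BYTES ))
}

# Rows needed to produce (at most) a given number of bytes.
teragen_rows() {
  echo $(( $1 / ROW_BYTES ))
}

teragen_bytes 1000        # total size of the run above
teragen_bytes 500         # size of one part file (500 rows per map)
teragen_rows 1000000000   # rows to request for roughly 1 GB of input
```

The last call is the useful direction in practice: pick a target input size first, then derive the row count to pass to teragen.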

Run the terasort test:

[root@n1 lib]# hadoop jar hadoop-examples.jar terasort terasort-input terasort-output
13/07/12 21:53:19 INFO terasort.TeraSort: starting
13/07/12 21:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/12 21:53:21 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/07/12 21:53:21 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 1 from 1000 records
Step size is 1000.0
13/07/12 21:53:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:53:26 INFO mapred.JobClient: Running job: job_201307122107_0002
13/07/12 21:53:27 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:53:46 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:53:57 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:54:01 INFO mapred.JobClient: Job complete: job_201307122107_0002
13/07/12 21:54:01 INFO mapred.JobClient: Counters: 33
13/07/12 21:54:01 INFO mapred.JobClient:   File System Counters
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes read=23088
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes written=520103
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes read=100230
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of read operations=4
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 21:54:01 INFO mapred.JobClient:   Job Counters 
13/07/12 21:54:01 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:54:01 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=26310
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=8722
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:54:01 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output bytes=100000
13/07/12 21:54:01 INFO mapred.JobClient:     Input split bytes=230
13/07/12 21:54:01 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input groups=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce shuffle bytes=22876
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Spilled Records=2000
13/07/12 21:54:01 INFO mapred.JobClient:     CPU time spent (ms)=3780
13/07/12 21:54:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=408850432
13/07/12 21:54:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1962823680
13/07/12 21:54:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=147070976
13/07/12 21:54:01 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:54:01 INFO mapred.JobClient:     BYTES_READ=100000
13/07/12 21:54:01 INFO terasort.TeraSort: done
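The number usually quoted for TeraSort is the wall-clock time, which can be read off the "starting" and "done" lines of the log. Here is a minimal sketch for computing it, assuming both timestamps fall on the same day; `to_seconds` and `elapsed` are names I picked for this example.

```shell
# Convert HH:MM:SS to seconds since midnight. awk is used so that values
# with a leading zero such as "08" are read as decimal numbers.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two timestamps on the same day: elapsed START END
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# "TeraSort: starting" and "TeraSort: done" timestamps from the log above.
elapsed 21:53:19 21:54:01
```

For the run above this prints 42, that is, 42 seconds to sort 1000 rows on this small virtual cluster.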

Validate job output with teravalidate:

[root@n1 lib]# hadoop jar hadoop-examples.jar teravalidate terasort-output terasort-validate
13/07/12 21:56:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:56:04 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 21:56:10 INFO mapred.JobClient: Running job: job_201307122107_0003
13/07/12 21:56:11 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:56:23 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:56:31 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:56:34 INFO mapred.JobClient: Job complete: job_201307122107_0003
13/07/12 21:56:34 INFO mapred.JobClient: Counters: 33
13/07/12 21:56:34 INFO mapred.JobClient:   File System Counters
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes read=69
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes written=310607
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes read=100116
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:56:34 INFO mapred.JobClient:   Job Counters 
13/07/12 21:56:34 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=14493
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=6647
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:56:34 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:56:34 INFO mapred.JobClient:     Map output records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Map output bytes=54
13/07/12 21:56:34 INFO mapred.JobClient:     Input split bytes=116
13/07/12 21:56:34 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input groups=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce shuffle bytes=65
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Spilled Records=4
13/07/12 21:56:34 INFO mapred.JobClient:     CPU time spent (ms)=1640
13/07/12 21:56:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=250499072
13/07/12 21:56:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1310330880
13/07/12 21:56:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=81399808
13/07/12 21:56:34 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:56:34 INFO mapred.JobClient:     BYTES_READ=100000
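In the log above, "Reduce output records=0" is the line to look at. As far as I understand this version of TeraValidate, the reducer only emits records when it finds keys out of order, so zero output records suggests the data is sorted. The sketch below pulls a single counter out of a JobClient log; `counter_value` is a helper name I made up, and the interpretation of the counter is an assumption rather than something stated by the tool.

```shell
# Print the value of one named counter from a JobClient log read on stdin.
# $1 = counter name exactly as it appears before the "=" sign.
counter_value() {
  grep -F "$1=" | sed 's/.*=//' | tail -n 1
}

# Two counter lines copied from the TeraValidate log above.
sample_log='13/07/12 21:56:34 INFO mapred.JobClient:     Map output records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce output records=0'

# Assumption: each reduce output record corresponds to one reported error.
errors=$(printf '%s\n' "$sample_log" | counter_value "Reduce output records")
if [ "$errors" = "0" ]; then
  echo "no ordering errors reported"
else
  echo "TeraValidate reported $errors error record(s)"
fi
```

For counters whose name appears in several groups, such as "Number of bytes read", include the prefix as well, for example "HDFS: Number of bytes read".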

Hadoop provides a very convenient way to access statistics about a job from the command line:

$ hadoop job -history all terasort-output

You can also see the detailed results in the Hadoop JobTracker web UI.

NameNode benchmark (nnbench)

NNBench is useful for load testing the NameNode hardware and configuration. It generates a lot of HDFS-related requests with normally very small "payloads" for the sole purpose of putting a high HDFS management stress on the NameNode. The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS.

The syntax of NNBench is as follows:

[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
	-operation <Available operations are create_write open_read rename delete. This option is mandatory>
	 * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
	-maps <number of maps. default is 1. This is not mandatory>
	-reduces <number of reduces. default is 1. This is not mandatory>
	-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory 
	-blockSize <Block size in bytes. default is 1. This is not mandatory>
	-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
	-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
	-numberOfFiles <number of files to create. default is 1. This is not mandatory>
	-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
	-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
	-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
	-help: Display the help statement
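According to the usage text, -startTime takes seconds since the epoch and defaults to the launch time plus two minutes. If the maps need more time to be scheduled, an explicit value can be computed as in this sketch; `nnbench_start_time` is a hypothetical helper, and it relies on `date +%s`, which GNU and BSD date both support.

```shell
# Epoch seconds for a start barrier N seconds from now (default 120,
# matching the "launch time + 2 mins" default in the usage text).
nnbench_start_time() {
  echo $(( $(date +%s) + ${1:-120} ))
}

# Example: give the maps five minutes to start before the barrier.
START_TIME=$(nnbench_start_time 300)
echo "-startTime $START_TIME"
```

The printed option can then be appended to the nnbench command line.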

To run the NameNode benchmark with 6 mappers and 3 reducers:

[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench -operation create_write -maps 6 -reduces 3 -blockSize 1 -bytesToWrite 0 -numberOfFiles 100 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
13/07/12 22:13:42 INFO hdfs.NNBench: Test Inputs: 
13/07/12 22:13:42 INFO hdfs.NNBench:            Test Operation: create_write
13/07/12 22:13:42 INFO hdfs.NNBench:                Start time: 2013-07-12 22:15:42,26
13/07/12 22:13:42 INFO hdfs.NNBench:            Number of maps: 6
13/07/12 22:13:42 INFO hdfs.NNBench:         Number of reduces: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                Block Size: 1
13/07/12 22:13:42 INFO hdfs.NNBench:            Bytes to write: 0
13/07/12 22:13:42 INFO hdfs.NNBench:        Bytes per checksum: 1
13/07/12 22:13:42 INFO hdfs.NNBench:           Number of files: 100
13/07/12 22:13:42 INFO hdfs.NNBench:        Replication factor: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                  Base dir: /benchmarks/NNBench-n1
13/07/12 22:13:42 INFO hdfs.NNBench:      Read file after open: true
13/07/12 22:13:43 INFO hdfs.NNBench: Deleting data directory
13/07/12 22:13:43 INFO hdfs.NNBench: Creating 6 control files
13/07/12 22:13:43 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:13:44 INFO mapred.FileInputFormat: Total input paths to process : 6
13/07/12 22:13:44 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 INFO mapred.JobClient: Running job: job_201307122107_0005
13/07/12 22:13:45 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:14:03 INFO mapred.JobClient:  map 33% reduce 0%
13/07/12 22:14:05 INFO mapred.JobClient:  map 67% reduce 0%
13/07/12 22:15:57 INFO mapred.JobClient:  map 83% reduce 0%
13/07/12 22:15:58 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:16:07 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 22:16:09 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:16:11 INFO mapred.JobClient: Job complete: job_201307122107_0005
13/07/12 22:16:11 INFO mapred.JobClient: Counters: 33
13/07/12 22:16:11 INFO mapred.JobClient:   File System Counters
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes read=359
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes written=1448711
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes read=1530
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes written=182
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of read operations=21
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of write operations=4006
13/07/12 22:16:11 INFO mapred.JobClient:   Job Counters 
13/07/12 22:16:11 INFO mapred.JobClient:     Launched map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Launched reduce tasks=3
13/07/12 22:16:11 INFO mapred.JobClient:     Data-local map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=498450
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=24054
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:16:11 INFO mapred.JobClient:     Map input records=6
13/07/12 22:16:11 INFO mapred.JobClient:     Map output records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Map output bytes=974
13/07/12 22:16:11 INFO mapred.JobClient:     Input split bytes=786
13/07/12 22:16:11 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input groups=8
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce shuffle bytes=1227
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce output records=8
13/07/12 22:16:11 INFO mapred.JobClient:     Spilled Records=88
13/07/12 22:16:11 INFO mapred.JobClient:     CPU time spent (ms)=16050
13/07/12 22:16:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1233637376
13/07/12 22:16:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8789716992
13/07/12 22:16:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=525942784
13/07/12 22:16:11 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:16:11 INFO mapred.JobClient:     BYTES_READ=228
13/07/12 22:16:11 INFO hdfs.NNBench: -------------- NNBench -------------- : 
13/07/12 22:16:11 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
13/07/12 22:16:11 INFO hdfs.NNBench:                            Date & time: 2013-07-12 22:16:11,562
13/07/12 22:16:11 INFO hdfs.NNBench: 
13/07/12 22:16:11 INFO hdfs.NNBench:                         Test Operation: create_write
13/07/12 22:16:11 INFO hdfs.NNBench:                             Start time: 2013-07-12 22:15:42,26
13/07/12 22:16:11 INFO hdfs.NNBench:                            Maps to run: 6
13/07/12 22:16:11 INFO hdfs.NNBench:                         Reduces to run: 3
13/07/12 22:16:11 INFO hdfs.NNBench:                     Block Size (bytes): 1
13/07/12 22:16:11 INFO hdfs.NNBench:                         Bytes to write: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                     Bytes per checksum: 1
13/07/12 22:16:11 INFO hdfs.NNBench:                        Number of files: 100
13/07/12 22:16:11 INFO hdfs.NNBench:                     Replication factor: 3
13/07/12 22:16:11 INFO hdfs.NNBench:             Successful file operations: 0
13/07/12 22:16:11 INFO hdfs.NNBench: 
13/07/12 22:16:11 INFO hdfs.NNBench:         # maps that missed the barrier: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                           # exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench: 
13/07/12 22:16:11 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
13/07/12 22:16:11 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
13/07/12 22:16:11 INFO hdfs.NNBench: 
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
13/07/12 22:16:11 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0
13/07/12 22:16:11 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
13/07/12 22:16:11 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
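The summary of the run above reports "Successful file operations: 0" together with NaN latencies, so it is worth checking that line before reading anything into the TPS values. Here is a minimal sketch of such a check; the function names are my own, and the only thing it assumes is the summary line format shown in the log.

```shell
# Extract the "Successful file operations" count from NNBench output on stdin.
nnbench_successful_ops() {
  sed -n 's/.*Successful file operations: *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Succeeds only if the run reported at least one successful operation.
nnbench_run_is_usable() {
  ops=$(nnbench_successful_ops)
  [ -n "$ops" ] && [ "$ops" -gt 0 ]
}

# Summary line copied from the run above.
sample='13/07/12 22:16:11 INFO hdfs.NNBench:             Successful file operations: 0'

if printf '%s\n' "$sample" | nnbench_run_is_usable; then
  echo "summary looks usable"
else
  echo "no successful operations: ignore the TPS and latency values"
fi
```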

Note the trick used here: the base directory includes the machine's short hostname, obtained with `hostname -s`. This is a simple way to ensure that one box does not accidentally write into the same output directory as another machine running nnbench at the same time.
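The usage text notes that open_read, rename and delete expect the files from an earlier create_write run, so a follow-up run has to point at the same base directory. The sketch below only prints two such commands and does not run them; `nnbench_base_dir` and `nnbench_commands` are hypothetical helpers, and the option values are simply the ones used in this post, with everything else left at its default.

```shell
# Jar path as used in this post; adjust for your installation.
TEST_JAR=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar

# Base directory for one machine. $1 = short hostname, e.g. from `hostname -s`.
nnbench_base_dir() {
  echo "/benchmarks/NNBench-$1"
}

# Print (not run) a create_write command followed by an open_read command
# that reuses the same base directory.
nnbench_commands() {
  base=$(nnbench_base_dir "$1")
  echo "hadoop jar $TEST_JAR nnbench -operation create_write -maps 6 -numberOfFiles 100 -baseDir $base"
  echo "hadoop jar $TEST_JAR nnbench -operation open_read -maps 6 -numberOfFiles 100 -readFileAfterOpen true -baseDir $base"
}

# "n1" is the host from the run above; on a real node pass "$(hostname -s)".
nnbench_commands n1
```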

MapReduce benchmark (mrbench)

MRBench loops a small job a number of times. As such it is a useful complement to the "large-scale" TeraSort benchmark suite, because MRBench checks whether small job runs are responsive and run efficiently on your cluster. It focuses on the MapReduce layer; its impact on the HDFS layer is very limited.

The default parameters of mrbench are:

-baseDir: /benchmarks/MRBench  (see the note on per-host base directories in the NNBench section above)
-numRuns: 1
-maps: 2
-reduces: 1
-inputLines: 1
-inputType: ascending

Run mrbench with default parameters:

[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench
MRBenchmark.0.0.2
13/07/12 22:04:42 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
13/07/12 22:04:42 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-1751865361.txt
13/07/12 22:04:43 INFO mapred.MRBench: Running job 0: input=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_input output=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_output/output_-1484101927
13/07/12 22:04:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:04:44 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 22:04:47 INFO mapred.JobClient: Running job: job_201307122107_0004
13/07/12 22:04:49 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:05:41 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 22:05:48 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:05:58 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:06:00 INFO mapred.JobClient: Job complete: job_201307122107_0004
13/07/12 22:06:00 INFO mapred.JobClient: Counters: 33
13/07/12 22:06:00 INFO mapred.JobClient:   File System Counters
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes read=27
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes written=468313
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes read=261
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes written=3
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of read operations=5
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 22:06:00 INFO mapred.JobClient:   Job Counters 
13/07/12 22:06:00 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 22:06:00 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=50958
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=7753
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:06:00 INFO mapred.JobClient:     Map input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output bytes=5
13/07/12 22:06:00 INFO mapred.JobClient:     Input split bytes=258
13/07/12 22:06:00 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input groups=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce shuffle bytes=39
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Spilled Records=2
13/07/12 22:06:00 INFO mapred.JobClient:     CPU time spent (ms)=2920
13/07/12 22:06:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=398467072
13/07/12 22:06:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3889000448
13/07/12 22:06:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=204607488
13/07/12 22:06:00 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:06:00 INFO mapred.JobClient:     BYTES_READ=2
DataLines	Maps	Reduces	AvgTime (milliseconds)
1		2	1	77797

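The AvgTime column is in milliseconds, so the average job run time was about 78 seconds. With the default -numRuns of 1, this is simply the run time of the single job above.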
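To pull that number out of the output automatically, the result table can be parsed with awk. This is a minimal sketch, assuming the table layout shown above (a header line starting with "DataLines" followed by one result line); `mrbench_avg_seconds` is a hypothetical helper name.

```shell
# Read mrbench output on stdin and print AvgTime in seconds.
# The value is taken from the last column of the line after the header.
mrbench_avg_seconds() {
  awk 'seen { printf "%.1f\n", $NF / 1000; exit } /^DataLines/ { seen = 1 }'
}

# Result table copied from the run above.
printf 'DataLines\tMaps\tReduces\tAvgTime (milliseconds)\n1\t\t2\t1\t77797\n' |
  mrbench_avg_seconds
```

For the table above this prints 77.8.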


2013-07-12 22:20