In this blog post I introduce some of the benchmarking and testing tools that ship with the Apache Hadoop distribution: TeraSort, NNBench, and MRBench. These are popular choices for benchmarking a Hadoop cluster.
Before we start, let me show you the cluster on which the tests will run:
- Three VMware virtual machines (nodes) running on OS X Mountain Lion
- Node1: 2 processors, 2 GB memory; used as NameNode as well as DataNode
- Node2: 1 processor, 1 GB memory; used as Secondary NameNode as well as DataNode
- Node3: 1 processor, 1 GB memory; used as DataNode
Now let's start the benchmark tests.
TeraSort benchmark test
A full TeraSort benchmark run consists of the following three steps:
- Generating the input data via TeraGen.
- Running the actual TeraSort on the input data.
- Validating the sorted output data via TeraValidate.
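Taken together, a full cycle is just three commands. Here is a minimal sketch (assuming `hadoop-examples.jar` sits in the current directory, as in the session below). TeraGen's first argument is the number of 100-byte rows, so the toy value of 1000 used in this walkthrough yields only 100 KB; a real benchmark would use something on the order of 10 billion rows (about 1 TB):

```
# Sketch of a complete TeraSort cycle on a toy data set
hadoop jar hadoop-examples.jar teragen 1000 /user/root/terasort-input
hadoop jar hadoop-examples.jar terasort /user/root/terasort-input /user/root/terasort-output
hadoop jar hadoop-examples.jar teravalidate /user/root/terasort-output /user/root/terasort-validate
```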
Now let's walk through each step, starting with TeraGen:
```
[root@n1 lib]# hadoop jar hadoop-examples.jar teragen 1000 /user/root/terasort-input
13/07/12 21:37:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Generating 1000 using 2 maps with step of 500
13/07/12 21:37:09 INFO mapred.JobClient: Running job: job_201307122107_0001
13/07/12 21:37:10 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:37:35 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 21:38:28 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:39:03 INFO mapred.JobClient: Job complete: job_201307122107_0001
13/07/12 21:39:05 INFO mapred.JobClient: Counters: 24
13/07/12 21:39:06 INFO mapred.JobClient:   File System Counters
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of bytes written=309768
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes read=164
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:39:06 INFO mapred.JobClient:   Job Counters
13/07/12 21:39:06 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=93872
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:39:06 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:39:06 INFO mapred.JobClient:     Input split bytes=164
13/07/12 21:39:06 INFO mapred.JobClient:     Spilled Records=0
13/07/12 21:39:06 INFO mapred.JobClient:     CPU time spent (ms)=1360
13/07/12 21:39:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=178167808
13/07/12 21:39:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2249502720
13/07/12 21:39:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=48758784
13/07/12 21:39:06 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:39:06 INFO mapred.JobClient:     BYTES_READ=1000
```
Check the data generated:
```
[root@n1 lib]# hadoop fs -ls ./terasort-input
Found 4 items
-rw-r--r--   3 root supergroup          0 2013-07-12 21:38 terasort-input/_SUCCESS
drwxr-xr-x   - root supergroup          0 2013-07-12 21:37 terasort-input/_logs
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:37 terasort-input/part-00000
-rw-r--r--   3 root supergroup      50000 2013-07-12 21:38 terasort-input/part-00001
```
Run the actual TeraSort on the generated data:
```
[root@n1 lib]# hadoop jar hadoop-examples.jar terasort terasort-input terasort-output
13/07/12 21:53:19 INFO terasort.TeraSort: starting
13/07/12 21:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/12 21:53:21 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/07/12 21:53:21 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 1 from 1000 records
Step size is 1000.0
13/07/12 21:53:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:53:26 INFO mapred.JobClient: Running job: job_201307122107_0002
13/07/12 21:53:27 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:53:46 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:53:57 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:54:01 INFO mapred.JobClient: Job complete: job_201307122107_0002
13/07/12 21:54:01 INFO mapred.JobClient: Counters: 33
13/07/12 21:54:01 INFO mapred.JobClient:   File System Counters
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes read=23088
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of bytes written=520103
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes read=100230
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of bytes written=100000
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of read operations=4
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 21:54:01 INFO mapred.JobClient:   Job Counters
13/07/12 21:54:01 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:54:01 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=26310
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=8722
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:54:01 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Map output bytes=100000
13/07/12 21:54:01 INFO mapred.JobClient:     Input split bytes=230
13/07/12 21:54:01 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input groups=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce shuffle bytes=22876
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce input records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Reduce output records=1000
13/07/12 21:54:01 INFO mapred.JobClient:     Spilled Records=2000
13/07/12 21:54:01 INFO mapred.JobClient:     CPU time spent (ms)=3780
13/07/12 21:54:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=408850432
13/07/12 21:54:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1962823680
13/07/12 21:54:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=147070976
13/07/12 21:54:01 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:54:01 INFO mapred.JobClient:     BYTES_READ=100000
13/07/12 21:54:01 INFO terasort.TeraSort: done
```
Validate the sorted output with TeraValidate:
```
[root@n1 lib]# hadoop jar hadoop-examples.jar teravalidate terasort-output terasort-validate
13/07/12 21:56:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:56:04 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 21:56:10 INFO mapred.JobClient: Running job: job_201307122107_0003
13/07/12 21:56:11 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 21:56:23 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 21:56:31 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 21:56:34 INFO mapred.JobClient: Job complete: job_201307122107_0003
13/07/12 21:56:34 INFO mapred.JobClient: Counters: 33
13/07/12 21:56:34 INFO mapred.JobClient:   File System Counters
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes read=69
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of bytes written=310607
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes read=100116
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of read operations=3
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 21:56:34 INFO mapred.JobClient:   Job Counters
13/07/12 21:56:34 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=14493
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=6647
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 21:56:34 INFO mapred.JobClient:     Map input records=1000
13/07/12 21:56:34 INFO mapred.JobClient:     Map output records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Map output bytes=54
13/07/12 21:56:34 INFO mapred.JobClient:     Input split bytes=116
13/07/12 21:56:34 INFO mapred.JobClient:     Combine input records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Combine output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input groups=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce shuffle bytes=65
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce input records=2
13/07/12 21:56:34 INFO mapred.JobClient:     Reduce output records=0
13/07/12 21:56:34 INFO mapred.JobClient:     Spilled Records=4
13/07/12 21:56:34 INFO mapred.JobClient:     CPU time spent (ms)=1640
13/07/12 21:56:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=250499072
13/07/12 21:56:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1310330880
13/07/12 21:56:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=81399808
13/07/12 21:56:34 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:56:34 INFO mapred.JobClient:     BYTES_READ=100000
```
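The counter to watch above is `Reduce output records=0`: in this version TeraValidate only emits records when it finds keys out of order, so an empty output signals a valid sort. A quick sanity check (a sketch; the part file name assumes the default single reducer):

```
# Empty output from the validate directory means no ordering errors were found
[root@n1 lib]# hadoop fs -cat terasort-validate/part-00000
```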
Hadoop provides a very convenient way to access statistics about a job from the command line:
```
$ hadoop job -history all terasort-output
```
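When scripting, you can filter that history dump down to the numbers you care about (a rough sketch; the exact layout of the history output varies between Hadoop versions, so adjust the filter terms):

```
# Show only the task timing summaries and counter totals from the job history
hadoop job -history all terasort-output | grep -iE 'finished|counters' | less
```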
You can also see the detailed results via the Hadoop JobTracker web UI.
NameNode benchmark (nnbench)
NNBench is useful for load-testing the NameNode hardware and configuration. It generates a lot of HDFS-related requests with normally very small "payloads", for the sole purpose of putting high HDFS management stress on the NameNode. The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS.
The syntax of NNBench is as follows:
```
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
        -operation <Available operations are create_write open_read rename delete. This option is mandatory>
         * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
        -maps <number of maps. default is 1. This is not mandatory>
        -reduces <number of reduces. default is 1. This is not mandatory>
        -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
        -blockSize <Block size in bytes. default is 1. This is not mandatory>
        -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
        -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
        -numberOfFiles <number of files to create. default is 1. This is not mandatory>
        -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
        -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
        -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
        -help: Display the help statement
```
To run the NameNode benchmark with 6 mappers and 3 reducers:
```
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench -operation create_write -maps 6 -reduces 3 -blockSize 1 -bytesToWrite 0 -numberOfFiles 100 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
13/07/12 22:13:42 INFO hdfs.NNBench: Test Inputs:
13/07/12 22:13:42 INFO hdfs.NNBench:            Test Operation: create_write
13/07/12 22:13:42 INFO hdfs.NNBench:                Start time: 2013-07-12 22:15:42,26
13/07/12 22:13:42 INFO hdfs.NNBench:            Number of maps: 6
13/07/12 22:13:42 INFO hdfs.NNBench:         Number of reduces: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                Block Size: 1
13/07/12 22:13:42 INFO hdfs.NNBench:            Bytes to write: 0
13/07/12 22:13:42 INFO hdfs.NNBench:        Bytes per checksum: 1
13/07/12 22:13:42 INFO hdfs.NNBench:           Number of files: 100
13/07/12 22:13:42 INFO hdfs.NNBench:        Replication factor: 3
13/07/12 22:13:42 INFO hdfs.NNBench:                  Base dir: /benchmarks/NNBench-n1
13/07/12 22:13:42 INFO hdfs.NNBench:      Read file after open: true
13/07/12 22:13:43 INFO hdfs.NNBench: Deleting data directory
13/07/12 22:13:43 INFO hdfs.NNBench: Creating 6 control files
13/07/12 22:13:43 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:13:44 INFO mapred.FileInputFormat: Total input paths to process : 6
13/07/12 22:13:44 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 INFO mapred.JobClient: Running job: job_201307122107_0005
13/07/12 22:13:45 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:14:03 INFO mapred.JobClient:  map 33% reduce 0%
13/07/12 22:14:05 INFO mapred.JobClient:  map 67% reduce 0%
13/07/12 22:15:57 INFO mapred.JobClient:  map 83% reduce 0%
13/07/12 22:15:58 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:16:07 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 22:16:09 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:16:11 INFO mapred.JobClient: Job complete: job_201307122107_0005
13/07/12 22:16:11 INFO mapred.JobClient: Counters: 33
13/07/12 22:16:11 INFO mapred.JobClient:   File System Counters
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes read=359
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of bytes written=1448711
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes read=1530
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of bytes written=182
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of read operations=21
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient:     HDFS: Number of write operations=4006
13/07/12 22:16:11 INFO mapred.JobClient:   Job Counters
13/07/12 22:16:11 INFO mapred.JobClient:     Launched map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Launched reduce tasks=3
13/07/12 22:16:11 INFO mapred.JobClient:     Data-local map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=498450
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=24054
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:16:11 INFO mapred.JobClient:     Map input records=6
13/07/12 22:16:11 INFO mapred.JobClient:     Map output records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Map output bytes=974
13/07/12 22:16:11 INFO mapred.JobClient:     Input split bytes=786
13/07/12 22:16:11 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input groups=8
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce shuffle bytes=1227
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce input records=44
13/07/12 22:16:11 INFO mapred.JobClient:     Reduce output records=8
13/07/12 22:16:11 INFO mapred.JobClient:     Spilled Records=88
13/07/12 22:16:11 INFO mapred.JobClient:     CPU time spent (ms)=16050
13/07/12 22:16:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1233637376
13/07/12 22:16:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8789716992
13/07/12 22:16:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=525942784
13/07/12 22:16:11 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:16:11 INFO mapred.JobClient:     BYTES_READ=228
13/07/12 22:16:11 INFO hdfs.NNBench: -------------- NNBench -------------- :
13/07/12 22:16:11 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
13/07/12 22:16:11 INFO hdfs.NNBench:                            Date & time: 2013-07-12 22:16:11,562
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                         Test Operation: create_write
13/07/12 22:16:11 INFO hdfs.NNBench:                             Start time: 2013-07-12 22:15:42,26
13/07/12 22:16:11 INFO hdfs.NNBench:                            Maps to run: 6
13/07/12 22:16:11 INFO hdfs.NNBench:                         Reduces to run: 3
13/07/12 22:16:11 INFO hdfs.NNBench:                     Block Size (bytes): 1
13/07/12 22:16:11 INFO hdfs.NNBench:                         Bytes to write: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                     Bytes per checksum: 1
13/07/12 22:16:11 INFO hdfs.NNBench:                        Number of files: 100
13/07/12 22:16:11 INFO hdfs.NNBench:                     Replication factor: 3
13/07/12 22:16:11 INFO hdfs.NNBench:             Successful file operations: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:         # maps that missed the barrier: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                           # exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
13/07/12 22:16:11 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
13/07/12 22:16:11 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
13/07/12 22:16:11 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0
13/07/12 22:16:11 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0
13/07/12 22:16:11 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
13/07/12 22:16:11 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
```
Note the trick here: I use a custom output directory based on the machine's short hostname (`hostname -s`). This is a simple way to ensure that one box does not accidentally write into the output directory of another machine running nnbench at the same time.
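If you actually want to drive nnbench from several machines at once, the `-startTime` option from the usage text above is how you line them up. Below is a hedged sketch, not a tested recipe: it assumes passwordless SSH from the driver box, and the host names n1, n2, n3 are illustrative.

```
#!/bin/bash
# Start nnbench on three boxes at the same moment. Each host gets its own
# baseDir via `hostname -s`, and -startTime (seconds since the epoch) is set
# far enough in the future that all maps begin together.
JAR=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar
START=$(( $(date +%s) + 180 ))   # three minutes from now
for host in n1 n2 n3; do         # illustrative host names
  ssh "$host" "hadoop jar $JAR nnbench -operation create_write \
      -maps 6 -reduces 3 -blockSize 1 -bytesToWrite 0 \
      -numberOfFiles 100 -replicationFactorPerFile 3 \
      -readFileAfterOpen true -startTime $START \
      -baseDir /benchmarks/NNBench-\$(hostname -s)" &
done
wait
```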
MapReduce benchmark (mrbench)
MRBench loops a small job a number of times. As such, it is a good complement to the "large-scale" TeraSort benchmark, because MRBench checks whether small job runs are responsive and run efficiently on your cluster. It focuses on the MapReduce layer; its impact on the HDFS layer is very limited.
The default parameters of mrbench are:
- -baseDir: /benchmarks/MRBench [see the per-host baseDir trick above]
- -numRuns: 1
- -maps: 2
- -reduces: 1
- -inputLines: 1
- -inputType: ascending
Run mrbench with default parameters:
```
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench
MRBenchmark.0.0.2
13/07/12 22:04:42 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
13/07/12 22:04:42 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-1751865361.txt
13/07/12 22:04:43 INFO mapred.MRBench: Running job 0: input=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_input output=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_output/output_-1484101927
13/07/12 22:04:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:04:44 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 22:04:47 INFO mapred.JobClient: Running job: job_201307122107_0004
13/07/12 22:04:49 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 22:05:41 INFO mapred.JobClient:  map 50% reduce 0%
13/07/12 22:05:48 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 22:05:58 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 22:06:00 INFO mapred.JobClient: Job complete: job_201307122107_0004
13/07/12 22:06:00 INFO mapred.JobClient: Counters: 33
13/07/12 22:06:00 INFO mapred.JobClient:   File System Counters
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes read=27
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of bytes written=468313
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes read=261
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of bytes written=3
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of read operations=5
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 22:06:00 INFO mapred.JobClient:   Job Counters
13/07/12 22:06:00 INFO mapred.JobClient:     Launched map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 22:06:00 INFO mapred.JobClient:     Data-local map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=50958
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=7753
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 22:06:00 INFO mapred.JobClient:     Map input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Map output bytes=5
13/07/12 22:06:00 INFO mapred.JobClient:     Input split bytes=258
13/07/12 22:06:00 INFO mapred.JobClient:     Combine input records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Combine output records=0
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input groups=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce shuffle bytes=39
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce input records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Reduce output records=1
13/07/12 22:06:00 INFO mapred.JobClient:     Spilled Records=2
13/07/12 22:06:00 INFO mapred.JobClient:     CPU time spent (ms)=2920
13/07/12 22:06:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=398467072
13/07/12 22:06:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3889000448
13/07/12 22:06:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=204607488
13/07/12 22:06:00 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:06:00 INFO mapred.JobClient:     BYTES_READ=2
DataLines       Maps    Reduces AvgTime (milliseconds)
1               2       1       77797
```
This means that the average completion time of the executed job was roughly 78 seconds.
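One caveat: a single run on a tiny input mostly measures job submission and scheduling overhead, and it is noisy. For a steadier average you would loop the job with `-numRuns` (a sketch, reusing the defaults listed above for everything else):

```
# Average over 50 runs instead of 1
hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar \
    mrbench -numRuns 50
```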