In this blog post I introduce some of the benchmarking and testing tools in the Apache Hadoop distribution. Namely, I'll look at TeraSort, NNBench and MRBench. These are popular choices to benchmark a Hadoop cluster.
Before we start, here is the cluster on which the tests will run:
- Three VMware virtual machines (nodes) running on OS X Mountain Lion
- Node1: 2 processors, 2 GB memory, used as NameNode as well as DataNode
- Node2: 1 processor, 1 GB memory, used as Secondary NameNode as well as DataNode
- Node3: 1 processor, 1 GB memory, used as DataNode
Now let's start benchmarking.
TeraSort benchmark test
A full TeraSort benchmark run consists of the following three steps:
- Generating the input data via TeraGen.
- Running the actual TeraSort on the input data.
- Validating the sorted output data via TeraValidate.
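The following toy, in-memory Python model makes the three steps concrete. It assumes the TeraSort record layout of 100-byte rows whose first 10 bytes form the sort key. Everything else is illustrative only: the function names, the random keys and the filler payload. It is not how Hadoop implements these tools, which run as distributed MapReduce jobs.

```python
from __future__ import annotations

import random

RECORD_SIZE = 100  # TeraSort rows are 100 bytes
KEY_SIZE = 10      # the first 10 bytes of each row are the sort key


def teragen(num_rows: int, seed: int = 42) -> list[bytes]:
    """Toy stand-in for TeraGen: num_rows records with random printable keys."""
    rng = random.Random(seed)
    rows = []
    for _row in range(num_rows):
        key = bytes(rng.randrange(32, 127) for _ in range(KEY_SIZE))
        rows.append(key + b"x" * (RECORD_SIZE - KEY_SIZE))  # filler payload
    return rows


def terasort(rows: list[bytes]) -> list[bytes]:
    """Toy stand-in for TeraSort: order records by their 10-byte key."""
    return sorted(rows, key=lambda r: r[:KEY_SIZE])


def teravalidate(rows: list[bytes]) -> bool:
    """Toy stand-in for TeraValidate: every key must be >= the previous one."""
    return all(rows[i - 1][:KEY_SIZE] <= rows[i][:KEY_SIZE] for i in range(1, len(rows)))


if __name__ == "__main__":
    data = teragen(1000)
    print(sum(len(r) for r in data))      # 100000 bytes, as in the TeraGen run below
    print(teravalidate(terasort(data)))   # True
```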
Now let's generate the input data with:
[root@n1 lib]# hadoop jar hadoop-examples.jar teragen 1000 /user/root/terasort-input
13/07/12 21:37:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Generating 1000 using 2 maps with step of 500
13/07/12 21:37:09 INFO mapred.JobClient: Running job: job_201307122107_0001
13/07/12 21:37:10 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 21:37:35 INFO mapred.JobClient: map 50% reduce 0%
13/07/12 21:38:28 INFO mapred.JobClient: map 100% reduce 0%
13/07/12 21:39:03 INFO mapred.JobClient: Job complete: job_201307122107_0001
13/07/12 21:39:05 INFO mapred.JobClient: Counters: 24
13/07/12 21:39:06 INFO mapred.JobClient: File System Counters
13/07/12 21:39:06 INFO mapred.JobClient: FILE: Number of bytes read=0
13/07/12 21:39:06 INFO mapred.JobClient: FILE: Number of bytes written=309768
13/07/12 21:39:06 INFO mapred.JobClient: FILE: Number of read operations=0
13/07/12 21:39:06 INFO mapred.JobClient: FILE: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient: FILE: Number of write operations=0
13/07/12 21:39:06 INFO mapred.JobClient: HDFS: Number of bytes read=164
13/07/12 21:39:06 INFO mapred.JobClient: HDFS: Number of bytes written=100000
13/07/12 21:39:06 INFO mapred.JobClient: HDFS: Number of read operations=3
13/07/12 21:39:06 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/07/12 21:39:06 INFO mapred.JobClient: HDFS: Number of write operations=2
13/07/12 21:39:06 INFO mapred.JobClient: Job Counters
13/07/12 21:39:06 INFO mapred.JobClient: Launched map tasks=2
13/07/12 21:39:06 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=93872
13/07/12 21:39:06 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:39:06 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 21:39:06 INFO mapred.JobClient: Map input records=1000
13/07/12 21:39:06 INFO mapred.JobClient: Map output records=1000
13/07/12 21:39:06 INFO mapred.JobClient: Input split bytes=164
13/07/12 21:39:06 INFO mapred.JobClient: Spilled Records=0
13/07/12 21:39:06 INFO mapred.JobClient: CPU time spent (ms)=1360
13/07/12 21:39:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=178167808
13/07/12 21:39:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2249502720
13/07/12 21:39:06 INFO mapred.JobClient: Total committed heap usage (bytes)=48758784
13/07/12 21:39:06 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:39:06 INFO mapred.JobClient: BYTES_READ=1000
Check the data generated:
[root@n1 lib]# hadoop fs -ls ./terasort-input
Found 4 items
-rw-r--r-- 3 root supergroup 0 2013-07-12 21:38 terasort-input/_SUCCESS
drwxr-xr-x - root supergroup 0 2013-07-12 21:37 terasort-input/_logs
-rw-r--r-- 3 root supergroup 50000 2013-07-12 21:37 terasort-input/part-00000
-rw-r--r-- 3 root supergroup 50000 2013-07-12 21:38 terasort-input/part-00001
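The listing agrees with a quick size check. TeraGen writes 100-byte rows, and the 1000 rows were split across 2 maps ("step of 500"), so each part file should hold 500 × 100 = 50,000 bytes. The hypothetical helper below does that arithmetic. It assumes every map except the last gets `total_rows // num_maps` rows and the last one picks up any remainder. That is my reading of TeraGen's range splitting, not something shown in this post.

```python
from __future__ import annotations

ROW_BYTES = 100  # each TeraGen row is 100 bytes


def expected_part_sizes(total_rows: int, num_maps: int) -> list[int]:
    """Hypothetical helper: expected size in bytes of each part-NNNNN file.

    Assumes all maps but the last get total_rows // num_maps rows and the
    last map gets whatever remains.
    """
    rows_per_map = total_rows // num_maps
    rows = [rows_per_map] * (num_maps - 1)
    rows.append(total_rows - rows_per_map * (num_maps - 1))
    return [r * ROW_BYTES for r in rows]


print(expected_part_sizes(1000, 2))  # [50000, 50000], matching part-00000 and part-00001
```

For a full 1 TB TeraSort you would pass 10,000,000,000 rows to teragen (10^10 × 100 bytes = 10^12 bytes).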
Run the terasort test:
[root@n1 lib]# hadoop jar hadoop-examples.jar terasort terasort-input terasort-output
13/07/12 21:53:19 INFO terasort.TeraSort: starting
13/07/12 21:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/12 21:53:21 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/07/12 21:53:21 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 1 from 1000 records
Step size is 1000.0
13/07/12 21:53:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:53:26 INFO mapred.JobClient: Running job: job_201307122107_0002
13/07/12 21:53:27 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 21:53:46 INFO mapred.JobClient: map 100% reduce 0%
13/07/12 21:53:57 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 21:54:01 INFO mapred.JobClient: Job complete: job_201307122107_0002
13/07/12 21:54:01 INFO mapred.JobClient: Counters: 33
13/07/12 21:54:01 INFO mapred.JobClient: File System Counters
13/07/12 21:54:01 INFO mapred.JobClient: FILE: Number of bytes read=23088
13/07/12 21:54:01 INFO mapred.JobClient: FILE: Number of bytes written=520103
13/07/12 21:54:01 INFO mapred.JobClient: FILE: Number of read operations=0
13/07/12 21:54:01 INFO mapred.JobClient: FILE: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient: FILE: Number of write operations=0
13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of bytes read=100230
13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of bytes written=100000
13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of read operations=4
13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of write operations=1
13/07/12 21:54:01 INFO mapred.JobClient: Job Counters
13/07/12 21:54:01 INFO mapred.JobClient: Launched map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 21:54:01 INFO mapred.JobClient: Data-local map tasks=2
13/07/12 21:54:01 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=26310
13/07/12 21:54:01 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=8722
13/07/12 21:54:01 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:54:01 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 21:54:01 INFO mapred.JobClient: Map input records=1000
13/07/12 21:54:01 INFO mapred.JobClient: Map output records=1000
13/07/12 21:54:01 INFO mapred.JobClient: Map output bytes=100000
13/07/12 21:54:01 INFO mapred.JobClient: Input split bytes=230
13/07/12 21:54:01 INFO mapred.JobClient: Combine input records=0
13/07/12 21:54:01 INFO mapred.JobClient: Combine output records=0
13/07/12 21:54:01 INFO mapred.JobClient: Reduce input groups=1000
13/07/12 21:54:01 INFO mapred.JobClient: Reduce shuffle bytes=22876
13/07/12 21:54:01 INFO mapred.JobClient: Reduce input records=1000
13/07/12 21:54:01 INFO mapred.JobClient: Reduce output records=1000
13/07/12 21:54:01 INFO mapred.JobClient: Spilled Records=2000
13/07/12 21:54:01 INFO mapred.JobClient: CPU time spent (ms)=3780
13/07/12 21:54:01 INFO mapred.JobClient: Physical memory (bytes) snapshot=408850432
13/07/12 21:54:01 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1962823680
13/07/12 21:54:01 INFO mapred.JobClient: Total committed heap usage (bytes)=147070976
13/07/12 21:54:01 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:54:01 INFO mapred.JobClient: BYTES_READ=100000
13/07/12 21:54:01 INFO terasort.TeraSort: done
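Every counter is printed as a `name=value` pair on a `mapred.JobClient` log line, so it is easy to pull counters out of a saved console log and compare runs. The following is a minimal sketch. It assumes the log text has been captured to a string. The function name is hypothetical, and the regular expression only targets the line format shown above.

```python
from __future__ import annotations

import re

# A counter line looks like:
#   13/07/12 21:54:01 INFO mapred.JobClient: Map output bytes=100000
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>-?\d+)\s*$")


def parse_counters(log_text: str) -> dict[str, int]:
    """Hypothetical helper: collect name=value counters from JobClient output."""
    counters: dict[str, int] = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:  # progress, "Running job" and group-header lines have no '='
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


sample = (
    "13/07/12 21:54:01 INFO mapred.JobClient: Reduce input records=1000\n"
    "13/07/12 21:54:01 INFO mapred.JobClient: HDFS: Number of bytes written=100000\n"
)
print(parse_counters(sample))
# {'Reduce input records': 1000, 'HDFS: Number of bytes written': 100000}
```

If a log contains several jobs, later values overwrite earlier ones. Parse one job's output at a time.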
Validate the job output with TeraValidate:
[root@n1 lib]# hadoop jar hadoop-examples.jar teravalidate terasort-output terasort-validate
13/07/12 21:56:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 21:56:04 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 21:56:10 INFO mapred.JobClient: Running job: job_201307122107_0003
13/07/12 21:56:11 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 21:56:23 INFO mapred.JobClient: map 100% reduce 0%
13/07/12 21:56:31 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 21:56:34 INFO mapred.JobClient: Job complete: job_201307122107_0003
13/07/12 21:56:34 INFO mapred.JobClient: Counters: 33
13/07/12 21:56:34 INFO mapred.JobClient: File System Counters
13/07/12 21:56:34 INFO mapred.JobClient: FILE: Number of bytes read=69
13/07/12 21:56:34 INFO mapred.JobClient: FILE: Number of bytes written=310607
13/07/12 21:56:34 INFO mapred.JobClient: FILE: Number of read operations=0
13/07/12 21:56:34 INFO mapred.JobClient: FILE: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient: FILE: Number of write operations=0
13/07/12 21:56:34 INFO mapred.JobClient: HDFS: Number of bytes read=100116
13/07/12 21:56:34 INFO mapred.JobClient: HDFS: Number of bytes written=0
13/07/12 21:56:34 INFO mapred.JobClient: HDFS: Number of read operations=3
13/07/12 21:56:34 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/07/12 21:56:34 INFO mapred.JobClient: HDFS: Number of write operations=2
13/07/12 21:56:34 INFO mapred.JobClient: Job Counters
13/07/12 21:56:34 INFO mapred.JobClient: Launched map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 21:56:34 INFO mapred.JobClient: Data-local map tasks=1
13/07/12 21:56:34 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=14493
13/07/12 21:56:34 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=6647
13/07/12 21:56:34 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 21:56:34 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 21:56:34 INFO mapred.JobClient: Map input records=1000
13/07/12 21:56:34 INFO mapred.JobClient: Map output records=2
13/07/12 21:56:34 INFO mapred.JobClient: Map output bytes=54
13/07/12 21:56:34 INFO mapred.JobClient: Input split bytes=116
13/07/12 21:56:34 INFO mapred.JobClient: Combine input records=0
13/07/12 21:56:34 INFO mapred.JobClient: Combine output records=0
13/07/12 21:56:34 INFO mapred.JobClient: Reduce input groups=2
13/07/12 21:56:34 INFO mapred.JobClient: Reduce shuffle bytes=65
13/07/12 21:56:34 INFO mapred.JobClient: Reduce input records=2
13/07/12 21:56:34 INFO mapred.JobClient: Reduce output records=0
13/07/12 21:56:34 INFO mapred.JobClient: Spilled Records=4
13/07/12 21:56:34 INFO mapred.JobClient: CPU time spent (ms)=1640
13/07/12 21:56:34 INFO mapred.JobClient: Physical memory (bytes) snapshot=250499072
13/07/12 21:56:34 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1310330880
13/07/12 21:56:34 INFO mapred.JobClient: Total committed heap usage (bytes)=81399808
13/07/12 21:56:34 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 21:56:34 INFO mapred.JobClient: BYTES_READ=100000
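In the TeraValidate output above, `Reduce output records=0` and `HDFS: Number of bytes written=0` mean the job wrote no records to terasort-validate. As I understand this (MR1) version of TeraValidate, it only writes a record when it finds keys out of order, so an empty output means the data is sorted. The simplified Python sketch below mimics that two-level idea with two checks. It checks ordering inside each output file, and it checks that consecutive files do not overlap at their boundaries. The function name and error messages are hypothetical.

```python
from __future__ import annotations

KEY_SIZE = 10  # TeraSort keys are the first 10 bytes of each 100-byte row


def validate_splits(splits: list[list[bytes]]) -> list[str]:
    """Return error messages; an empty list means the data is globally sorted."""
    errors: list[str] = []
    boundaries: list[tuple[bytes, bytes]] = []  # (first key, last key) per split
    # "Map" side: check ordering inside each split, keep only its end points.
    for idx, rows in enumerate(splits):
        keys = [row[:KEY_SIZE] for row in rows]
        for prev, cur in zip(keys, keys[1:]):
            if cur < prev:
                errors.append(f"misorder in split {idx}: {prev!r} > {cur!r}")
        if keys:
            boundaries.append((keys[0], keys[-1]))
    # "Reduce" side: each split must start at or after the previous split's end.
    for (_, prev_last), (next_first, _) in zip(boundaries, boundaries[1:]):
        if next_first < prev_last:
            errors.append(f"misorder across splits: {prev_last!r} > {next_first!r}")
    return errors


print(validate_splits([[b"a" * 100, b"b" * 100], [b"c" * 100]]))  # []
```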
Hadoop provides a very convenient way to access statistics about a job from the command line:
$ hadoop job -history all terasort-output
You can also view the detailed results in the Hadoop JobTracker web UI.
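When comparing runs, wall-clock time is often what matters. Besides `hadoop job -history`, you can estimate it from the console log by taking the difference between the `Running job` and `Job complete` timestamps. The following sketch assumes the `yy/MM/dd HH:mm:ss` timestamp prefix shown in the logs above. The helper name is hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%y/%m/%d %H:%M:%S"  # e.g. "13/07/12 21:53:26"


def _timestamp(line: str) -> datetime:
    # The first 17 characters of each log line hold the timestamp.
    return datetime.strptime(line[:17], TS_FORMAT)


def job_wall_time_seconds(log_text: str) -> float:
    """Hypothetical helper: seconds between 'Running job' and 'Job complete'."""
    start = end = None
    for line in log_text.splitlines():
        if "Running job:" in line and start is None:
            start = _timestamp(line)
        elif "Job complete:" in line:
            end = _timestamp(line)
    if start is None or end is None:
        raise ValueError("log lacks a 'Running job' or 'Job complete' line")
    return (end - start).total_seconds()


sample = (
    "13/07/12 21:53:26 INFO mapred.JobClient: Running job: job_201307122107_0002\n"
    "13/07/12 21:54:01 INFO mapred.JobClient: Job complete: job_201307122107_0002\n"
)
print(job_wall_time_seconds(sample))  # 35.0
```

For the TeraSort run above this gives 35 seconds. The log timestamps have one-second resolution, so treat the result as approximate.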
NameNode benchmark (nnbench)
NNBench is useful for load testing the NameNode hardware and configuration. It generates a lot of HDFS-related requests with normally very small "payloads" for the sole purpose of putting a high HDFS management stress on the NameNode. The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS.
The syntax of NNBench is as follows:
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
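The open_read, rename and delete operations work on files that create_write must have created first, so it can be handy to script the calls. The sketch below builds argument lists from the options in the usage text above and prints them; it does not execute anything. The jar path is the CDH 4.3.0 parcel path used in this post and will differ on other installations. `nnbench_args` is a hypothetical helper that passes keyword names through as option names without checking them.

```python
from __future__ import annotations

# Jar path from this post (CDH 4.3.0 parcel); adjust for your installation.
TEST_JAR = (
    "/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/"
    "hadoop-0.20-mapreduce/hadoop-test.jar"
)
# Operations listed in the nnbench usage text; create_write must run first.
OPERATIONS = ("create_write", "open_read", "rename", "delete")


def nnbench_args(operation: str, **options: object) -> list[str]:
    """Hypothetical helper: build a 'hadoop jar ... nnbench' argument list.

    Keyword arguments become options, e.g. maps=6 -> ["-maps", "6"].
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown nnbench operation: {operation!r}")
    args = ["hadoop", "jar", TEST_JAR, "nnbench", "-operation", operation]
    for name, value in options.items():
        # the usage text shows lowercase true/false for boolean options
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += [f"-{name}", text]
    return args


common = {"maps": 6, "reduces": 3, "numberOfFiles": 100, "baseDir": "/benchmarks/NNBench-n1"}
print(" ".join(nnbench_args("create_write", **common)))
# -readFileAfterOpen is only valid with open_read, according to the usage text
print(" ".join(nnbench_args("open_read", readFileAfterOpen=True, **common)))
```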
To run the NameNode benchmark with 6 mappers and 3 reducers:
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar nnbench -operation create_write -maps 6 -reduces 3 -blockSize 1 -bytesToWrite 0 -numberOfFiles 100 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
13/07/12 22:13:42 INFO hdfs.NNBench: Test Inputs:
13/07/12 22:13:42 INFO hdfs.NNBench: Test Operation: create_write
13/07/12 22:13:42 INFO hdfs.NNBench: Start time: 2013-07-12 22:15:42,26
13/07/12 22:13:42 INFO hdfs.NNBench: Number of maps: 6
13/07/12 22:13:42 INFO hdfs.NNBench: Number of reduces: 3
13/07/12 22:13:42 INFO hdfs.NNBench: Block Size: 1
13/07/12 22:13:42 INFO hdfs.NNBench: Bytes to write: 0
13/07/12 22:13:42 INFO hdfs.NNBench: Bytes per checksum: 1
13/07/12 22:13:42 INFO hdfs.NNBench: Number of files: 100
13/07/12 22:13:42 INFO hdfs.NNBench: Replication factor: 3
13/07/12 22:13:42 INFO hdfs.NNBench: Base dir: /benchmarks/NNBench-n1
13/07/12 22:13:42 INFO hdfs.NNBench: Read file after open: true
13/07/12 22:13:43 INFO hdfs.NNBench: Deleting data directory
13/07/12 22:13:43 INFO hdfs.NNBench: Creating 6 control files
13/07/12 22:13:43 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:13:44 INFO mapred.FileInputFormat: Total input paths to process : 6
13/07/12 22:13:44 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/12 22:13:44 INFO mapred.JobClient: Running job: job_201307122107_0005
13/07/12 22:13:45 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 22:14:03 INFO mapred.JobClient: map 33% reduce 0%
13/07/12 22:14:05 INFO mapred.JobClient: map 67% reduce 0%
13/07/12 22:15:57 INFO mapred.JobClient: map 83% reduce 0%
13/07/12 22:15:58 INFO mapred.JobClient: map 100% reduce 0%
13/07/12 22:16:07 INFO mapred.JobClient: map 100% reduce 67%
13/07/12 22:16:09 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 22:16:11 INFO mapred.JobClient: Job complete: job_201307122107_0005
13/07/12 22:16:11 INFO mapred.JobClient: Counters: 33
13/07/12 22:16:11 INFO mapred.JobClient: File System Counters
13/07/12 22:16:11 INFO mapred.JobClient: FILE: Number of bytes read=359
13/07/12 22:16:11 INFO mapred.JobClient: FILE: Number of bytes written=1448711
13/07/12 22:16:11 INFO mapred.JobClient: FILE: Number of read operations=0
13/07/12 22:16:11 INFO mapred.JobClient: FILE: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient: FILE: Number of write operations=0
13/07/12 22:16:11 INFO mapred.JobClient: HDFS: Number of bytes read=1530
13/07/12 22:16:11 INFO mapred.JobClient: HDFS: Number of bytes written=182
13/07/12 22:16:11 INFO mapred.JobClient: HDFS: Number of read operations=21
13/07/12 22:16:11 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/07/12 22:16:11 INFO mapred.JobClient: HDFS: Number of write operations=4006
13/07/12 22:16:11 INFO mapred.JobClient: Job Counters
13/07/12 22:16:11 INFO mapred.JobClient: Launched map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient: Launched reduce tasks=3
13/07/12 22:16:11 INFO mapred.JobClient: Data-local map tasks=6
13/07/12 22:16:11 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=498450
13/07/12 22:16:11 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=24054
13/07/12 22:16:11 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:16:11 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 22:16:11 INFO mapred.JobClient: Map input records=6
13/07/12 22:16:11 INFO mapred.JobClient: Map output records=44
13/07/12 22:16:11 INFO mapred.JobClient: Map output bytes=974
13/07/12 22:16:11 INFO mapred.JobClient: Input split bytes=786
13/07/12 22:16:11 INFO mapred.JobClient: Combine input records=0
13/07/12 22:16:11 INFO mapred.JobClient: Combine output records=0
13/07/12 22:16:11 INFO mapred.JobClient: Reduce input groups=8
13/07/12 22:16:11 INFO mapred.JobClient: Reduce shuffle bytes=1227
13/07/12 22:16:11 INFO mapred.JobClient: Reduce input records=44
13/07/12 22:16:11 INFO mapred.JobClient: Reduce output records=8
13/07/12 22:16:11 INFO mapred.JobClient: Spilled Records=88
13/07/12 22:16:11 INFO mapred.JobClient: CPU time spent (ms)=16050
13/07/12 22:16:11 INFO mapred.JobClient: Physical memory (bytes) snapshot=1233637376
13/07/12 22:16:11 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8789716992
13/07/12 22:16:11 INFO mapred.JobClient: Total committed heap usage (bytes)=525942784
13/07/12 22:16:11 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:16:11 INFO mapred.JobClient: BYTES_READ=228
13/07/12 22:16:11 INFO hdfs.NNBench: -------------- NNBench -------------- :
13/07/12 22:16:11 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
13/07/12 22:16:11 INFO hdfs.NNBench: Date & time: 2013-07-12 22:16:11,562
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench: Test Operation: create_write
13/07/12 22:16:11 INFO hdfs.NNBench: Start time: 2013-07-12 22:15:42,26
13/07/12 22:16:11 INFO hdfs.NNBench: Maps to run: 6
13/07/12 22:16:11 INFO hdfs.NNBench: Reduces to run: 3
13/07/12 22:16:11 INFO hdfs.NNBench: Block Size (bytes): 1
13/07/12 22:16:11 INFO hdfs.NNBench: Bytes to write: 0
13/07/12 22:16:11 INFO hdfs.NNBench: Bytes per checksum: 1
13/07/12 22:16:11 INFO hdfs.NNBench: Number of files: 100
13/07/12 22:16:11 INFO hdfs.NNBench: Replication factor: 3
13/07/12 22:16:11 INFO hdfs.NNBench: Successful file operations: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench: # maps that missed the barrier: 0
13/07/12 22:16:11 INFO hdfs.NNBench: # exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
13/07/12 22:16:11 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
13/07/12 22:16:11 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
13/07/12 22:16:11 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
13/07/12 22:16:11 INFO hdfs.NNBench:
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 0
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 0.0
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: Late maps: 0
13/07/12 22:16:11 INFO hdfs.NNBench: RAW DATA: # of exceptions: 0
13/07/12 22:16:11 INFO hdfs.NNBench:
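The summary of this particular run reports `Successful file operations: 0`, a TPS of 0 and NaN latencies. Yet the job counters show 4006 HDFS write operations, so its numbers should not be taken at face value. A small check like the one below can flag such runs automatically. It is a sketch that assumes the `hdfs.NNBench: <label>: <value>` line format shown above and only captures single-token values. Both function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches e.g. "... INFO hdfs.NNBench: Successful file operations: 0".
# Multi-word values such as timestamps are skipped on purpose.
SUMMARY_RE = re.compile(r"hdfs\.NNBench:\s+(?P<label>.+?):\s+(?P<value>\S+)\s*$")


def nnbench_summary(log_text: str) -> dict[str, str]:
    """Hypothetical helper: map NNBench summary labels to their raw values."""
    summary: dict[str, str] = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            summary[match.group("label")] = match.group("value")
    return summary


def looks_empty(summary: dict[str, str]) -> bool:
    """True when NNBench recorded no successful operations (or none at all)."""
    return int(summary.get("Successful file operations", "0")) == 0


sample = (
    "13/07/12 22:16:11 INFO hdfs.NNBench: Successful file operations: 0\n"
    "13/07/12 22:16:11 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN\n"
)
print(looks_empty(nnbench_summary(sample)))  # True
```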
Note the trick used in the nnbench command above: the base directory is built from the machine's short hostname via `hostname -s`. This simple trick ensures that one box does not accidentally write into the same output directory as another machine running nnbench at the same time.
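If you drive the benchmarks from a script rather than the shell, you can derive the same per-host directory in Python. This sketch approximates `hostname -s` by taking everything before the first dot of `socket.gethostname()`. That is an assumption which holds for typical FQDN-style host names. The helper name is hypothetical.

```python
from __future__ import annotations

import socket


def per_host_base_dir(prefix: str = "/benchmarks/NNBench", hostname: str | None = None) -> str:
    """Hypothetical helper: e.g. /benchmarks/NNBench-n1 for host n1.example.com."""
    host = hostname if hostname is not None else socket.gethostname()
    short_name = host.split(".", 1)[0]  # roughly what `hostname -s` prints
    return f"{prefix}-{short_name}"


print(per_host_base_dir(hostname="n1.example.com"))  # /benchmarks/NNBench-n1
```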
MapReduce benchmark (mrbench)
MRBench loops a small job a number of times. As such it is a very complementary benchmark to the "large-scale" TeraSort benchmark suite, because MRBench checks whether small job runs are responsive and run efficiently on your cluster. It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited.
The default parameters of mrbench are:
-baseDir: /benchmarks/MRBench (see my note above)
-numRuns: 1
-maps: 2
-reduces: 1
-inputLines: 1
-inputType: ascending
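To vary one knob at a time, you can capture the defaults above in a small table and pass only the overridden options on the command line. This is a hypothetical helper, not part of Hadoop. The jar path is the one used in this post. The set of accepted `-inputType` values (ascending, descending, random) is an assumption about this MRBench version.

```python
from __future__ import annotations

# Defaults as listed above; a per-host baseDir is advisable (see the note above).
MRBENCH_DEFAULTS = {
    "baseDir": "/benchmarks/MRBench",
    "numRuns": 1,
    "maps": 2,
    "reduces": 1,
    "inputLines": 1,
    "inputType": "ascending",
}
# Assumption: input types accepted by this MRBench version.
INPUT_TYPES = {"ascending", "descending", "random"}


def mrbench_args(jar: str, **overrides: object) -> list[str]:
    """Hypothetical helper: build an mrbench command with only non-default options."""
    unknown = set(overrides) - set(MRBENCH_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown mrbench option(s): {sorted(unknown)}")
    if overrides.get("inputType", "ascending") not in INPUT_TYPES:
        raise ValueError(f"unsupported inputType: {overrides['inputType']!r}")
    args = ["hadoop", "jar", jar, "mrbench"]
    for name, value in overrides.items():
        if value != MRBENCH_DEFAULTS[name]:  # skip values equal to the default
            args += [f"-{name}", str(value)]
    return args


jar = "/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar"
print(" ".join(mrbench_args(jar, numRuns=50, baseDir="/benchmarks/MRBench-n1")))
```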
Run mrbench with default parameters:
[root@n1 lib]# hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench
MRBenchmark.0.0.2
13/07/12 22:04:42 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
13/07/12 22:04:42 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-1751865361.txt
13/07/12 22:04:43 INFO mapred.MRBench: Running job 0: input=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_input output=hdfs://n1.example.com:8020/benchmarks/MRBench/mr_output/output_-1484101927
13/07/12 22:04:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 22:04:44 INFO mapred.FileInputFormat: Total input paths to process : 1
13/07/12 22:04:47 INFO mapred.JobClient: Running job: job_201307122107_0004
13/07/12 22:04:49 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 22:05:41 INFO mapred.JobClient: map 50% reduce 0%
13/07/12 22:05:48 INFO mapred.JobClient: map 100% reduce 0%
13/07/12 22:05:58 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 22:06:00 INFO mapred.JobClient: Job complete: job_201307122107_0004
13/07/12 22:06:00 INFO mapred.JobClient: Counters: 33
13/07/12 22:06:00 INFO mapred.JobClient: File System Counters
13/07/12 22:06:00 INFO mapred.JobClient: FILE: Number of bytes read=27
13/07/12 22:06:00 INFO mapred.JobClient: FILE: Number of bytes written=468313
13/07/12 22:06:00 INFO mapred.JobClient: FILE: Number of read operations=0
13/07/12 22:06:00 INFO mapred.JobClient: FILE: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient: FILE: Number of write operations=0
13/07/12 22:06:00 INFO mapred.JobClient: HDFS: Number of bytes read=261
13/07/12 22:06:00 INFO mapred.JobClient: HDFS: Number of bytes written=3
13/07/12 22:06:00 INFO mapred.JobClient: HDFS: Number of read operations=5
13/07/12 22:06:00 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/07/12 22:06:00 INFO mapred.JobClient: HDFS: Number of write operations=2
13/07/12 22:06:00 INFO mapred.JobClient: Job Counters
13/07/12 22:06:00 INFO mapred.JobClient: Launched map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 22:06:00 INFO mapred.JobClient: Data-local map tasks=2
13/07/12 22:06:00 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=50958
13/07/12 22:06:00 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=7753
13/07/12 22:06:00 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 22:06:00 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 22:06:00 INFO mapred.JobClient: Map input records=1
13/07/12 22:06:00 INFO mapred.JobClient: Map output records=1
13/07/12 22:06:00 INFO mapred.JobClient: Map output bytes=5
13/07/12 22:06:00 INFO mapred.JobClient: Input split bytes=258
13/07/12 22:06:00 INFO mapred.JobClient: Combine input records=0
13/07/12 22:06:00 INFO mapred.JobClient: Combine output records=0
13/07/12 22:06:00 INFO mapred.JobClient: Reduce input groups=1
13/07/12 22:06:00 INFO mapred.JobClient: Reduce shuffle bytes=39
13/07/12 22:06:00 INFO mapred.JobClient: Reduce input records=1
13/07/12 22:06:00 INFO mapred.JobClient: Reduce output records=1
13/07/12 22:06:00 INFO mapred.JobClient: Spilled Records=2
13/07/12 22:06:00 INFO mapred.JobClient: CPU time spent (ms)=2920
13/07/12 22:06:00 INFO mapred.JobClient: Physical memory (bytes) snapshot=398467072
13/07/12 22:06:00 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3889000448
13/07/12 22:06:00 INFO mapred.JobClient: Total committed heap usage (bytes)=204607488
13/07/12 22:06:00 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/07/12 22:06:00 INFO mapred.JobClient: BYTES_READ=2
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 77797
This means that the average run time of the executed jobs was about 78 seconds (77797 ms). With the default -numRuns of 1, this is the time of a single job.
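MRBench prints its result as a small table with the header `DataLines Maps Reduces AvgTime (milliseconds)`, followed by one row of values. The sketch below pulls the average out of captured output and converts it to seconds. It assumes that header-then-row layout, with fields separated by spaces or tabs. The helper name is hypothetical.

```python
def mrbench_avg_seconds(output: str) -> float:
    """Hypothetical helper: average job time in seconds from MRBench's result table."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("DataLines") and i + 1 < len(lines):
            fields = lines[i + 1].split()  # DataLines, Maps, Reduces, AvgTime
            return int(fields[3]) / 1000.0
    raise ValueError("MRBench result table not found")


result = """DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 77797"""
print(round(mrbench_avg_seconds(result)))  # 78
```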