[Hadoop 5] Word Count Example: Analyzing the Results


Below is the output of a Word Count run. The input was two small files, each no more than a few KB in size.


hadoop@hadoop-Inspiron-3521:~/hadoop-2.5.2/bin$ hadoop jar WordCountMapReduce.jar /users/hadoop/hello/world /users/hadoop/output5
14/12/15 22:35:40 INFO client.RMProxy: Connecting to ResourceManager at /
14/12/15 22:35:41 INFO input.FileInputFormat: Total input paths to process : 2 // two files to process in total
14/12/15 22:35:41 INFO mapreduce.JobSubmitter: number of splits:2  // two input splits; each split is handled by one Map Task
14/12/15 22:35:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1418652929537_0001
14/12/15 22:35:43 INFO impl.YarnClientImpl: Submitted application application_1418652929537_0001
14/12/15 22:35:43 INFO mapreduce.Job: The url to track the job: http://hadoop-Inspiron-3521:8088/proxy/application_1418652929537_0001/
14/12/15 22:35:43 INFO mapreduce.Job: Running job: job_1418652929537_0001
14/12/15 22:35:54 INFO mapreduce.Job: Job job_1418652929537_0001 running in uber mode : false
14/12/15 22:35:54 INFO mapreduce.Job:  map 0% reduce 0%
14/12/15 22:36:04 INFO mapreduce.Job:  map 50% reduce 0%
14/12/15 22:36:05 INFO mapreduce.Job:  map 100% reduce 0%
14/12/15 22:36:16 INFO mapreduce.Job:  map 100% reduce 100%
14/12/15 22:36:17 INFO mapreduce.Job: Job job_1418652929537_0001 completed successfully
14/12/15 22:36:17 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=3448
		FILE: Number of bytes written=299665
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2574
		HDFS: Number of bytes written=1478
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=2 // one Map Task per input file (each small file forms a single split)
		Launched reduce tasks=1
		Data-local map tasks=2 // both Map Tasks read their input from the local node
		Total time spent by all maps in occupied slots (ms)=17425
		Total time spent by all reduces in occupied slots (ms)=8472
		Total time spent by all map tasks (ms)=17425
		Total time spent by all reduce tasks (ms)=8472
		Total vcore-seconds taken by all map tasks=17425
		Total vcore-seconds taken by all reduce tasks=8472
		Total megabyte-seconds taken by all map tasks=17843200
		Total megabyte-seconds taken by all reduce tasks=8675328
	Map-Reduce Framework
		Map input records=90 // the two input files contain 90 lines in total
		Map output records=251 // Map emitted 251 records, i.e. close to 3 words per line on average (251/90 ≈ 2.8)
		Map output bytes=2940
		Map output materialized bytes=3454
		Input split bytes=263
		Combine input records=0
		Combine output records=0
		Reduce input groups=138
		Reduce shuffle bytes=3454
		Reduce input records=251
		Reduce output records=138
		Spilled Records=502
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=274
		CPU time spent (ms)=3740
		Physical memory (bytes) snapshot=694566912
		Virtual memory (bytes) snapshot=3079643136
		Total committed heap usage (bytes)=513277952
	Shuffle Errors
	File Input Format Counters 
		Bytes Read=2311   // total size of the two input files
	File Output Format Counters 
		Bytes Written=1478 // size of the output file part-r-00000
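
The counters in the log can be cross-checked with a little arithmetic. The sketch below copies a few values from the log into a Python dict and derives some ratios from them. The dict keys and the `derive` function are shorthand names made up for this example, not Hadoop identifiers. Reading Spilled Records as "each record written once on the map side and once on the reduce side" is an assumption that fits the numbers, not something the log states.

```python
# Counter values copied from the job log; key names are shorthand for this sketch.
counters = {
    "map_input_records": 90,
    "map_output_records": 251,
    "reduce_input_records": 251,
    "reduce_input_groups": 138,
    "spilled_records": 502,
    "hdfs_bytes_read": 2574,
    "input_format_bytes_read": 2311,
    "input_split_bytes": 263,
    "map_task_ms": 17425,
    "map_megabyte_seconds": 17843200,
}


def derive(c):
    """Compute a few ratios and differences from the raw counters."""
    return {
        # Average number of words emitted per input line.
        "words_per_line": round(c["map_output_records"] / c["map_input_records"], 2),
        # Average number of occurrences of each distinct word.
        "occurrences_per_word": round(
            c["reduce_input_records"] / c["reduce_input_groups"], 2
        ),
        # HDFS bytes read beyond the file contents themselves.
        "hdfs_read_overhead": c["hdfs_bytes_read"] - c["input_format_bytes_read"],
        # Ratio of the "megabyte-seconds" counter to the map task time counter.
        "map_mb_factor": c["map_megabyte_seconds"] // c["map_task_ms"],
        # How many times each map output record was counted as spilled.
        "spills_per_record": c["spilled_records"] // c["map_output_records"],
    }


if __name__ == "__main__":
    for name, value in derive(counters).items():
        print(f"{name}: {value}")
```

Two relationships stand out: the HDFS read overhead equals Input split bytes (2574 - 2311 = 263), and Reduce input records equals Map output records (251) because no combiner ran (Combine input records=0). The map factor of 1024 would be consistent with a 1024 MB container per map task, though the log does not show the container size directly.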
  <description>The default block size for new files.</description>

  <description>The minimum of block size</description>
Pro Apache Hadoop (p. 13)
A map task can run on any compute node in the cluster, and multiple map tasks can run in parallel across the cluster. The map task is responsible for transforming the input records into key/value pairs. The output of all the maps will be partitioned, and each partition will be sorted. There will be one partition for each reduce task. Each partition’s sorted keys and the values associated with the keys are then processed by the reduce task. There can be multiple reduce tasks running in parallel on the cluster.
The key to receiving a list of values for a key in the reduce phase is a phase known as the sort/shuffle phase in MapReduce. All the key/value pairs emitted by the Mapper are sorted by the key in the Reducer. If multiple Reducers are allocated, a subset of keys will be allocated to each Reducer. The key/value pairs for a given Reducer are sorted by key, which ensures that all the values associated with one key are received by the Reducer together.
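
The flow described in the two paragraphs above (map, partition, sort, reduce) can be imitated in a few lines of plain Python. This is only a single-process sketch of the idea, not Hadoop code: all function names are invented for the example, and `zlib.crc32` is a stand-in for whatever hash the real partitioner uses.

```python
import zlib


def map_phase(lines):
    """Emit one (word, 1) pair per word, like a Word Count mapper."""
    for line in lines:
        for word in line.split():
            yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key; crc32 is a stand-in hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Split map output into one partition per reducer, then sort each by key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


def reduce_phase(sorted_pairs):
    """Sum the values per key; equal keys are adjacent because input is sorted."""
    result = []
    for key, value in sorted_pairs:
        if result and result[-1][0] == key:
            result[-1] = (key, result[-1][1] + value)
        else:
            result.append((key, value))
    return result


def word_count(lines, num_reducers=2):
    """Run map, shuffle/sort and reduce; return one output list per reducer."""
    parts = shuffle_and_sort(map_phase(lines), num_reducers)
    return [reduce_phase(p) for p in parts]


if __name__ == "__main__":
    for i, part in enumerate(word_count(["hello world", "hello hadoop"])):
        print(f"reducer {i}: {part}")
```

Because a key always hashes to the same partition, every value for that key ends up at one reducer, and sorting puts them next to each other. With a single reducer, as in the job above, there is one partition and one output file.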
Word Count MapReduce process

Some of the metadata stored by the NameNode includes these:
• File/directory name and its location relative to the parent directory.
• File and directory ownership and permissions.
• File name of individual blocks. Each block is stored as a file in the local file system of the DataNode in the directory that can be configured by the Hadoop system administrator.
The NameNode file that contains the metadata is fsimage. Any changes to the metadata during the system operation are stored in memory and persisted to another file called edits. Periodically, the edits file is merged with the fsimage file by the Secondary NameNode.
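
The relationship between fsimage and edits can be pictured with a toy model: a snapshot plus a log of changes, which are periodically folded together. The sketch below is purely illustrative. The real files are binary and have their own formats; the operation names (`create`, `delete`, `rename`) and both function names are hypothetical and chosen only for this example.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to an in-memory namespace (path -> metadata)."""
    op, path, *rest = edit
    if op == "create":
        namespace[path] = rest[0]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[rest[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edits):
    """Merge the edit log into a copy of the snapshot; return (new snapshot, empty log)."""
    merged = dict(fsimage)
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []


if __name__ == "__main__":
    fsimage = {"/users/hadoop/a.txt": {"owner": "hadoop"}}
    edits = [
        ("create", "/users/hadoop/b.txt", {"owner": "hadoop"}),
        ("rename", "/users/hadoop/a.txt", "/users/hadoop/c.txt"),
        ("delete", "/users/hadoop/b.txt"),
    ]
    new_image, new_edits = checkpoint(fsimage, edits)
    print(new_image, new_edits)
```

The point of the design is that writes only append to the small log, while the expensive merge into the full snapshot happens occasionally and off the main path.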
hdfs  fsck / -files -blocks -locations |grep /users/hadoop/wordcount -A 30


