1. A map-only MapReduce job
File System Counters
        FILE: Number of bytes read=3629530
        FILE: Number of bytes written=98312
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=8570654
        HDFS: Number of bytes written=1404469
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=14522
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=14522
        Total vcore-seconds taken by all map tasks=14522
        Total megabyte-seconds taken by all map tasks=14870528
Map-Reduce Framework
        Map input records=7452
        Map output records=7452
        Input split bytes=146
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=241
        CPU time spent (ms)=9750
        Physical memory (bytes) snapshot=184406016
        Virtual memory (bytes) snapshot=893657088
        Total committed heap usage (bytes)=89653248
File Input Format Counters
        Bytes Read=8570508
File Output Format Counters
        Bytes Written=1404469
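A dump like this is plain text with a regular layout: group names are flush left, counters are indented `Name=value` pairs. That makes it easy to pull values out programmatically. A minimal sketch, assuming exactly that layout (`parse_counters` is a hypothetical helper written here for illustration, not part of Hadoop):

```python
# Minimal sketch: parse a Hadoop counter dump (group names unindented,
# counters indented as "Name=value") into {group: {name: int}}.
def parse_counters(text):
    counters, group = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if "=" in line:
            # rpartition tolerates counter names containing "(ms)" etc.
            name, _, value = line.strip().rpartition("=")
            counters.setdefault(group, {})[name.strip()] = int(value)
        else:
            group = line.strip()
    return counters

# A few lines copied from the map-only dump above.
dump = """\
Job Counters
        Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
        Map input records=7452
        Map output records=7452
        Spilled Records=0
"""
c = parse_counters(dump)
# The map-only signature: no reduce time, and nothing spilled for a shuffle.
assert c["Job Counters"]["Total time spent by all reduces in occupied slots (ms)"] == 0
assert c["Map-Reduce Framework"]["Spilled Records"] == 0
```

The two assertions spell out what makes this a map-only job: no reduce slot time is consumed, and `Spilled Records` stays at 0 because map output goes straight to HDFS instead of through the sort/spill machinery.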
2. A MapReduce job with both map and reduce tasks
File System Counters
        FILE: Number of bytes read=879582
        FILE: Number of bytes written=198227
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2729649
        HDFS: Number of bytes written=265
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7071
        Total time spent by all reduces in occupied slots (ms)=7804
        Total time spent by all map tasks (ms)=7071
        Total time spent by all reduce tasks (ms)=7804
        Total vcore-seconds taken by all map tasks=7071
        Total vcore-seconds taken by all reduce tasks=7804
        Total megabyte-seconds taken by all map tasks=7240704
        Total megabyte-seconds taken by all reduce tasks=7991296
Map-Reduce Framework
        Map input records=20
        Map output records=1
        Map output bytes=167
        Map output materialized bytes=182
        Input split bytes=139
        Combine input records=1
        Combine output records=1
        Reduce input groups=1
        Reduce shuffle bytes=182
        Reduce input records=1
        Reduce output records=1
        Spilled Records=2
        Shuffled Maps=1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=122
        CPU time spent (ms)=3620
        Physical memory (bytes) snapshot=451244032
        Virtual memory (bytes) snapshot=1823916032
        Total committed heap usage (bytes)=288882688
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=2729510
File Output Format Counters
        Bytes Written=265
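The "Map-Reduce Framework" group above is easiest to read as a record-flow pipeline: map output feeds the combiner, and the combiner's output is shuffled to the reducer, so the output count of each stage should match the input count of the next. A minimal sketch of the consistency checks this implies (counter values copied verbatim from the dump above):

```python
# Minimal sketch: sanity-check the record flow of the map+reduce job above.
# The numbers are copied from the counter dump; the relations between them
# are what this check illustrates.
framework = {
    "Map input records": 20,
    "Map output records": 1,
    "Combine input records": 1,
    "Combine output records": 1,
    "Reduce input groups": 1,
    "Reduce input records": 1,
    "Reduce output records": 1,
    "Spilled Records": 2,
}

# Records flow map -> combiner -> shuffle -> reduce; at each hand-off the
# output of one stage is the input of the next.
assert framework["Combine input records"] == framework["Map output records"]
assert framework["Reduce input records"] == framework["Combine output records"]
# "Spilled Records" accumulates over both the map side and the reduce side,
# which is why it can exceed "Map output records" even for a one-record job.
assert framework["Spilled Records"] >= framework["Map output records"]
```

Comparing the two dumps this way also highlights the shuffle-only counters (`Reduce shuffle bytes`, `Shuffled Maps`, the whole "Shuffle Errors" group) that simply never appear for the map-only job.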