Task Attempt
Table of contents:
- Finite State Machine
- NEW => UNASSIGNED [TA_SCHEDULE]
- UNASSIGNED => ASSIGNED [TA_ASSIGNED]
- ASSIGNED => RUNNING [TA_CONTAINER_LAUNCHED]
- RUNNING => SUCCESS_CONTAINER_CLEANUP [TA_DONE], COMMIT_PENDING => SUCCESS_CONTAINER_CLEANUP [TA_DONE]
- SUCCESS_CONTAINER_CLEANUP => SUCCEEDED [TA_CONTAINER_CLEANED]
Finite State Machine
NEW => UNASSIGNED [TA_SCHEDULE]
UNASSIGNED => ASSIGNED [TA_ASSIGNED]
ASSIGNED => RUNNING [TA_CONTAINER_LAUNCHED]
RUNNING => SUCCESS_CONTAINER_CLEANUP [TA_DONE], COMMIT_PENDING => SUCCESS_CONTAINER_CLEANUP [TA_DONE]
SUCCESS_CONTAINER_CLEANUP => SUCCEEDED [TA_CONTAINER_CLEANED]
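The success-path transitions listed above can be read as a lookup table keyed by (current state, event). The following is a minimal sketch under that reading, not Hadoop's actual TaskAttemptImpl or its state machine factory: the class name TaskAttemptFsmSketch and the helpers add and next are hypothetical, and only the transitions named in this section are modeled. How an attempt enters COMMIT_PENDING is not listed here, so the sketch leaves that edge out.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative sketch only; names other than the states and events are hypothetical. */
final class TaskAttemptFsmSketch {

    enum State {
        NEW, UNASSIGNED, ASSIGNED, RUNNING, COMMIT_PENDING,
        SUCCESS_CONTAINER_CLEANUP, SUCCEEDED
    }

    enum Event {
        TA_SCHEDULE, TA_ASSIGNED, TA_CONTAINER_LAUNCHED, TA_DONE, TA_CONTAINER_CLEANED
    }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<State, Map<Event, State>>(State.class);

    private static void add(State from, Event on, State to) {
        TRANSITIONS
                .computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
                .put(on, to);
    }

    static {
        // Only the transitions listed in this section.
        add(State.NEW, Event.TA_SCHEDULE, State.UNASSIGNED);
        add(State.UNASSIGNED, Event.TA_ASSIGNED, State.ASSIGNED);
        add(State.ASSIGNED, Event.TA_CONTAINER_LAUNCHED, State.RUNNING);
        add(State.RUNNING, Event.TA_DONE, State.SUCCESS_CONTAINER_CLEANUP);
        add(State.COMMIT_PENDING, Event.TA_DONE, State.SUCCESS_CONTAINER_CLEANUP);
        add(State.SUCCESS_CONTAINER_CLEANUP, Event.TA_CONTAINER_CLEANED, State.SUCCEEDED);
    }

    /** Returns the next state, or throws if the event is not modeled for the current state. */
    static State next(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        State target = (row == null) ? null : row.get(event);
        if (target == null) {
            throw new IllegalStateException("No transition from " + current + " on " + event);
        }
        return target;
    }

    public static void main(String[] args) {
        // Walk the success path from NEW and print each step.
        Event[] successPath = {
            Event.TA_SCHEDULE, Event.TA_ASSIGNED, Event.TA_CONTAINER_LAUNCHED,
            Event.TA_DONE, Event.TA_CONTAINER_CLEANED
        };
        State state = State.NEW;
        for (Event event : successPath) {
            state = next(state, event);
            System.out.println(event + " -> " + state);
        }
    }
}
```

Running main walks a single attempt from NEW through each event in order. Any event that is not listed for the current state raises IllegalStateException, which keeps the sketch honest about the edges it does not cover (failure, kill, and commit handling).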
MapTask
Table of contents:
- Startup
- Execution
- Post Execution - Shuffle
ReduceTask
Table of contents:
- Startup
- Shuffle
- Local Fetcher
- Fetcher
- Shuffle - Merge
- Execution
- Merger