
Tez: 5 Dynamic Graph Reconfiguration


Case Study: Automatic Reduce Parallelism

Motivation

[Figure: tez1]

Distributed data processing is dynamic by nature, and it is extremely difficult to determine optimal concurrency and data movement methods statically, a priori. More information becomes available at runtime, such as data samples and sizes, which may help optimize the execution plan further. We also recognize that Tez by itself cannot always have the smarts to perform these dynamic optimizations. The design of Tez therefore includes support for pluggable vertex management modules that collect relevant information from tasks and change the dataflow graph at runtime to optimize for performance and resource usage. The diagram shows how we can determine an appropriate number of reducers in a MapReduce-like job by observing the actual data output produced and the desired load per reduce task.

Performance & Efficiency via Dynamic Graph Reconfiguration

Tez envisions running computation by the most resource-efficient and high-performance means possible, given the runtime conditions in the cluster and the results of the previous steps of the computation. This functionality is constructed from two basic building blocks:

  • Pluggable Vertex Management Modules: The control flow architecture of Tez incorporates a per-vertex pluggable module for user logic that deeply understands the data and computation. The vertex state machine invokes this user module at significant transitions of the state machine such as vertex start, source task completion etc. At these points the user logic can examine the runtime state and provide hints to the main Tez execution engine on attributes like vertex task parallelism.
  • Event Flow Architecture: Tez defines a set of events by which different components like vertices, tasks etc. can pass information to each other. These events are routed from source to destination components based on a well-defined routing logic in the Tez control plane. One such event is the VertexManager event that can be used to send any kind of user-defined payload to the VertexManager of a given vertex.
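How the two building blocks fit together can be sketched with a toy model. This is not the Tez API: the class names (`VertexManagerEvent`, `VertexManager`, `ControlPlane`) and the callback names are hypothetical stand-ins chosen for illustration, and the routing rule is reduced to "deliver to the manager of the target vertex".

```python
from dataclasses import dataclass


@dataclass
class VertexManagerEvent:
    """Hypothetical event carrying an opaque user payload to a vertex's manager."""
    target_vertex: str
    payload: dict


class VertexManager:
    """Hypothetical per-vertex plug-in whose hooks are invoked by the engine."""

    def __init__(self, vertex_name, parallelism):
        self.vertex_name = vertex_name
        self.parallelism = parallelism
        self.log = []  # records every callback, in order

    def on_vertex_started(self):
        self.log.append(("started", None))

    def on_source_task_completed(self, task_id):
        self.log.append(("source_done", task_id))

    def on_vertex_manager_event(self, event):
        self.log.append(("event", event.payload))


class ControlPlane:
    """Toy router: delivers each event to the manager of its target vertex."""

    def __init__(self):
        self.managers = {}

    def register(self, manager):
        self.managers[manager.vertex_name] = manager

    def route(self, event):
        self.managers[event.target_vertex].on_vertex_manager_event(event)


plane = ControlPlane()
reduce_manager = VertexManager("reduce", parallelism=100)
plane.register(reduce_manager)

# The state machine calls the manager at significant transitions,
# and tasks reach it through routed events.
reduce_manager.on_vertex_started()
plane.route(VertexManagerEvent("reduce", {"output_bytes": 1024}))
reduce_manager.on_source_task_completed("map_0")
print(reduce_manager.log)
```

The point of the split is that the engine only needs to know when to call the manager and where to route events; what the payload means is left entirely to the user logic.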

Case Study: Reduce task parallelism and Reduce Slow-start

Determining the correct number of reduce tasks has been a long-standing issue for MapReduce jobs. The size of the output produced by the map tasks is not known a priori, so choosing that number before job execution is hard. It becomes even more difficult when there are several stages of computation and the reduce parallelism must be determined for each stage. We take this as a case study to demonstrate the graph reconfiguration capabilities of Tez.

[Figure: tez2]

Reduce Task Parallelism: Tez has a ShuffleVertexManager that understands the semantics of hash-based partitioning performed over a shuffle transport layer, as used in MapReduce. Tez defines a VertexManager event that can be used to send an arbitrary user payload to the vertex manager of a given vertex. The partitioning tasks (say, the Map tasks) use this event to send statistics, such as the size of the output partitions produced, to the ShuffleVertexManager for the reduce vertex. The manager receives these events and tries to model the final output statistics that would be produced by all the tasks. It can then advise the vertex state machine of the Reduce vertex to decrease the parallelism of the vertex if needed. The idea is to over-partition first and then determine the correct number at runtime. The vertex controller can cancel the extra tasks and proceed as usual.
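The arithmetic behind this decision can be illustrated with a minimal sketch. It assumes the simplest possible model (extrapolating the mean output of the tasks that have reported so far) and is not the actual ShuffleVertexManager logic; the function and parameter names are hypothetical.

```python
import math

MB = 1024 * 1024


def estimate_total_output(reported_sizes, total_source_tasks):
    """Extrapolate total output bytes from the source tasks that have reported."""
    if not reported_sizes:
        raise ValueError("need at least one report to extrapolate from")
    mean_per_task = sum(reported_sizes) / len(reported_sizes)
    return mean_per_task * total_source_tasks


def choose_parallelism(estimated_bytes, desired_bytes_per_task,
                       initial_parallelism, min_parallelism=1):
    """Only ever decrease: start over-partitioned, shrink to what the data needs."""
    needed = math.ceil(estimated_bytes / desired_bytes_per_task)
    return max(min_parallelism, min(initial_parallelism, needed))


# 3 of 10 map tasks have reported their output size so far.
reported = [100 * MB, 120 * MB, 80 * MB]
estimated = estimate_total_output(reported, total_source_tasks=10)

# The job was configured with 100 reducers; each should handle about 256 MB.
final = choose_parallelism(estimated, 256 * MB, initial_parallelism=100)
print(final)
```

In this example the estimate is about 1000 MB, so 4 reducers suffice and the other 96 over-provisioned tasks can be cancelled. The `min(...)` clamp reflects the text above: the manager advises a decrease from the over-partitioned starting point, never an increase.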


[Figure: tez3]

Reduce Slow-start/Pre-launch: Slow-start is a MapReduce feature in which the reduce tasks are launched before all the map tasks complete. The hypothesis is that reduce tasks can start fetching the completed map outputs while the remaining map tasks are still running. Determining when to pre-launch the reduce tasks is tricky because it depends on the output data produced by the map tasks. It would be inefficient to run reduce tasks so early that they finish fetching the available data and then sit idle while the remaining maps are still running. In Tez, the slow-start logic is embedded in the ShuffleVertexManager. The vertex state controller informs the manager whenever a source task (here, a Map task) completes. The manager uses this information to determine when to pre-launch the reduce tasks and how many to pre-launch. It then advises the vertex controller.
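One simple way to express such a policy is sketched below. It is a deliberate simplification that looks only at the fraction of completed source tasks and ramps up linearly between two thresholds; the function name, the parameters `min_fraction` and `max_fraction`, and their default values are assumptions made for illustration, not the manager's actual implementation.

```python
def reducers_to_have_launched(completed_sources, total_sources, total_reducers,
                              min_fraction=0.25, max_fraction=0.75):
    """Return how many reduce tasks should be running at this point.

    Below min_fraction nothing is launched; at or above max_fraction
    everything is; in between the count grows linearly.
    """
    if total_sources == 0:
        return total_reducers  # nothing to wait for
    done = completed_sources / total_sources
    if done < min_fraction:
        return 0
    if done >= max_fraction:
        return total_reducers
    progress = (done - min_fraction) / (max_fraction - min_fraction)
    return int(progress * total_reducers)


# Called each time the controller reports a completed source (map) task.
for completed in (1, 5, 8):
    print(completed, reducers_to_have_launched(completed, 10, 8))
```

Launching gradually rather than all at once limits the number of reduce tasks that hold resources while they wait for the remaining map outputs.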

 

It's easy to see how the above can be extended to determine the correct parallelism for range-partitioning scenarios. Data samples could be sent via VertexManager events to the vertex manager, which can build the key-range histogram and determine the correct number of partitions. It can then assign the appropriate key ranges to each partition. Thus, in Tez, this operation could be achieved without the overhead of a separate sampling job.
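The range-partitioning idea can be sketched as follows, assuming the manager has already collected key samples from the source tasks. Boundaries are taken at evenly spaced quantiles of the sorted samples; the helper names `split_points` and `partition_for` are hypothetical and the number of partitions is passed in rather than derived.

```python
import bisect


def split_points(samples, num_partitions):
    """Pick num_partitions - 1 boundary keys at evenly spaced sample quantiles."""
    if not samples or num_partitions < 1:
        raise ValueError("need samples and at least one partition")
    ordered = sorted(samples)
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Keys below the first boundary go to partition 0, and so on."""
    return bisect.bisect_right(boundaries, key)


# Samples as they might arrive, unordered, from several source tasks.
samples = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10, 11, 12]
boundaries = split_points(samples, num_partitions=3)
print(boundaries)
print([partition_for(k, boundaries) for k in (4, 5, 9, 12)])
```

With these samples the boundaries are 5 and 9, so each of the three partitions receives four of the twelve sampled keys.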

 

 

Original doc: http://hortonworks.com/blog/apache-tez-dynamic-graph-reconfiguration/
