Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:
- Worker processes
- Executors (threads)
- Tasks
Here is a simple illustration of their relationships:
A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.
An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).
A task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks
. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
Configuring the parallelism of a topology
Note that in Storm’s terminology "parallelism" is specifically used to describe the so-calledparallelism hint, which means the initial number of executor (threads) of a component. In this document though we use the term "parallelism" in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm.
The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them. Storm currently has the following order of precedence for configuration settings:defaults.yaml
< storm.yaml
< topology-specific configuration < internal component-specific configuration < external component-specific configuration.
Number of worker processes
- Description: How many worker processes to create for the topology across machines in the cluster.
- Configuration option: TOPOLOGY_WORKERS
- How to set in your code (examples):
Number of executors (threads)
- Description: How many executors to spawn per component.
- Configuration option: ?
- How to set in your code (examples):
- TopologyBuilder#setSpout()
- TopologyBuilder#setBolt()
- Note that as of Storm 0.8 the
parallelism_hint
parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
- Description: How many tasks to create per component.
- Configuration option: TOPOLOGY_TASKS
- How to set in your code (examples):
Here is an example code snippet to show these settings in practice:
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout);
In the above code we configured Storm to run the bolt GreenBolt
with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.
Example of a running topology
The following illustration shows how a simple topology would look like in operation. The topology consists of three components: one spout called BlueSpout
and two bolts called GreenBolt
andYellowBolt
. The components are linked such that BlueSpout
sends its output to GreenBolt
, which in turns sends its own output to YellowBolt
.
The GreenBolt
was configured as per the code snippet above whereas BlueSpout
andYellowBolt
only set the parallelism hint (number of executors). Here is the relevant code:
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
.shuffleGrouping("green-bolt");
StormSubmitter.submitTopology(
"mytopology",
conf,
topologyBuilder.createTopology()
);
And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:
- TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().
How to change the parallelism of a running topology
A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.
You have two options to rebalance a topology:
- Use the Storm web UI to rebalance the topology.
- Use the CLI tool storm rebalance as described below.
Here is an example of using the CLI tool:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
相关推荐
Herbert Edelsbrunner 和 John Harer 编著的《Computational Topology: An Introduction》是一本关于计算拓扑学的入门书籍,内容涵盖了图形、曲面、复形、同调、对偶性、Morse 函数、持久性、稳定性、应用以及未解决...
他们的著作《Topology Optimization: Theory, Methods, and Applications》是该领域的经典入门书籍。这本书不仅详细介绍了拓扑优化的理论基础,还提供了方法论,并展示了许多实际应用案例。 - 书中提到了与出版有关...
Algebraic topology: a first course
作者 Bert Mendelson 226 页, 目录: 1. theory of sets 2. metric spaces 3. topological spaces 4. connectedness 5. compactness
The Medieval Kingdom topology: Peer relations in kindergarten children Psychology in the Schools Volume 32, April I995 THE MEDIEVAL KINGDOM TOPOLOGY: PEER RELATIONS IN KINDERGARTEN CHILDREN ...
学习计算机科学总共需要多少数学基础?宾夕法尼亚大学计算机和信息科学系教授Jean Gallier用一本1960页书的容量解决了所有的问题。该本书涵盖了计算机科学所需的线性代数、微分和最优化理论等基础知识,包含十分详尽...
Algebra, Topology, Differential Calculus, and Optimization TheoryFor Computer Science and Machine LearningJean Gallier and Jocelyn Quaintance Department of Computer and Information ScienceUniversity ...
"topology: 合并到 ALSA git 之前的 ASoC 拓扑用户空间工具" 这个标题表明我们讨论的是一个关于音频子系统(ASoC,Advanced Linux Sound Architecture)的拓扑工具。在ALSA(Advanced Linux Sound Architecture)...
在这个名为"topology: Tarantool的拓扑提供程序"的主题中,我们将深入探讨Tarantool如何利用拓扑来管理其分布式环境。 拓扑是描述网络或系统中组件之间关系的一种抽象方式,它关注的是组件如何相互连接,而不关心...
《SUMS77 Topology, Calculus and Approximation》是由Vilmos Komornik于2017年出版的一部数学著作,它涵盖了拓扑学、微积分和近似理论等多个核心数学领域。这本书旨在深入探讨这些学科的基础概念、定理和应用,为...
gem 'topology' 然后执行: $ bundle 或者自己安装: $ gem install Topology 用法 top = Topology . new Set [ Set [ 1 ] , Set [ 2 ] ] => #<Topology sos=#>, #, #, #<Set>}>> 贡献 分叉它( ) 创建您的...
这本书是 Massey 的另一本书《Algebraic Topology: An Introduction》的前五章节和《Singular Homology Theory》的合辑。 书中首先介绍了拓扑空间的基本概念,如拓扑空间的定义、拓扑空间的分类、拓扑空间的基本...
作者 Bert Mendelson 226 页, 目录: 1. theory of sets 2. metric spaces 3. topological spaces 4. connectedness 5. compactness
《Homa:SD-WAN覆盖网络中的高效拓扑与路由管理》 这篇PPT是基于同名论文的内容制作的,虽然并非INFOCOM会议上作者使用的原始版本,但它为读者提供了一个理解该研究的概览。主要关注的是如何在软件定义广域网(SD-...
标题提到的"algebraic-topology:为我的研究创建和使用的程序"很可能是一个软件项目,用于辅助进行代数拓扑的研究。这个程序可能包含了数据结构和算法,用于计算和分析拓扑空间的特征,比如计算同调群或进行CW复形的...
Munkres的《Topology》一书是拓扑学领域的经典教材,它为初学者提供了全面且严谨的拓扑学基础。该书涵盖了从基本概念到高级主题的各种内容,包括连续性、分离性质、度量空间、紧致性、 Hausdorff空间、基与覆盖、同...
jTopo Topology基于jtopo二次封装,修复了一些bug。便于直接使用,也可以稍作修改后应用到各前端框架中。 纯前端项目,所有ajax接口保留,采用模拟数据,实际开发过程中稍作修改即可对接后端。预览预览方式: 下载该...
nsq_topology 这是一组脚本,用于生成NSQ拓扑图,如下所示: 它包含两个组成部分:nsq_data 该脚本建立了支持拓扑图的数据。 通过与nsqlookupd进行通信,然后与集群中的所有nsqds进行通信。 您可能希望在托管...