Running topologies on a production cluster is similar to running in Local mode. Here are the steps:
1) Define the topology (Use TopologyBuilder if defining using Java)
2) Use StormSubmitter to submit the topology to the cluster. StormSubmitter
takes as input the name of the topology, a configuration for the topology, and the topology itself. For example:
Config conf = new Config();
conf.setNumWorkers(20);
conf.setMaxSpoutPending(5000);
StormSubmitter.submitTopology("mytopology", conf, topology);
3) Create a jar containing your code and all the dependencies of your code (except for Storm -- the Storm jars will be added to the classpath on the worker nodes).
If you're using Maven, the Maven Assembly Plugin can do the packaging for you. Just add this to your pom.xml:
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.path.to.main.Class</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
Then run mvn assembly:assembly to get an appropriately packaged jar. Make sure you exclude the Storm jars since the cluster already has Storm on the classpath.
4) Submit the topology to the cluster using the storm
client, specifying the path to your jar, the classname to run, and any arguments it will use:
storm jar path/to/allmycode.jar org.me.MyTopology arg1 arg2 arg3
storm jar
will submit the jar to the cluster and configure the StormSubmitter
class to talk to the right cluster. In this example, after uploading the jar storm jar
calls the main function on org.me.MyTopology
with the arguments "arg1", "arg2", and "arg3".
You can find out how to configure your storm
client to talk to a Storm cluster on Setting up development environment.
Common configurations
There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found here. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:
- Config.TOPOLOGY_WORKERS: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.
- Config.TOPOLOGY_ACKERS: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model and you can read more about them on Guaranteeing message processing.
- Config.TOPOLOGY_MAX_SPOUT_PENDING: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
- Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See Guaranteeing message processingfor more information on how Storm's reliability model works.
- Config.TOPOLOGY_SERIALIZATIONS: You can register more serializers to Storm using this config so that you can use custom types within tuples.
Killing a topology
To kill a topology, simply run:
storm kill {stormname}
Give the same name to storm kill
as you used when submitting the topology.
Storm won't kill the topology immediately. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.
Updating a running topology
To update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a storm swap
command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.
Monitoring topologies
The best place to monitor a topology is using the Storm UI. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.
You can also look at the worker logs on the cluster machines.
come from github
相关推荐
Techniques to troubleshoot common issues that you may face during and after deploying availability groups in a mission-critical production environment. What You Will Learn Grasp important concepts ...
本书《Topologies and Uniformities, Ioan Mackenzie James (1999)》属于Springer Undergraduate Mathematics Series(SUMS)系列,是由Ioan Mackenzie James撰写的,他是牛津大学数学研究所的MA、DPhil、FRS成员。...
"On-Chip Communication Architectures" is a comprehensive resource that delves into the intricacies of interconnect designs for System-on-Chip (SoC) platforms. Authored by Sudeep Pasricha and Nikil ...
influence of information flow topology on the closed-loop stability of homogeneous vehicular platoon moving in a rigid formation. A linearized vehicle longitudinal dynamic model is derived using the ...
Then, it quickly dives into real-world case studies that show you how to scale a high-throughput stream processor, ensure smooth operation within a production cluster, and more. Along the way, you'll...
algorithms have been evaluated on a variety of benchmark and industrial circuits and synchronous performance improvements of well above 60% have been demonstrated. • For those cases where reliable ...
Learn a formal, high availability methodology for understanding and selecting the right HA solution for your needs Deep dive into Microsoft Cluster Services Use selective data replication topologies ...
### 不同连接拓扑在小世界网络中对类似EEG活动的影响 #### 概述 本文探讨了在小世界网络的不同连接拓扑下,类似脑电图(EEG)活动的变化,并通过赫斯特指数(Hurst exponent)、关联维数(correlation dimension)...
幂律网络拓扑的可视化研究由David Soen-Mun Chan、Khim Shiong Chua、Christopher Leckie和Ajeet Parhar等人进行,他们提出的ODL算法是一种新的图布局算法,旨在可视化大型网络拓扑结构。该算法的主要贡献是通过将...
拓扑优化是一种数学方法,用于在给定的材料和设计空间中寻找最优的材料分布,以便在满足性能要求和设计约束的前提下,实现某种性能的最优化,如最小化质量、最大化刚度或实现其他工程目标。拓扑优化理论在结构设计中...
《初学者指南:滤波器拓扑》是针对电子工程领域初学者的一份宝贵资料,主要探讨了滤波器设计的基础知识,特别是滤波器的拓扑结构。这份压缩包包含了一份PDF文档,全面介绍了滤波器在信号处理中的重要性、基本类型...
### FPGA互联拓扑探索:深度解析与优化策略 在当今高度复杂的数字系统设计领域,现场可编程门阵列(FPGA)作为一种灵活的硬件平台,因其可重构性和高性能而受到广泛青睐。FPGA的设计核心在于其互联网络,即用于连接...
various components of a Storm cluster. In the later chapters, you will learn how to build a Storm application that can interact with various other Big Data technologies and how to create ...
源码阅读之storm操作zookeeper-cluster.clj是深入理解storm框架如何与Zookeeper协同工作的关键。这篇文章主要聚焦于storm在Clojure语言环境下如何通过cluster.clj文件与Zookeeper集群进行交互,提供了对相关源码的...
vSAN Network Topologies 5.1.vSAN Network Topologies 5.2.Standard Deployments 5.3.Stretched Cluster Deployments 5.4.2 Node vSAN Deployments 5.5.2 Node vSAN Deployments – Common Config Questions 2 ...
3 Networks on Chips: A New Paradigm for Component-Based MPSoC Design 49 Luca Benini and Giovanni De Micheli 3.1 Introduction 49 3.1.1 Technology Trends 49 3.1.2 Nondeterminism in SoC Abstraction ...
本文献《Generating Realistic ISP-Level Network Topologies》关注了网络研究中的一个关键问题:如何生成与现实世界相似的网络拓扑结构。在进行网络模拟时,所选的拓扑结构对模拟结果有着显著的影响,因此构建出...
标题《Cluster consensus of high-order multi-agent systems with switching topologies》所涉及的知识点主要集中在高阶多智能体系统、聚类一致性以及拓扑切换三个方面。首先,我们需要明确“高阶多智能体系统”这...
It is based on applying three key techniques: tracking genes with history markers to allow crossover among topologies, applying speciation (the evolution of species) to preserve innovations, and ...