Learn to use Storm!
Getting started
Prerequisites
First, you need java and git installed and in your user's PATH. Also, two of the examples in storm-starter require Python and Ruby.
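The prerequisites can be verified with a short shell check. This is a minimal sketch, not part of storm-starter: the helper name check_tool is hypothetical, and it assumes the interpreters are installed under the command names python and ruby, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
# Prints "found: NAME" or "missing: NAME"; returns non-zero when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# java and git are required; python and ruby are only needed by
# the two examples that use them.
for tool in java git python ruby; do
    check_tool "$tool" || true
done
```

Any line that starts with "missing:" names a tool that still has to be installed or added to PATH.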
Next, make sure you have the storm-starter code available on your machine. Git/GitHub beginners may want to use the following command to download the latest storm-starter code and change to the new directory that contains the downloaded code.
$ git clone https://github.com/nathanmarz/storm-starter.git && cd storm-starter
storm-starter overview
storm-starter contains a variety of examples of using Storm. If this is your first time working with Storm, check out these topologies first:
- ExclamationTopology: Basic topology written in all Java
- WordCountTopology: Basic topology that makes use of multilang by implementing one bolt in Python
- ReachTopology: Example of complex DRPC on top of Storm
After you have familiarized yourself with these topologies, take a look at the other topologies in src/jvm/storm/starter/, such as RollingTopWords, for more advanced implementations.
If you want to learn more about how Storm works, please head over to the Storm project page.
Using storm-starter with Leiningen
Install Leiningen
The storm-starter build uses Leiningen 2.0. Install Leiningen by following the Leiningen installation instructions.
Running topologies with Leiningen
To run a Java topology:
$ lein deps
$ lein compile
$ java -cp $(lein classpath) storm.starter.ExclamationTopology
To run a Clojure topology:
$ lein deps
$ lein compile
$ lein run -m storm.starter.clj.word-count
Using storm-starter with Maven
Install Maven
Maven is an alternative to Leiningen. Install Maven (preferably version 3.x) by following the Maven installation instructions.
Running topologies with Maven
storm-starter contains m2-pom.xml, which can be used with Maven via the -f option. For example, to compile and run WordCountTopology in local mode, use the command:
$ mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.WordCountTopology
Packaging storm-starter for use on a Storm cluster
You can package a jar suitable for submitting to a Storm cluster with the command:
$ mvn -f m2-pom.xml package
This will package your code and all the non-Storm dependencies into a single "uberjar" at the path target/storm-starter-{version}-jar-with-dependencies.jar.
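Because the {version} part of the file name changes between builds, a glob can locate the packaged jar. The following is a minimal sketch: the function name find_uberjar is hypothetical, and it assumes the jar sits in the target directory described above.

```shell
# Hypothetical helper: print the path of the uberjar in the given
# directory (default: target). The version is matched with a glob.
find_uberjar() {
    dir="${1:-target}"
    for jar in "$dir"/storm-starter-*-jar-with-dependencies.jar; do
        # If nothing matches, the glob stays literal and -f fails.
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no uberjar found in $dir; run the package command first" >&2
    return 1
}
```

Assuming the storm command-line client is installed and configured for your cluster, the result could then be passed to it, for example: storm jar "$(find_uberjar)" storm.starter.ExclamationTopology my-topology. The trailing topology name is an illustrative argument; check the main method of the class you submit to see which arguments it expects.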
Running unit tests
Use the following Maven command to run the unit tests that ship with storm-starter. Unfortunately, lein test does not yet run the included unit tests.
$ mvn -f m2-pom.xml test
This content comes from Storm's wiki.