
Setting up a Storm Cluster


https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html

This page outlines the steps for getting a Storm cluster up and running. If you're on AWS, you should check out the storm-deploy project. storm-deploy completely automates the provisioning, configuration, and installation of Storm clusters on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, and network usage.

If you run into difficulties with your Storm cluster, first check the Troubleshooting page for a solution. Otherwise, email the mailing list.

Here's a summary of the steps for setting up a Storm cluster:

  1. Set up a Zookeeper cluster
  2. Install dependencies on Nimbus and worker machines
  3. Download and extract a Storm release to Nimbus and worker machines
  4. Fill in the mandatory configurations in storm.yaml
  5. Launch daemons under supervision using the "storm" script and a supervisor of your choice

Set up a Zookeeper cluster

Storm uses Zookeeper for coordinating the cluster. Zookeeper is not used for message passing, so the load Storm places on Zookeeper is quite low. Single-node Zookeeper clusters should be sufficient for most cases, but if you want failover or are deploying a large Storm cluster, you may want a larger Zookeeper cluster. Instructions for deploying Zookeeper are here.
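The failover trade-off can be made concrete with a little arithmetic. A Zookeeper ensemble keeps serving only while a strict majority of its servers is up, so an ensemble of n servers survives floor((n - 1) / 2) failures. The Python 3 sketch below only does that arithmetic. `tolerated_failures` is a hypothetical helper name, not part of Storm or Zookeeper.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of Zookeeper servers that can fail while a majority quorum survives."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    # A quorum is a strict majority: floor(n / 2) + 1 servers must stay up.
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum


for n in (1, 3, 4, 5):
    print(f"{n} server(s): quorum of {n // 2 + 1}, survives {tolerated_failures(n)} failure(s)")
```

A single node survives no failures, three survive one, and five survive two. Going from three to four servers adds load without adding fault tolerance, which is why odd ensemble sizes are the usual choice.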

A few notes about Zookeeper deployment:

  1. It's critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error. See here for more details.
  2. It's critical that you set up a cron job to compact Zookeeper's data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don't set up a cron job, Zookeeper will quickly run out of disk space. See here for more details.
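In practice the cron job typically runs the zkCleanup.sh script from Zookeeper's bin/ directory. To show what that compaction actually removes, here is a simplified Python 3 model of the retention rule. It assumes Zookeeper's file naming, `snapshot.<zxid>` and `log.<zxid>` with a hexadecimal zxid suffix. The rule keeps the newest N snapshots plus every transaction log still needed to replay forward from the oldest kept snapshot. `files_to_purge` is a hypothetical name, and this is an illustration, not a replacement for Zookeeper's own cleanup tool.

```python
import re

# Assumed naming: snapshot.<hex zxid> and log.<hex zxid>.
_ZK_FILE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def files_to_purge(filenames, retain_count=3):
    """Return snapshot and log files older than the newest `retain_count`
    snapshots (a simplified model of Zookeeper's purge rule)."""
    if retain_count < 1:
        raise ValueError("keep at least one snapshot")
    snaps, logs = [], []
    for name in filenames:
        m = _ZK_FILE.match(name)
        if not m:
            continue  # ignore unrelated files such as myid
        zxid = int(m.group(2), 16)  # the suffix is a hexadecimal zxid
        (snaps if m.group(1) == "snapshot" else logs).append((zxid, name))
    snaps.sort()
    logs.sort()
    kept = snaps[-retain_count:]
    if not kept:
        return []  # no snapshot yet, so every log is still needed
    oldest_kept = kept[0][0]
    # The newest log starting at or before the oldest kept snapshot may still
    # contain transactions made after that snapshot, so it is kept as well.
    covering = [z for z, _ in logs if z <= oldest_kept]
    log_cutoff = max(covering) if covering else oldest_kept
    purge = [name for _, name in snaps[:-retain_count]]
    purge += [name for z, name in logs if z < log_cutoff]
    return sorted(purge)
```

With `retain_count=2` and snapshots 100 through 400 (hex), only the two newest snapshots survive. The oldest log survives only if its range still overlaps them.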

Install dependencies on Nimbus and worker machines

Next you need to install Storm's dependencies on Nimbus and the worker machines. These are:

  1. Java 6
  2. Python 2.6.6

These are the versions of the dependencies that have been tested with Storm. Storm may or may not work with different versions of Java and/or Python.
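Before installing Storm, it can help to record which Java and Python each machine actually has. The Python 3.7+ sketch below is an assumption-laden helper, not something Storm ships. It parses the banner that `java -version` prints to stderr, handling both the old "1.6.0_45" style and the newer "11.0.2" style, and it reports the version of the interpreter running the script. That interpreter may not be the one `bin/storm` uses.

```python
import re
import subprocess
import sys


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Old releases report e.g. '1.6.0_45' (major 6); Java 9+ report e.g. '11.0.2'.
    """
    m = re.search(r'version "([^"]+)"', version_output)
    if not m:
        raise ValueError("unrecognized java -version output")
    parts = re.split(r"[._-]", m.group(1))
    major = int(parts[0])
    return int(parts[1]) if major == 1 else major


def report_dependencies() -> None:
    # `java -version` writes its banner to stderr, not stdout.
    # Raises FileNotFoundError if no `java` is on the PATH.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print("Java major version:", java_major_version(result.stderr))
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
```

Calling `report_dependencies()` on Nimbus and on each worker gives a quick comparison against the tested versions listed above.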

Download and extract a Storm release to Nimbus and worker machines

Next, download a Storm release and extract the zip file somewhere on Nimbus and each of the worker machines. The Storm releases can be downloaded from here.
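Extracting with Python's `zipfile` has one pitfall. It does not restore Unix permission bits, so `bin/storm` would lose its executable flag. The Python 3 sketch below re-applies the stored modes after extraction. `extract_release` is a hypothetical helper, and the sketch assumes the archive unpacks into a single top-level directory such as `storm-<version>/`.

```python
import os
import zipfile


def extract_release(zip_path: str, dest_dir: str) -> str:
    """Extract a Storm release zip into dest_dir and return the release root."""
    modes = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            path = zf.extract(info, dest_dir)  # returns the sanitized target path
            mode = (info.external_attr >> 16) & 0o777
            if mode:
                modes.append((path, mode))
        top = {name.split("/", 1)[0] for name in zf.namelist()}
    # zipfile does not restore Unix permission bits, so re-apply them afterwards;
    # otherwise bin/storm would lose its executable flag.
    for path, mode in modes:
        os.chmod(path, mode)
    # Assumption: the archive unpacks into one top-level directory (storm-<version>/).
    if len(top) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(top)}")
    return os.path.join(dest_dir, top.pop())
```

Running the same call on Nimbus and every worker, with the same destination, keeps the release path identical across machines.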

Fill in the mandatory configurations in storm.yaml

The Storm release contains a file at conf/storm.yaml that configures the Storm daemons. You can see the default configuration values here. storm.yaml overrides anything in defaults.yaml. There are a few configurations that are mandatory to get a working cluster:
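Storm configuration keys are flat dotted strings (for example `storm.zookeeper.servers` is a single key), so "storm.yaml overrides defaults.yaml" amounts to a per-key override. The Python 3 sketch below models this with plain dicts rather than parsing YAML. The values in `DEFAULTS` are an assumed, illustrative subset, so check the defaults.yaml in your own release. `effective_config` and `missing_mandatory` are hypothetical helpers that also flag which of the mandatory keys below a storm.yaml leaves unset.

```python
# Illustrative subset of defaults.yaml (assumed values; verify against your release).
DEFAULTS = {
    "storm.zookeeper.servers": ["localhost"],
    "storm.zookeeper.port": 2181,
    "storm.local.dir": "storm-local",
    "nimbus.host": "localhost",
    "supervisor.slots.ports": [6700, 6701, 6702, 6703],
    "ui.port": 8080,
}

# The settings this page calls mandatory for a working cluster.
MANDATORY = ("storm.zookeeper.servers", "storm.local.dir", "nimbus.host", "supervisor.slots.ports")


def effective_config(defaults: dict, storm_yaml: dict) -> dict:
    """Each key set in storm.yaml replaces the default; other keys keep their default."""
    return {**defaults, **storm_yaml}


def missing_mandatory(storm_yaml: dict) -> list:
    """Mandatory keys that storm.yaml does not set explicitly."""
    return [key for key in MANDATORY if key not in storm_yaml]
```

For example, `effective_config(DEFAULTS, {"nimbus.host": "10.0.0.5"})` changes only `nimbus.host` and keeps port 2181 for Zookeeper.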

1) storm.zookeeper.servers: This is a list of the hosts in the Zookeeper cluster for your Storm cluster. It should look something like:

storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"

If the port that your Zookeeper cluster uses is different from the default, you should set storm.zookeeper.port as well.
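Before starting Storm, you can check that every host in storm.zookeeper.servers answers on the configured port. Zookeeper supports the "ruok" four-letter command, and a healthy server replies "imok". The Python 3 sketch below assumes that command is enabled. Newer Zookeeper releases only answer four-letter commands listed in `4lw.commands.whitelist`. `zk_ruok` and `check_servers` are hypothetical helper names.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Send Zookeeper's 'ruok' four-letter command; a healthy server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break  # server closed the connection early
                reply += chunk
            return reply == b"imok"
    except OSError:  # refused, unreachable, or timed out
        return False


def check_servers(servers, port: int = 2181) -> dict:
    """Map each entry of storm.zookeeper.servers to its health-check result."""
    return {host: zk_ruok(host, port) for host in servers}
```

Pass the same host list and port that you put in storm.yaml, and run the check from each Storm machine rather than from the Zookeeper hosts themselves.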

2) storm.local.dir: The Nimbus and Supervisor daemons require a directory on the local disk to store small amounts of state (such as jars and confs). You should create that directory on each machine, give it proper permissions, and then fill in the directory location using this config. For example:

storm.local.dir: "/mnt/storm"
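Creating the directory and giving it proper permissions can be scripted as below. This is a Python 3 sketch. The 0o750 mode is an assumption, not a Storm requirement, and `prepare_local_dir` is a hypothetical name. It should run as the user the Storm daemons will run as. Changing ownership with chown usually needs root and is left out.

```python
import os


def prepare_local_dir(path: str, mode: int = 0o750) -> None:
    """Create storm.local.dir if needed, set its permissions, and verify access."""
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    os.chmod(path, mode)  # assumed mode: owner rwx, group rx, others none
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        raise PermissionError(f"{path} is not readable/writable by this user")
```

The call is idempotent, so `prepare_local_dir("/mnt/storm")` can safely be run again whenever a machine is re-provisioned.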

3) nimbus.host: The worker nodes need to know which machine is the master in order to download topology jars and confs. For example:

nimbus.host: "111.222.333.44"
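A quick way to confirm that a worker machine can reach the master at nimbus.host is to open a TCP connection to Nimbus's Thrift port. The Python 3 sketch below assumes the default `nimbus.thrift.port` of 6627, so adjust it if your storm.yaml overrides it. It only proves the port accepts connections and does not speak the Thrift protocol. `nimbus_reachable` is a hypothetical helper.

```python
import socket


def nimbus_reachable(nimbus_host: str, thrift_port: int = 6627, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to Nimbus's Thrift port can be opened."""
    try:
        with socket.create_connection((nimbus_host, thrift_port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

Run it on each worker after Nimbus is up. A False result usually points to a wrong nimbus.host value or a firewall between the machines.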

4) supervisor.slots.ports: For each worker machine, you configure how many workers run on that machine with this config. Each worker uses a single port for receiving messages, and this setting defines which ports are open for use. If you define five ports here, then Storm will allocate up to five workers to run on this machine. If you define three ports, Storm will only run up to three. By default, this setting is configured to run 4 workers on the ports 6700, 6701, 6702, and 6703. For example:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
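Because each worker needs one port, the slot list determines the worker count, and every listed port should be free on the machine. The Python 3 sketch below generates a consecutive port list and renders it in the same layout as the example above. It also reports ports that cannot currently be bound, for example because another process is already listening on them. All three function names are hypothetical.

```python
import socket


def slot_ports(first: int = 6700, count: int = 4) -> list:
    """Consecutive worker ports; one port per worker slot."""
    return list(range(first, first + count))


def render_slots_yaml(ports) -> str:
    """Render the supervisor.slots.ports entry in the layout used above."""
    return "\n".join(["supervisor.slots.ports:"] + [f"    - {p}" for p in ports])


def busy_ports(ports, host: str = "") -> list:
    """Ports that cannot be bound right now (e.g. already in use by another process)."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                busy.append(port)
    return busy
```

`render_slots_yaml(slot_ports(6700, 5))` would allow up to five workers on the machine. `busy_ports(slot_ports())` should return an empty list before the supervisor starts.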

Launch daemons under supervision using the "storm" script and a supervisor of your choice

The last step is to launch all the Storm daemons. It is critical that you run each of these daemons under supervision. Storm is a fail-fast system, which means the processes halt whenever an unexpected error is encountered. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted. To make this safe, Storm keeps no state in-process: if Nimbus or the Supervisors restart, the running topologies are unaffected. Here's how to run the Storm daemons:

  1. Nimbus: Run the command "bin/storm nimbus" under supervision on the master machine.
  2. Supervisor: Run the command "bin/storm supervisor" under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.
  3. UI: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command "bin/storm ui" under supervision. The UI can be accessed by navigating your web browser to http://{nimbus host}:8080.
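What "under supervision" means can be shown with a toy restart loop. The Python 3 sketch below runs a command and restarts it each time it exits, up to a limit. It is only an illustration. A real deployment should use a proper supervisor such as daemontools, supervisord, or systemd, which restart indefinitely, rate-limit restarts, and manage logs. `supervise` is a hypothetical name.

```python
import subprocess
import time


def supervise(cmd, max_restarts: int = 3, delay: float = 5.0) -> list:
    """Run cmd, restarting it each time it exits, up to max_restarts times.

    Returns the exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        if attempt:
            time.sleep(delay)  # pause briefly before restarting
        exit_codes.append(subprocess.call(cmd))  # blocks until the process exits
    return exit_codes
```

From the release directory, `supervise(["bin/storm", "nimbus"])` would restart Nimbus after each fail-fast exit. This is safe because, as described above, the daemons keep no state in-process.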

As you can see, running the daemons is very straightforward. The daemons log to the logs/ directory inside the directory where you extracted the Storm release.
