`
roadrunners
  • 浏览: 77186 次
  • 性别: Icon_minigender_1
  • 来自: 成都
社区版块
存档分类
最新评论

Storm0.9.4的集群部署

阅读更多

Twitter开源的Storm是一个分布式的、可靠的、容错的实时计算系统。Storm的其它概念和功能特点此处就不再赘述,这里主要讲Storm如何在集群中正确配置安装。在我之前听说Storm的时候,Storm的网络传输还是通过ZeroMQ来实现的,当前版本已经支持Netty Transport传输了,而且据说后者的性能要比前者高一倍,我们果断选择后者。好了言归正传,下面进入部署环节。

 

1、准备工作

我们准备3台机器做Storm集群,分别在3台机器上创建Storm安装需要的目录。

 

数据存储目录:

mkdir -p /opt/data/storm

 

日志目录:

mkdir -p /opt/logs/storm

 

Storm安装包下载:

wget http://mirror.bit.edu.cn/apache/storm/apache-storm-0.9.4/apache-storm-0.9.4.tar.gz

 

JDK的安装见:Linux环境下安装JDK

 

配置hosts,同时修改hostname:

vim /etc/hosts

10.100.152.4 storm1.com
10.100.152.5 storm2.com
10.100.152.6 storm3.com

vim /etc/hostsname

storm1.com 

 

安装python:

首先确认系统是否自带了python,如果自带并版本在2.6.6或以上的话就不需要安装python。

python -V

Python 2.6.6

系统当前自带的版本可以不用安装,否则就要使用yum install python进行安装。

 

ZooKeeper安装见:ZooKeeper的集群部署

 

2、Storm安装

在Storm安装之前,我们先来看一下Storm的默认配置信息(defaults.yaml)。

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


########### These all have default values as shown
########### Additional configuration goes into storm.yaml

java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"

### storm.* configs are general configurations
# the local dir is where jars are kept
storm.local.dir: "storm-local"
storm.zookeeper.servers:
    - "localhost"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/storm"
storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 1000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.cluster.mode: "distributed" # can be distributed or local
storm.local.mode.zmq: false
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.meta.serialization.delegate: "backtype.storm.serialization.DefaultSerializationDelegate"

### nimbus.* configs are for the master
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.thrift.max_buffer_size: 1048576
nimbus.childopts: "-Xmx1024m"
nimbus.task.timeout.secs: 30
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
nimbus.inbox.jar.expiration.secs: 3600
nimbus.task.launch.secs: 120
nimbus.reassign: true
nimbus.file.copy.expiration.secs: 600
nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"

### ui.* configs are for the master
ui.port: 8080
ui.childopts: "-Xmx768m"

logviewer.port: 8000
logviewer.childopts: "-Xmx128m"
logviewer.appender.name: "A1"


drpc.port: 3772
drpc.worker.threads: 64
drpc.queue.size: 128
drpc.invocations.port: 3773
drpc.request.timeout.secs: 600
drpc.childopts: "-Xmx768m"

transactional.zookeeper.root: "/transactional"
transactional.zookeeper.servers: null
transactional.zookeeper.port: null

### supervisor.* configs are for node supervisors
# Define the amount of workers that can be run on this machine. Each worker is assigned a port to use for communication
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.childopts: "-Xmx256m"
#how long supervisor will wait to ensure that a worker process is started
supervisor.worker.start.timeout.secs: 120
#how long between heartbeats until supervisor considers that worker dead and tries to restart it
supervisor.worker.timeout.secs: 30
#how frequently the supervisor checks on the status of the processes it's monitoring and restarts if necessary
supervisor.monitor.frequency.secs: 3
#how frequently the supervisor heartbeats to the cluster state (for nimbus)
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true

### worker.* configs are for task workers
worker.childopts: "-Xmx768m"
worker.heartbeat.frequency.secs: 1

# control how many worker receiver threads we need per worker
topology.worker.receiver.thread.count: 1

task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10

zmq.threads: 1
zmq.linger.millis: 5000
zmq.hwm: 0


storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 #5MB buffer
# Since nimbus.task.launch.secs and supervisor.worker.start.timeout.secs are 120, other workers should also wait at least that long before giving up on connecting to the other worker. The reconnection period need also be bigger than storm.zookeeper.session.timeout(default is 20s), so that we can abort the reconnection when the target worker is dead.
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

# If the Netty messaging layer is busy(netty internal buffer not writable), the Netty client will try to batch message as more as possible up to the size of storm.messaging.netty.transfer.batch.size bytes, otherwise it will try to flush message as soon as possible to reduce latency.
storm.messaging.netty.transfer.batch.size: 262144

# We check with this interval that whether the Netty channel is writable and try to write pending messages if it is.
storm.messaging.netty.flush.check.interval.ms: 10

### topology.* configs are for specific executing storms
topology.enable.message.timeouts: true
topology.debug: false
topology.workers: 1
topology.acker.executors: null
topology.tasks: null
# maximum amount of time a message has to complete before it's considered failed
topology.message.timeout.secs: 30
topology.multilang.serializer: "backtype.storm.multilang.JsonSerializer"
topology.skip.missing.kryo.registrations: false
topology.max.task.parallelism: null
topology.max.spout.pending: null
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.builtin.metrics.bucket.size.secs: 60
topology.fall.back.on.java.serialization: true
topology.worker.childopts: null
topology.executor.receive.buffer.size: 1024 #batched
topology.executor.send.buffer.size: 1024 #individual messages
topology.receiver.buffer.size: 8 # setting it too high causes a lot of problems (heartbeat thread gets starved, throughput plummets)
topology.transfer.buffer.size: 1024 # batched
topology.tick.tuple.freq.secs: null
topology.worker.shared.thread.pool.size: 4
topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
topology.sleep.spout.wait.strategy.time.ms: 1
topology.error.throttle.interval.secs: 10
topology.max.error.report.per.interval: 5
topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
topology.trident.batch.emit.interval.millis: 500
topology.classpath: null
topology.environment: null

dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
  

通过上面的defaults.yaml配置可以看出来,很多配置项都可以使用默认的。下面进行Storm的安装配置。

 

首先解压Storm安装包:tar -zxvf apache-storm-0.9.4.tar.gz

修改配置项:

vim storm.yaml

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

########### These MUST be filled in for a storm configuration

#ZooKeeper的地址配置有两种方式,一种是使用一个虚拟IP,通过Nginx代理到集群的机器上,这种方式客户端使用方便,只需要配置一个地址即可;另外一种是把集群的所有地址都配置起。
 storm.zookeeper.servers:
     - "10.100.15.1"
#     - "server2"
#
 storm.zookeeper.port: 8900
 nimbus.host: "storm1.com"
 storm.log.dir: "/opt/logs/storm"
 
#
#
# ##### These may optionally be filled in:
#   
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
  

到此主节点就配置好了,我们验证一下是否能正常启动。

./storm nimbus &

./storm ui &

./storm logviewer &

./storm supervisor &

 

启动一切正常。

 

下面接着配置另外两个工作节点,首先把Storm安装包拷贝到其它两台机器上,操作如下:

scp -rp apache-storm-0.9.4 root@10.100.152.5:/opt/app/

scp -rp apache-storm-0.9.4 root@10.100.152.6:/opt/app/

 

然后在工作节点上启动supervisor,启动的命令如下:

./storm supervisor &

 

启动一切正常,但奇怪的问题来了,我们一共启动了3个supervisor,怎么在UI里面只看得到一个呢?

 

郁闷了一会儿,然后刷新了几下,发现supervisor显示的另外一台机器上的。

 

仔细一看,怎么两个机器的supervisor的ID怎么是相同的呢?后面上网查了一些相关资料,才知道supervisor的ID是通过storm.local.dir目录的一些文件生成的。到此,我们已经知道问题出在哪里了。我们是先部署的主节点,然后从主节点直接拷贝到工作节点的,所以这个目录下的文件是相同的,这样生成的supervisor的ID肯定就是一样的了。

 

要想解决这个问题,我们有两种方案。

解决方案一:

直接把默认storm.local.dir目录删除。

 

解决方案二:

通过修改storm.yaml的配置文件,重新指定目录。我们是采用的此方案,增加配置:storm.local.dir: /opt/data/storm

 

重启后问题恢复,皆大欢喜!最后的效果:

 

到此,Storm的集群部署已经完成。

 

注意:如果机器上有防火墙的话,记得配置防火墙端口。

 

 

 

 

分享到:
评论

相关推荐

    storm0.9.4

    4. 部署和运行Topologies:编写或准备你的Storm Topologies(数据处理逻辑),然后提交到Storm集群,由nimbus进行调度,分配到各个supervisors上执行。 5. 监控与维护:使用Storm提供的监控工具检查topologies的...

    storm源码包 apache-0.9.4

    部署时,通常将 Storm 集群安装在多台服务器上,通过 ZooKeeper 协调。 7. **开发工具**: Storm 提供了本地模式(Local Mode)进行开发和测试,可以在单机上模拟完整的集群环境。此外,`storm-starter` 项目包含...

    apache-storm-0.9.4.tar.gz

    Storm是Twitter开源的一个类似于Hadoop的实时数据处理框架,它原来是由BackType开发,后BackType被Twitter收购,将Storm作为Twitter的实时数据分析系统。实时数据处理的应用场景很广泛,例如商品推荐,广告投放,它...

    storm-kafka-0.9.4.jar

    storm-kafka-0.9.4.jar

    cerebro-0.9.4

    版本0.9.4是这个工具的一个更新迭代,它提供了丰富的功能和改进,使得ES集群的管理变得更加简便和直观。Cerebro的设计目标是让用户无需深入理解复杂的ES命令行操作,也能有效地进行集群监控和维护。 首先,Cerebro...

    redsn0w-win_0.9.4

    redsn0w-win_0.9.4 redsn0w-win_0.9.4 redsn0w-win_0.9.4

    cerebro-0.9.4.zip

    Cerebro是一款强大的开源工具,专门用于管理和监控Elasticsearch...总的来说,Cerebro是Elasticsearch集群管理和运维的强大工具,它的0.9.4版本为用户提供了丰富的功能和便利的操作体验,是ES管理员不可或缺的助手。

    Redis官方客户端0.9.4

    这款应用的最新版本是0.9.4,它是Redis官方推荐的客户端之一。 RedisDesktopManager的主要特性包括: 1. **跨平台支持**:此工具可在Windows、macOS和Linux等操作系统上运行,满足不同开发环境的需求。 2. **多...

    storm-zookeeper-jdk.zip

    这里要求Python 2.6.6+版本,意味着在部署和管理Storm集群时可能会用到Python。 4. **ZooKeeper**:ZooKeeper是Apache的一个开源项目,它提供一个分布式的、高可用的协调服务,用于解决分布式应用中的命名、配置...

    jcifs-ext-0.9.4.jar

    解压即可得到jcifs-ext-0.9.4.jar, java环境资源,jdk1.6及以上

    mac rdm 0.9.4

    **Redis Desktop Manager (RDM) for Mac 0.9.4 知识点详解** Redis Desktop Manager(简称RDM)是一款流行的图形用户界面(GUI)工具,专为管理和操作Redis数据库设计。它允许用户通过直观的界面进行键值对的查看、...

    xlrd-0.9.4

    xlrd-0.9.4,python2,win32,用于python2对于excel2003及其以下格式的操作

    Kemulator0.9.4

    【Kemulator 0.9.4:电脑模拟手机的利器】 Kemulator是一款功能强大的手机模拟器,专为在个人计算机上运行移动设备应用程序而设计。它的版本0.9.4代表了软件的一个特定更新,可能包含了性能优化、新功能的添加或...

    jsqlparser-0.9.4.jar

    一个SQL解析引擎JSQLParser,可以将SQL变成Java对象系列,可以通过一个sql,查询到one 2 one, one 2 many, 甚至 one 2 one 2 many 2 one的查询

    storm-core-0.9.4.jar

    Storm Core Java API 和 Clojure 实现。 org.apache.storm/storm-core/0.9.4/storm-core-0.9.4.jar

    redis-desktop-manager-0.9.4

    redis-desktop-manager 0.9.4 网上找来的给大家分享一下

    jcifs-ext-0.9.4.zip

    <groupId>org.samba.jcifs <artifactId>jcifs-ext <version>0.9.4 </dependency>

Global site tag (gtag.js) - Google Analytics