Storm Backpressure Mechanism

woodding2008

Backpressure mechanism

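Storm's backpressure mechanism used to be immature. The direct consequence is that a traffic spike, or an inaccurate traffic estimate, causes a topology's workers to run out of memory (OOM) and be rescheduled frequently. Storm 1.0 ships a new backpressure mechanism. The community solution: https://issues.apache.org/jira/browse/STORM-886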

https://github.com/apache/storm/pull/700

Backpressure process

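When the receive queue of an executor in a worker grows past the high watermark, the executor notifies the worker's backpressure thread.
The worker's backpressure thread reports the executor-busy event to ZooKeeper.
All workers watch ZooKeeper for the executor-busy event.
The spouts in each worker reduce the rate at which they emit tuples.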
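
The four steps above can be sketched in a single process. This is only an illustration, not Storm's implementation: the class `BackpressureSketch` and its methods are hypothetical, an in-memory flag stands in for the ZooKeeper notification, and the low watermark used to release the throttle is an assumption that the steps above do not describe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names are Storm classes.
final class BackpressureSketch {
    // Stand-in for the executor-busy event that Storm publishes through ZooKeeper.
    private final AtomicBoolean throttleOn = new AtomicBoolean(false);
    private final Deque<String> receiveQueue = new ArrayDeque<>();
    private final int highWatermark;
    private final int lowWatermark; // assumption: used here to release the throttle

    BackpressureSketch(int highWatermark, int lowWatermark) {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    // Receiving side: enqueue a tuple and raise the flag once the queue
    // grows past the high watermark.
    void offer(String tuple) {
        receiveQueue.addLast(tuple);
        if (receiveQueue.size() > highWatermark) {
            throttleOn.set(true);
        }
    }

    // Receiving side: process one tuple and clear the flag once the queue
    // has drained below the low watermark.
    void processOne() {
        receiveQueue.pollFirst();
        if (receiveQueue.size() < lowWatermark) {
            throttleOn.set(false);
        }
    }

    // Spout side: emit only while no executor reports itself busy.
    boolean spoutMayEmit() {
        return !throttleOn.get();
    }
}
```

Using two thresholds instead of one keeps the flag from flapping when the queue length hovers around a single limit.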

Backpressure before Storm 1.0

If the spout emits its tuples without a message id, TOPOLOGY_MAX_SPOUT_PENDING has no effect.

public static final String TOPOLOGY_MAX_SPOUT_PENDING="topology.max.spout.pending"; 
public static final Object TOPOLOGY_MAX_SPOUT_PENDING_SCHEMA = ConfigValidation.IntegerValidator;

The maximum number of tuples that can be pending on a spout task at any given time. 
This config applies to individual tasks, not to spouts or topologies as a whole. 

A pending tuple is one that has been emitted from a spout but has not been acked or failed yet.
Note that this config parameter has no effect for unreliable spouts that don't tag their tuples with a message id.
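
Why the limit cannot apply to unreliable spouts becomes clearer with a small model of the pending set. The sketch below is an assumption-laden illustration: `PendingSketch` and its methods are hypothetical names, not Storm's API. Only tuples emitted with a message id are tracked, so tuples without one never count against the limit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of pending-tuple accounting; not Storm's actual classes.
final class PendingSketch {
    private final Map<Object, String> pending = new HashMap<>();
    private final Integer maxSpoutPending; // null means no limit is configured

    PendingSketch(Integer maxSpoutPending) {
        this.maxSpoutPending = maxSpoutPending;
    }

    // A tuple is tracked as pending only when it carries a message id.
    void emit(String tuple, Object messageId) {
        if (messageId != null) {
            pending.put(messageId, tuple);
        }
    }

    // Acking or failing a tuple removes it from the pending set.
    void ackOrFail(Object messageId) {
        pending.remove(messageId);
    }

    // The spout may emit while the pending count is below the limit.
    boolean mayEmit() {
        return maxSpoutPending == null || pending.size() < maxSpoutPending;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With a limit of 2, any number of tuples emitted without a message id leaves `mayEmit()` true, while two tuples emitted with ids block further emits until one of them is acked or failed.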

How the spout executor runs the nextTuple logic

(fn []
          ;; This design requires that spouts be non-blocking
          (disruptor/consume-batch receive-queue event-handler) ;; take a batch of tuples from receive-queue and process them with tuple-action-fn

          ;; try to clear the overflow-buffer by moving its contents into the outgoing transfer queue
          (try-cause
            (while (not (.isEmpty overflow-buffer))
              (let [[out-task out-tuple] (.peek overflow-buffer)]
                (transfer-fn out-task out-tuple false nil)
                (.removeFirst overflow-buffer)))
          (catch InsufficientCapacityException e
            ))
          
          (let [active? @(:storm-active-atom executor-data)
                curr-count (.get emitted-count)]
            (if (and (.isEmpty overflow-buffer)  ;; the spout may keep emitting only when overflow-buffer is empty and pending is below its limit
                     (or (not max-spout-pending)
                         (< (.size pending) max-spout-pending)))
              (if active?  ;; is the topology active (storm-active-atom)?
                (do  ;; active
                  (when-not @last-active  ;; the spout is currently inactive
                    (reset! last-active true)
                    (log-message "Activating spout " component-id ":" (keys task-datas))
                    (fast-list-iter [^ISpout spout spouts] (.activate spout))) ;; activate the spout first

                  (fast-list-iter [^ISpout spout spouts] (.nextTuple spout))) ;; call nextTuple to produce new tuples
                (do ;; inactive
                  (when @last-active ;; the spout is currently active
                    (reset! last-active false)
                    (log-message "Deactivating spout " component-id ":" (keys task-datas))
                    (fast-list-iter [^ISpout spout spouts] (.deactivate spout))) ;; deactivate the spout, then sleep
                  ;; TODO: log that it's getting throttled
                  (Time/sleep 100))))
            (if (and (= curr-count (.get emitted-count)) active?) ;; no new tuple was emitted (emitted-count is unchanged)
              (do (.increment empty-emit-streak)
                  (.emptyEmit spout-wait-strategy (.get empty-emit-streak))) ;; let spout-wait-strategy sleep
              (.set empty-emit-streak 0)
              ))
          0)) ;; return 0, meaning async-loop sleeps for 0 between iterations
      :kill-fn (:report-error-and-die executor-data)
      :factory? true
      :thread-name component-id)]))
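
The control flow of the loop above can be condensed into two pure functions. This is a rough sketch for reading along, not a port of the executor: `SpoutLoopSketch`, `Action`, `decide`, and `nextEmptyEmitStreak` are hypothetical names, and the actual calls to the spout and to the wait strategy are left out.

```java
// Hypothetical condensation of the loop's decisions; not Storm code.
final class SpoutLoopSketch {
    enum Action { NEXT_TUPLE, DEACTIVATED_SLEEP, SKIP }

    // Mirrors the nested ifs: emit only when the overflow buffer is empty
    // and the pending count is below the limit (a null limit means no limit).
    static Action decide(boolean overflowEmpty, Integer maxSpoutPending,
                         int pendingSize, boolean active) {
        boolean underLimit = maxSpoutPending == null || pendingSize < maxSpoutPending;
        if (overflowEmpty && underLimit) {
            // Active: call nextTuple. Inactive: deactivate if needed and sleep 100 ms.
            return active ? Action.NEXT_TUPLE : Action.DEACTIVATED_SLEEP;
        }
        // Throttled: nextTuple is not called in this iteration.
        return Action.SKIP;
    }

    // Mirrors the empty-emit-streak update: the streak grows while the spout is
    // active but emitted nothing, and resets to 0 otherwise.
    static int nextEmptyEmitStreak(int streak, long emittedBefore,
                                   long emittedAfter, boolean active) {
        return (emittedBefore == emittedAfter && active) ? streak + 1 : 0;
    }
}
```

Note that a throttled iteration (`SKIP`) leaves the emitted count unchanged, so an active spout also goes through the wait strategy in that case. That is how reaching the pending limit slows the loop down.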

The number of pending tuples is bounded

p*num-tasks 
p is TOPOLOGY-MAX-SPOUT-PENDING, and num-tasks is the number of spout tasks run by the executor (count task-datas).

max-spout-pending (executor-max-spout-pending storm-conf (count task-datas))
(defn executor-max-spout-pending [storm-conf num-tasks]
  (let [p (storm-conf TOPOLOGY-MAX-SPOUT-PENDING)]
    (if p (* p num-tasks))))
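
As a worked example of this arithmetic, the same function can be written in Java. This is an illustrative transcription only: `MaxSpoutPendingSketch` is a hypothetical name, and a `null` result stands for "no limit configured", matching the Clojure version's `nil`.

```java
// Hypothetical Java transcription of executor-max-spout-pending.
final class MaxSpoutPendingSketch {
    // p is the per-task limit (topology.max.spout.pending); numTasks is the
    // number of spout tasks that the executor runs.
    static Integer executorMaxSpoutPending(Integer p, int numTasks) {
        return p == null ? null : p * numTasks;
    }
}
```

With p = 1000 and an executor running 3 spout tasks, the executor-wide limit is 1000 * 3 = 3000 pending tuples. If the setting is absent, the function returns `null` and the loop's `(not max-spout-pending)` branch lets the spout emit without a limit.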