netty5 Notes - Threading Model 3 - EventLoop

youaremoon

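NioEventLoop is considerably more complex than NioEventLoopGroup, so this article takes some patience to read.

Let's begin with how NioEventLoop starts. For a thread pool, startup is usually triggered by the first task that is added. Tracing through the code, the execute() method turns out to live in the SingleThreadEventExecutor class: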

    public void execute(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }

        // inEventLoop is true when the calling thread is the thread the loop was started on,
        // which implies the loop is already running. false means either the loop has not
        // started yet or the caller is a different thread.
        boolean inEventLoop = inEventLoop();
        if (inEventLoop) {
            // Already running: just add the task to the queue
            addTask(task);
        } else {
            // Try to start the executor
            startExecution();
            // Add the task to taskQueue
            addTask(task);
            // If the executor turns out to be shut down, remove the task and reject it
            if (isShutdown() && removeTask(task)) {
                reject();
            }
        }

        if (!addTaskWakesUp && wakesUpForTask(task)) {
            // Wake up the executing thread
            wakeup(inEventLoop);
        }
    }

    private void startExecution() {
        // Start only if not started yet
        if (STATE_UPDATER.get(this) == ST_NOT_STARTED) {
            if (STATE_UPDATER.compareAndSet(this, ST_NOT_STARTED, ST_STARTED)) {
                // Add a scheduled task that removes cancelled tasks from the scheduled-task
                // queue; it runs once every second
                schedule(new ScheduledFutureTask<Void>(
                        this, Executors.<Void>callable(new PurgeTask(), null),
                        ScheduledFutureTask.deadlineNanos(SCHEDULE_PURGE_INTERVAL), -SCHEDULE_PURGE_INTERVAL));
                // Start running
                scheduleExecution();
            }
        }
    }

    // Once shut down, no more tasks are accepted; otherwise the task is added to the task queue
    protected void addTask(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }
        if (isShutdown()) {
            reject();
        }
        taskQueue.add(task);
    }
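
The start-once behaviour of startExecution() can be tried out in isolation. What follows is a minimal JDK-only sketch, not Netty's class: LazyStartSketch and startCount are names made up for illustration, and the counter merely stands in for scheduling the purge task and the first run.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

final class LazyStartSketch {
    private static final int ST_NOT_STARTED = 1;
    private static final int ST_STARTED = 2;

    private static final AtomicIntegerFieldUpdater<LazyStartSketch> STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(LazyStartSketch.class, "state");

    private volatile int state = ST_NOT_STARTED;
    int startCount; // hypothetical counter: how often the start logic actually ran

    /** Returns true only for the single caller that wins the compare-and-set. */
    boolean startExecution() {
        if (STATE_UPDATER.get(this) == ST_NOT_STARTED
                && STATE_UPDATER.compareAndSet(this, ST_NOT_STARTED, ST_STARTED)) {
            startCount++; // placeholder for "schedule purge task, then scheduleExecution()"
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LazyStartSketch loop = new LazyStartSketch();
        System.out.println(loop.startExecution()); // first call performs the start
        System.out.println(loop.startExecution()); // later calls do nothing
    }
}
```

The cheap get() before the compareAndSet mirrors the listing above: once the loop is running, every later execute() call skips the CAS entirely.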

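The simple parts need no explanation. Let's look at two points that may be confusing:

1. scheduleExecution()

This method submits asRunnable to the executor. Because the thread pool has as many threads as there are EventExecutors, every asRunnable can be picked up promptly. asRunnable itself is simple: it calls the run method of the enclosing class. That run method is abstract, and any implementation has to meet several requirements:

a. run should execute only a bounded amount of work. If it executes too much, or never returns, the work-stealing mechanism of the fork/join framework that Netty intends to adopt later would stop working or lose much of its benefit;

b. Once run has executed its share of tasks, the round is finished and it must call scheduleExecution(); otherwise the remaining tasks of this EventExecutor will never run;

c. Because requirement b obliges subclasses to call scheduleExecution(), tasks must be executed inside try/catch. Otherwise any exception shuts down the EventExecutor and all of its tasks are cleaned up.

Another point to note is that scheduleExecution sets thread to null before submitting asRunnable. thread is the thread the EventLoop is running on. Since executor.execute gives no guarantee about which Thread will run the task, thread is cleared first and then set to Thread.currentThread() once asRunnable's run method starts.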

    protected final void scheduleExecution() {
        updateThread(null);
        executor.execute(asRunnable);
    }

    private final Runnable asRunnable = new Runnable() {
        @Override
        public void run() {
            updateThread(Thread.currentThread());
            // lastExecutionTime must be set on the first run
            // in order for shutdown to work correctly for the
            // rare case that the eventloop did not execute
            // a single task during its lifetime.
            if (firstRun) {
                firstRun = false;
                updateLastExecutionTime();
            }
            try {
                SingleThreadEventExecutor.this.run();
            } catch (Throwable t) {
                // An exception shuts down the whole EventExecutor
                logger.warn("Unexpected exception from an event executor: ", t);
                cleanupAndTerminate(false);
            }
        }
    };

Here is a short program that shows how a single exception can take down the whole EventExecutor:

    public static void main(String[] args) throws Exception {
        DefaultEventExecutorGroup group = new DefaultEventExecutorGroup(1);
        group.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("a point");
                throw new RuntimeException("runtime error");
            }
        });
        Thread.sleep(100);
        group.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("b point");
                throw new RuntimeException("runtime error");
            }
        });

        Thread.sleep(10000);
        group.close();
    }

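    // The problem is that the run method of DefaultEventExecutor (the executor behind
    // DefaultEventExecutorGroup) does not catch exceptions; I don't know whether this has
    // been fixed since. Note that 4.x uses a different threading model, so similar code
    // does not cause much trouble there.
    protected void run() {
        Runnable task = takeTask();
        if (task != null) {
            // The correct approach is to catch exceptions thrown by task.run
            task.run();
            updateLastExecutionTime();
        }

        if (confirmShutdown()) {
            cleanupAndTerminate(true);
        } else {
            scheduleExecution();
        }
    }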
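
To make the "catch exceptions thrown by task.run" remark concrete, here is a minimal JDK-only sketch of a run round that survives a failing task. It is not Netty code: GuardedRunSketch, runOnce, drain, and failures are hypothetical names, and the loop in drain only stands in for the repeated scheduleExecution() calls of the real executor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class GuardedRunSketch {
    final Queue<Runnable> taskQueue = new ArrayDeque<>();
    final List<Throwable> failures = new ArrayList<>();

    /** One round: run at most one task and keep its exception from escaping. */
    boolean runOnce() {
        Runnable task = taskQueue.poll();
        if (task == null) {
            return false;
        }
        try {
            task.run();
        } catch (Throwable t) {
            failures.add(t); // a real executor would log this and carry on
        }
        return true;
    }

    /** Stand-in for rescheduling: keep running rounds until the queue is empty. */
    int drain() {
        int rounds = 0;
        while (runOnce()) {
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        GuardedRunSketch executor = new GuardedRunSketch();
        executor.taskQueue.add(() -> { throw new RuntimeException("runtime error"); });
        executor.taskQueue.add(() -> System.out.println("b point"));
        System.out.println("rounds: " + executor.drain());
        System.out.println("failures: " + executor.failures.size());
    }
}
```

With the guard in place, the second task still runs after the first one has thrown, which is the behaviour the unguarded run method above fails to provide.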

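2. wakeup

Look back at the place in the first listing where wakeup is called. addTaskWakesUp indicates whether adding a task is enough to wake the thread. What does that mean? Suppose a thread calls the blocking method BlockingQueue.take(), which blocks for as long as no element is available. To resume it, you only need to add an object to that BlockingQueue; once the thread resumes, it can perform other checks (for example, whether it has been shut down). DefaultEventExecutor passes addTaskWakesUp=true because the only place it can block is BlockingQueue.take(), so adding a task wakes the thread. NioEventLoop blocks in selector.select, where adding a task does not wake the thread immediately, so it uses addTaskWakesUp=false.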

        if (!addTaskWakesUp && wakesUpForTask(task)) {
            wakeup(inEventLoop);
        }

Let's see how the wakeup method differs between DefaultEventExecutor and NioEventLoop:

    // NioEventLoop calls selector.wakeup() to resume the thread blocked in selector.select
    protected void wakeup(boolean inEventLoop) {
        if (!inEventLoop && wakenUp.compareAndSet(false, true)) {
            selector.wakeup();
        }
    }

    // DefaultEventExecutor adds an empty Runnable to the queue to resume the blocked take();
    // this is also the default implementation.
    protected void wakeup(boolean inEventLoop) {
        if (!inEventLoop || STATE_UPDATER.get(this) == ST_SHUTTING_DOWN) {
            taskQueue.add(WAKEUP_TASK);
        }
    }

    private static final Runnable WAKEUP_TASK = new Runnable() {
        @Override
        public void run() {
            // Do nothing.
        }
    };

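The wakeup method usually has to be called when the whole EventExecutor is shutting down or when a task is added. If your own subclass can block in some other method, you need to find a way to resume the thread there as well.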
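
The queue-based wakeup is easy to reproduce with plain JDK classes. The sketch below is only an illustration under simplified assumptions: QueueWakeupSketch, loop, shutdown, and demo are made-up names, and a volatile flag replaces the executor's real state field.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueWakeupSketch {
    static final Runnable WAKEUP_TASK = () -> { };

    final BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();
    volatile boolean shuttingDown;
    volatile boolean exited;

    /** Worker loop: the only blocking point is taskQueue.take(). */
    void loop() {
        try {
            while (!shuttingDown) {
                taskQueue.take().run(); // blocks until something is queued
            }
            exited = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Sets the flag, then queues a no-op so a blocked take() returns and re-checks it. */
    void shutdown() {
        shuttingDown = true;
        taskQueue.add(WAKEUP_TASK);
    }

    /** Starts a worker, shuts it down, and reports whether the worker left its loop. */
    static boolean demo() {
        QueueWakeupSketch executor = new QueueWakeupSketch();
        Thread worker = new Thread(executor::loop);
        worker.start();
        executor.shutdown();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executor.exited;
    }

    public static void main(String[] args) {
        System.out.println("worker exited: " + demo());
    }
}
```

Without the no-op task, a worker that had already entered take() would stay blocked even though the flag was set. The worker leaves its loop whether or not it had reached take() when shutdown() was called.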

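OK, now for the run method, which is probably the one readers find most interesting. SingleThreadEventExecutor.run is abstract and the concrete implementation lives in the subclass, so let's go back to NioEventLoop: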

    protected void run() {
        // wakenUp is reset to false on every entry. If a new task is submitted after that,
        // it triggers selector.wakeup(), so even if we take the select(oldWakenUp) branch
        // below, the newly submitted task still runs promptly
        boolean oldWakenUp = wakenUp.getAndSet(false);
        try {
            if (hasTasks()) {
                // There are pending tasks: use the non-blocking selectNow so they run promptly
                selectNow();
            } else {
                // No pending tasks: block while waiting for ready connections
                select(oldWakenUp);
                if (wakenUp.get()) {
                    selector.wakeup();
                }
            }

            cancelledKeys = 0;
            needsToSelectAgain = false;
            final int ioRatio = this.ioRatio;
            if (ioRatio == 100) {
                processSelectedKeys();
                runAllTasks();
            } else {
                final long ioStartTime = System.nanoTime();

                processSelectedKeys();

                final long ioTime = System.nanoTime() - ioStartTime;
                // Derive the time budget for ordinary tasks from the time spent on
                // the selected keys and ioRatio
                runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
            }

            // If shutting down, close all connections (closeAll) and finish the cleanup
            if (isShuttingDown()) {
                closeAll();
                if (confirmShutdown()) {
                    cleanupAndTerminate(true);
                    return;
                }
            }
        } catch (Throwable t) {
            ...
        }

        scheduleExecution();
    }
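
The task budget passed to runAllTasks is plain integer arithmetic, so a worked example may help. The helper below is a sketch: IoRatioSketch and taskBudgetNanos are hypothetical names, and only the formula ioTime * (100 - ioRatio) / ioRatio is taken from the listing above.

```java
final class IoRatioSketch {
    /** Time budget for non-I/O tasks, given the time just spent on I/O. */
    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio >= 100) {
            // ioRatio == 100 takes the separate, unlimited branch in run()
            throw new IllegalArgumentException("ioRatio must be in 1..99: " + ioRatio);
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long ioTime = 8_000_000L; // 8 ms spent in processSelectedKeys
        System.out.println(taskBudgetNanos(ioTime, 50)); // 8000000: tasks get as long as I/O took
        System.out.println(taskBudgetNanos(ioTime, 80)); // 2000000: tasks get a quarter of the I/O time
        System.out.println(taskBudgetNanos(ioTime, 20)); // 32000000: tasks get four times the I/O time
    }
}
```

In other words, ioRatio is the percentage of each round intended for I/O, and the remainder is what the ordinary tasks may use.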

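Both select(oldWakenUp) and selectNow fetch the connections that are ready. The difference is that select(oldWakenUp) can block. It works as follows:

1. It looks up the scheduled task that is due soonest and derives the timeout for select(timeout) from that time. For example, if the next scheduled task is due in 1 second, it calls select(1000), which returns after 1 second whether or not any connection is ready. Because a once-per-second task is added when the EventLoop starts, select never blocks for more than 1 second. Note that adding a scheduled task does not call selector.wakeup(). So if the loop thread is already inside select(timeout) and another thread adds a scheduled task that is due before the timeout expires, that task will not run on time. The error is below 1 second, though, so it is not a big problem. Incidentally, when NIO is described as "asynchronous blocking", the "blocking" refers to this select(timeout) call;

2. If a scheduled task is already due, it calls selectNow() and returns immediately;

3. A bug in early Java NIO can drive the CPU to 100%: select(timeout) returns 0 immediately instead of blocking. Netty decides the bug has occurred when select(timeout) completes many times (512 by default) within a very short period (less than 1 second), and then calls rebuildSelector to get rid of it. The trimmed-down code is: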

            ......
            for (;;) {
                long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
                int selectedKeys = selector.select(timeoutMillis);
                selectCnt ++;
                long time = System.nanoTime();
                if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                    // select blocked normally, so reset selectCnt to 1; otherwise selectCnt
                    // keeps growing until the branch below is taken
                    selectCnt = 1;
                } else if (selectCnt >= 512) {
                    rebuildSelector();
                    ......
                }......
            }
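
The two calculations in this loop, the rounded timeout and the premature-return counter, can be isolated into a small testable sketch. SelectSpinSketch, afterSelect, and REBUILD_THRESHOLD are hypothetical names; the times are passed in as parameters instead of being read from the clock, and the elided parts of the original loop are not reproduced.

```java
import java.util.concurrent.TimeUnit;

final class SelectSpinSketch {
    static final int REBUILD_THRESHOLD = 512;

    int selectCnt;

    /** Rounds the remaining nanoseconds to the nearest millisecond (adds half a millisecond first). */
    static long timeoutMillis(long deadlineNanos, long nowNanos) {
        return (deadlineNanos - nowNanos + 500000L) / 1000000L;
    }

    /**
     * Call after each select(timeoutMillis). startNanos is the time before the select,
     * nowNanos the time after it. Returns true when the selector should be rebuilt.
     */
    boolean afterSelect(long startNanos, long timeoutMillis, long nowNanos) {
        selectCnt++;
        if (nowNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= startNanos) {
            selectCnt = 1; // the full timeout elapsed, so select really blocked
            return false;
        }
        if (selectCnt >= REBUILD_THRESHOLD) {
            selectCnt = 1; // too many premature returns in a row
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(1_000_000_000L, 0L)); // 1000
        System.out.println(timeoutMillis(1_499_999L, 0L));     // 1
        System.out.println(timeoutMillis(1_500_000L, 0L));     // 2
    }
}
```

A select that returns after only 10 ns of a 1000 ms timeout counts as premature; the 512th such return in a row makes afterSelect report that a rebuild is due.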

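Next, processSelectedKeys. As the name suggests, it handles the ready connections obtained by the preceding select. Depending on whether the optimization is enabled, it calls processSelectedKeysPlain or processSelectedKeysOptimized; the code of the two is similar.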

    private void processSelectedKeysOptimized(SelectionKey[] selectedKeys) {
        for (int i = 0;; i ++) {
            // When the keys are fetched (i.e. on flip), the slot after the last valid key is
            // set to null, so hitting null means every valid key has been processed
            final SelectionKey k = selectedKeys[i];
            if (k == null) {
                break;
            }
            // null out entry in the array to allow to have it GC'ed once the Channel close
            // See https://github.com/netty/netty/issues/2363
            selectedKeys[i] = null;

            final Object a = k.attachment();

            // A key's attachment is one of two types: AbstractNioChannel or NioTask
            if (a instanceof AbstractNioChannel) {
                processSelectedKey(k, (AbstractNioChannel) a);
            } else {
                NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                processSelectedKey(k, task);
            }

            // If a re-select is needed, reset the current data
            if (needsToSelectAgain) {
                // null out entries in the array to allow to have it GC'ed once the Channel close
                // See https://github.com/netty/netty/issues/2363
                for (;;) {
                    if (selectedKeys[i] == null) {
                        break;
                    }
                    selectedKeys[i] = null;
                    i++;
                }

                selectAgain();
                // Need to flip the optimized selectedKeys to get the right reference to the array
                // and reset the index to -1 which will then set to 0 on the for loop
                // to start over again.
                //
                // See https://github.com/netty/netty/issues/1523
                selectedKeys = this.selectedKeys.flip();
                i = -1;
            }
        }
    }
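
The null-terminated array returned by flip() can be pictured with a simplified double-buffer sketch. This is an assumption about the general shape of the optimized key set, not Netty's actual class: FlipKeySetSketch and its fields are hypothetical, and plain Object entries replace SelectionKey.

```java
import java.util.Arrays;

final class FlipKeySetSketch {
    private final Object[][] buffers = { new Object[4], new Object[4] };
    private final int[] sizes = new int[2];
    private int active; // index of the buffer that currently receives keys

    /** Appends a key to the active buffer, always leaving room for the terminator. */
    void add(Object key) {
        Object[] keys = buffers[active];
        keys[sizes[active]++] = key;
        if (sizes[active] == keys.length) {
            buffers[active] = Arrays.copyOf(keys, keys.length * 2);
        }
    }

    /** Switches buffers and returns the filled one, terminated by a null entry. */
    Object[] flip() {
        int filled = active;
        active = 1 - active;
        sizes[active] = 0;                     // the new active buffer starts empty
        buffers[filled][sizes[filled]] = null; // terminator after the last valid key
        return buffers[filled];
    }

    public static void main(String[] args) {
        FlipKeySetSketch set = new FlipKeySetSketch();
        set.add("key-a");
        set.add("key-b");
        Object[] keys = set.flip();
        for (int i = 0; keys[i] != null; i++) { // same loop shape as processSelectedKeysOptimized
            System.out.println(keys[i]);
        }
    }
}
```

Because the consumer stops at the first null, no separate size has to be handed over, and the other buffer can collect new keys while the returned one is being processed.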

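The loop above checks needsToSelectAgain. When is it set? When many channels have been deregistered from the selector (256 by default), much of the selected data is stale, so the keys have to be selected again: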

    void cancel(SelectionKey key) {
        key.cancel();
        cancelledKeys ++;
        if (cancelledKeys >= CLEANUP_INTERVAL) {
            cancelledKeys = 0;
            needsToSelectAgain = true;
        }
    }

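The attachment of a selected key can be either an AbstractNioChannel or a NioTask. The first is easy to understand: it is the ordinary Netty NIO connection. NioTask is a hook the author left for us. It lets developers plug in a SelectableChannel that is not a Netty AbstractNioChannel; when such a channel is ready, NioTask.channelReady is called and the developer supplies the implementation. If you want a NioTask example, I'm sorry to say I don't have one and don't feel like writing one. Even the Netty developers say "let us know if you implement one" (in English, of course). Now for how a Channel is handled: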

    private static void processSelectedKey(SelectionKey k, AbstractNioChannel ch) {
        final AbstractNioChannel.NioUnsafe unsafe = ch.unsafe();
        if (!k.isValid()) {
            // close the channel if the key is not valid anymore
            unsafe.close(unsafe.voidPromise());
            return;
        }

        try {
            int readyOps = k.readyOps();
            // If READ or ACCEPT is ready, trigger channel.unsafe().read()
            if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0) {
                // This operation is channel-specific and not the focus of this article; it will be covered later
                unsafe.read();
                if (!ch.isOpen()) {
                    // The channel is already closed, so skip its remaining events and return
                    return;
                }
            }
            if ((readyOps & SelectionKey.OP_WRITE) != 0) {
                // If WRITE is ready, send the buffered data; once the buffer has been fully sent,
                // clear the OP_WRITE interest that was registered earlier
                ch.unsafe().forceFlush();
            }
            if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
                // OP_CONNECT must be removed, otherwise Selector.select(timeout) may spin at 100% CPU
                // See https://github.com/netty/netty/issues/924
                int ops = k.interestOps();
                ops &= ~SelectionKey.OP_CONNECT;
                k.interestOps(ops);

                unsafe.finishConnect();
            }
        } catch (CancelledKeyException ignored) {
            unsafe.close(unsafe.voidPromise());
        }
    }
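
The bit tests in processSelectedKey operate on the standard java.nio.channels.SelectionKey constants, so they can be exercised without a real channel. The helper names below (ReadyOpsSketch, shouldRead, clearConnect) are made up for this sketch; the expressions are copied from the listing above.

```java
import java.nio.channels.SelectionKey;

final class ReadyOpsSketch {
    /** Same condition as the read branch: READ or ACCEPT is ready, or readyOps is 0. */
    static boolean shouldRead(int readyOps) {
        return (readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0;
    }

    /** Removes OP_CONNECT from an interest set and leaves every other bit untouched. */
    static int clearConnect(int interestOps) {
        return interestOps & ~SelectionKey.OP_CONNECT;
    }

    public static void main(String[] args) {
        System.out.println(shouldRead(SelectionKey.OP_ACCEPT)); // true
        System.out.println(shouldRead(SelectionKey.OP_WRITE));  // false
        int ops = SelectionKey.OP_CONNECT | SelectionKey.OP_READ;
        System.out.println(clearConnect(ops) == SelectionKey.OP_READ); // true
    }
}
```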

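runAllTasks has two main steps:

1. Move the scheduled tasks that are due from the scheduled-task queue to the task queue;

2. Loop, taking tasks from the task queue until it is empty. Normally ioRatio is not 100, so this loop runs under a time limit; see runAllTasks(long timeoutNanos) for the code, which is not reproduced here.

Remember the wakeup method introduced earlier? NioEventLoop only wakes up the selector and not the queue, because pollTask below is non-blocking.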

    protected boolean runAllTasks() {
        fetchFromScheduledTaskQueue();
        Runnable task = pollTask();
        if (task == null) {
            return false;
        }

        for (;;) {
            try {
                task.run();
            } catch (Throwable t) {
                logger.warn("A task raised an exception.", t);
            }
            
            task = pollTask();
            if (task == null) {
                lastExecutionTime = ScheduledFutureTask.nanoTime();
                return true;
            }
        }
    }
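
Since the time-limited variant is not reproduced above, here is a rough sketch of how such a loop can be written. It is not Netty's source: TimedTaskRunnerSketch is a hypothetical class, and checking the clock only after every 64 tasks is an assumption of this sketch, made because calling System.nanoTime() for every task is comparatively expensive.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class TimedTaskRunnerSketch {
    final Queue<Runnable> taskQueue = new ArrayDeque<>();

    /** Runs queued tasks until the queue is empty or the budget is used up; returns the number run. */
    int runAllTasks(long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        int ran = 0;
        Runnable task;
        while ((task = taskQueue.poll()) != null) { // poll() never blocks
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("A task raised an exception: " + t);
            }
            ran++;
            // Assumption of this sketch: look at the clock only after every 64 tasks
            if ((ran & 0x3F) == 0 && System.nanoTime() - deadline >= 0) {
                break;
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        TimedTaskRunnerSketch runner = new TimedTaskRunnerSketch();
        for (int i = 0; i < 200; i++) {
            runner.taskQueue.add(() -> { });
        }
        System.out.println(runner.runAllTasks(0L));        // budget 0: stops at the first clock check
        System.out.println(runner.taskQueue.size() + " tasks left for the next round");
    }
}
```

Tasks that do not fit into the budget simply stay in the queue and are picked up in the next round of run.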

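That covers most of run; what remains is the handling of the shutdown state. For shutdown we can start with shutdownGracefully. Netty claims to shut down gracefully, so how graceful is it?

1. If the state is ST_NOT_STARTED or ST_STARTED, try to switch it to ST_SHUTTING_DOWN. A thread that loses the race to another thread simply returns the Future and waits for the result;

2. The thread that switched the state carries on: if the loop has not been started, it calls scheduleExecution once so that the EventLoop can run the cleanup logic;

3. If the loop is already running, it calls wakeup to wake the blocked thread;

4. The EventLoop reaches the second-to-last part of run: it sees that the state is shutting down and does the final cleanup.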

    public Future<?> shutdownGracefully(long quietPeriod, long timeout, TimeUnit unit) {
        
        boolean inEventLoop = inEventLoop();
        boolean wakeup;
        int oldState;
        // Try to switch the EventLoop state; a thread that loses the race returns the Future and waits
        for (;;) {
            if (isShuttingDown()) {
                return terminationFuture();
            }
            int newState;
            wakeup = true;
            oldState = STATE_UPDATER.get(this);
            if (inEventLoop) {
                newState = ST_SHUTTING_DOWN;
            } else {
                switch (oldState) {
                    case ST_NOT_STARTED:
                    case ST_STARTED:
                        newState = ST_SHUTTING_DOWN;
                        break;
                    default:
                        newState = oldState;
                        wakeup = false;
                }
            }
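            // newState and oldState may be equal, so this CAS can run more than once. That is
            // harmless: when already shutting down, a successful CAS still does not meet the
            // conditions for scheduleExecution or wakeup
            if (STATE_UPDATER.compareAndSet(this, oldState, newState)) {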
                break;
            }
        }
        gracefulShutdownQuietPeriod = unit.toNanos(quietPeriod);
        gracefulShutdownTimeout = unit.toNanos(timeout);
        
        if (oldState == ST_NOT_STARTED) {
            scheduleExecution();
        }
        // If the loop was running, wake it up so it does not stay blocked indefinitely or for a long time
        if (wakeup) {
            wakeup(inEventLoop);
        }

        return terminationFuture();
    }
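
The race for the state transition can be reduced to a few lines. The sketch below keeps only the compare-and-set loop and drops the quiet period, timeout, and Future; ShutdownStateSketch and beginShutdown are hypothetical names, and the numeric state values are chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ShutdownStateSketch {
    static final int ST_NOT_STARTED = 1;
    static final int ST_STARTED = 2;
    static final int ST_SHUTTING_DOWN = 3;

    final AtomicInteger state = new AtomicInteger(ST_NOT_STARTED);

    /**
     * Returns the previous state if this caller performed the transition,
     * or -1 if shutdown had already been initiated by someone else.
     */
    int beginShutdown() {
        for (;;) {
            int oldState = state.get();
            if (oldState >= ST_SHUTTING_DOWN) {
                return -1;
            }
            if (state.compareAndSet(oldState, ST_SHUTTING_DOWN)) {
                return oldState;
            }
            // CAS lost against a concurrent change: re-read the state and retry
        }
    }

    public static void main(String[] args) {
        ShutdownStateSketch loop = new ShutdownStateSketch();
        System.out.println(loop.beginShutdown()); // 1: was not started, the caller must schedule a run
        System.out.println(loop.beginShutdown()); // -1: already shutting down, nothing more to do
    }
}
```

The returned previous state tells the winner which follow-up applies: schedule one run if the loop was never started, or wake the loop thread if it was running.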

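Cleanup consists of the following steps:

1. Close every connection maintained by the selector of this EventLoop, both AbstractNioChannel and NioTask; this is closeAll();

2. Cancel all scheduled tasks and clear the scheduled-task queue, run all pending non-scheduled tasks, and run all shutdownHooks; this is confirmShutdown();

3. If confirmShutdown() fails (returns false), go into another round of run and try again. Otherwise call cleanupAndTerminate, which calls confirmShutdown() in a loop until all tasks have run, sets the state to ST_TERMINATED, and closes the selector. When does confirmShutdown fail? It returns false whenever it has just run a task, and since shutdownGracefully only sets the state to ST_SHUTTING_DOWN, tasks can still be added to the queue, so failure is possible.

To finish, a short summary:

1. Execution is triggered by the execute method: executor.execute runs asRunnable, and asRunnable calls the run method of the EventExecutor class;

2. Subclasses customize their behaviour by implementing run. In NioEventLoop, one batch of work goes like this:

2.1 Take the ready connections from the selector and handle their read and write events, or call channelReady for a NioTask

2.2 Handle the non-I/O tasks

2.3 Check whether the state is ST_SHUTTING_DOWN. If so, clean up resources: close the connections, cancel the scheduled tasks, run the remaining non-scheduled tasks, run the shutdownHooks, and close the selector

2.4 If the state is not ST_SHUTTING_DOWN, call executor.execute again to run asRunnable, and so on in a loop. Within one EventExecutor the next batch is triggered only after the current batch has finished, so execution remains thread-safe;

3. Shutdown goes through shutdownGracefully, which mainly sets the shutdown state and triggers a run (up to step 2.3), letting the life cycle of the EventExecutor end naturally.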

2015-12-15 10:17