A Look Inside JBoss Cache
So I was asked a number of times why I didn't put in any shameless plugs for JBoss Cache - the project I lead - in my last DZone article on distributed caching and parallelism. Here is why.
This follow-up article focuses entirely on the brand-spanking-new JBoss Cache 3.0.0 - code-named Naga - and discusses in depth how our high-performance concurrency model works.
Life before Naga
Before I start talking about Naga specifically, I'd like to delve into a brief history of JBoss Cache. Around 2002, the JBoss Application Server (JBoss AS) needed a clustered cache solution (see my previous article for a definition of terms) for its HTTP and EJB session state replication, to maintain high availability in a cluster. JGroups - an open source group communication suite - shipped with a demo replicated hash map. Bela Ban, the founder and current maintainer of JGroups, expanded it to accommodate a tree structure - where data is organized into nodes arranged in a tree - and other cache-relevant features such as eviction and JTA transactions. Around early 2003, this was moved into JBoss AS's CVS repository and became a part of the JBoss AS code base.
Around March 2005, JBoss Cache was extracted from the JBoss AS repository and became its own standalone project. The rest, as they say, is history. Features such as cache loading and various cache loader implementations, eviction policies, and buddy replication were gradually added. TCP-based delegating cache servers allowed you to build tiers of caches. Custom marshalling frameworks provided a highly performant alternative to Java serialization when replicating state. Along the way, the cache went through one more major release - 2.0 - which involved a major API change and baselining on Java 5. Two other editions - a POJO edition and a Searchable edition - have evolved as well, building on the core cache to provide specific features.
Handing over the torch
Now, it is time for the 2.x series to hand over the torch to 3.0.0. Naga, as it is called internally, is well deserving of a major version change. In addition to evolutionary changes and improvements in the code base - better resource management, marshalling, overall performance enhancements, and a brand new and much simplified configuration file format - it also contains at least one revolutionary change:
MVCC has landed
Multi-versioned concurrency control - MVCC - was adopted as the default concurrency scheme in the cache.
When run in local mode, the most costly part of the cache in terms of memory and CPU cycles is the locking involved in maintaining the integrity of shared data. In clustered mode, this locking is the second most expensive thing, after the RPC calls made by the cache instance to remote instances.
Legacy locking schemes
In JBoss Cache 1.x and 2.x, we offered two different locking schemes - an optimistic one and a pessimistic one. Each had its pros and cons, but in the end both were still costly in terms of performance.
The pessimistic scheme used a lock per tree node. Reader threads would obtain non-exclusive read locks and writer threads would obtain exclusive write locks on these nodes before performing any operations. The locks we used were a custom extension of the JDK's ReentrantReadWriteLock, modified to support lock upgrading, where, within the scope of a transaction, a thread may start off reading a node and later attempt to write to it.
Overall, this scheme was simple and robust, but it didn't perform too well due to the memory overhead of maintaining a lock per node. More importantly, concurrency was reduced, since the existence of read locks prevented write locks from being obtained. The effect of readers blocking writers also introduced the possibility of deadlocks. Take, for example, transaction A, which performs a read on node /X and a write on node /Y before committing. Transaction B, which performs a read on node /Y and a write on node /X before committing, starts at the same time. With some unfortunate timing, we could end up in a situation where transaction A has a read lock on /X and is waiting for the write lock on /Y, while transaction B has a read lock on /Y and is waiting for the write lock on /X. Both transactions deadlock until one of them times out and rolls back. And typically, once one transaction has timed out, chances are the other would too, since both would have been waiting for almost the same amount of time.
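To make the deadlock concrete, here is a minimal sketch of the scenario just described, using plain JDK read-write locks (hypothetical demo code, not JBoss Cache internals):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of the deadlock described above; not JBoss Cache code.
public class DeadlockDemo {
    static final ReentrantReadWriteLock lockX = new ReentrantReadWriteLock(); // guards node /X
    static final ReentrantReadWriteLock lockY = new ReentrantReadWriteLock(); // guards node /Y

    public static void main(String[] args) {
        new Thread(() -> transact("A", lockX, lockY)).start(); // reads /X, writes /Y
        new Thread(() -> transact("B", lockY, lockX)).start(); // reads /Y, writes /X
    }

    static void transact(String tx, ReentrantReadWriteLock readNode, ReentrantReadWriteLock writeNode) {
        readNode.readLock().lock(); // the read lock is held until commit
        try {
            // With unfortunate timing, both transactions reach this point holding
            // their read locks, and each now waits for a write lock that the
            // other's read lock blocks.
            if (writeNode.writeLock().tryLock(5, TimeUnit.SECONDS)) {
                try {
                    System.out.println("Tx " + tx + ": committed");
                } finally {
                    writeNode.writeLock().unlock();
                }
            } else {
                System.out.println("Tx " + tx + ": lock acquisition timed out, rolling back");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            readNode.readLock().unlock();
        }
    }
}
```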
To overcome the deadlock potential, we offered an optimistic locking scheme. Optimistic locking used data versioning on each node. It copied any nodes accessed into a transaction workspace and allowed transactions to work off the copies. Nodes copied for reading provided repeatable read semantics, while nodes copied for writing allowed writer threads to proceed regardless of simultaneous readers. Modified nodes were then merged back into the main tree at transaction commit time, subject to version checking to ensure no concurrent writes had taken place.
Optimistic locking offered a much higher degree of concurrency, with concurrent readers and writers, and removed the risk of deadlock. But it had two main drawbacks. One was performance: the constant copying of state for each concurrent thread increased memory footprint significantly, and was also costly in terms of CPU cycles. The other was that while concurrent writers could exist, one of them would inevitably fail at commit time when the data version check failed. Writer transactions could happily go ahead and do a lot of costly processing and writing, only to fail at the very end, when attempting to commit.
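The commit-time check at the heart of this scheme can be sketched as follows (a simplified illustration with hypothetical names, not the actual workspace code):

```java
// Simplified sketch of optimistic locking's commit-time version check.
class VersionedNode {
    final Object data;
    final long version;
    VersionedNode(Object data, long version) { this.data = data; this.version = version; }
}

class TransactionWorkspace {
    private final VersionedNode snapshot; // copy taken when the tx first touched the node
    private Object pendingWrite;          // the transaction works off this copy

    TransactionWorkspace(VersionedNode snapshot) { this.snapshot = snapshot; }

    void write(Object newData) { this.pendingWrite = newData; }

    // Invoked at commit time, when the workspace is merged back into the main tree.
    VersionedNode merge(VersionedNode currentInTree) {
        if (currentInTree.version != snapshot.version) {
            // A concurrent write committed since we copied this node, so all the
            // work done in this transaction is thrown away - at the very end.
            throw new IllegalStateException("Data version check failed; rolling back");
        }
        return new VersionedNode(pendingWrite, snapshot.version + 1);
    }
}
```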
So how does MVCC help?
MVCC offers non-blocking readers, where readers do not block writer threads, providing a high degree of concurrency as well as removing the risk of deadlock. It is also fail-fast: writers work sequentially and don't overlap, and if they do time out in acquiring a write lock, this happens very early in the transaction - when the write occurs rather than when the transaction commits. Finally, MVCC is memory efficient in that it only maintains one copy of state for all readers, plus one version being modified by the single, sequential writer. Even better, our implementation of MVCC uses no locks at all for reader threads (very significant for a read-heavy system like a cache), and a custom exclusive lock implementation for writers. This custom lock is completely free of synchronized blocks and uses modern techniques like compare-and-swap and memory fencing via volatile variables to achieve synchronization. All this adds up to a highly performant and scalable locking scheme.
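To illustrate the compare-and-swap technique such a lock is built on, here is a minimal sketch (illustrative only - the actual JBoss Cache lock also handles reentrancy, queueing, and timeouts):

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal exclusive lock built purely on compare-and-swap; illustrative only.
class SimpleExclusiveLock {
    // AtomicReference wraps a volatile field, which provides the memory fence;
    // compareAndSet is the CAS primitive. No synchronized blocks anywhere.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread me = Thread.currentThread();
        while (!owner.compareAndSet(null, me)) {
            Thread.yield(); // spin politely; a real lock would park or time out
        }
    }

    void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```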
Let’s look at some details here.
The extremely high performance of JBoss Cache's MVCC implementation for reading threads is achieved by not requiring any synchronization or locking for readers. For each reader thread, the cache wraps state in a lightweight container object, which is placed in a container local to the thread (a ThreadLocal) or to an ongoing transaction. All subsequent operations made on the cache with regard to this state happen via this container object. This use of Java references allows for repeatable read semantics even if the actual state changes concurrently.
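A minimal sketch of this reader path, with hypothetical names (the real implementation wraps nodes in its invocation context and handles far more cases):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each reader thread pins the node state it first saw.
class ReaderContext {
    // Per-thread map of node path -> the state reference captured on first read.
    // In a transaction, this map would live in the transaction's context instead.
    private static final ThreadLocal<Map<String, Object>> ctx =
            ThreadLocal.withInitial(HashMap::new);

    static Object read(String path, Map<String, Object> tree) {
        // The first read captures a reference to the current version; later reads
        // return that same reference even if a writer has since swapped a new
        // version into the tree - repeatable read, without a single lock.
        return ctx.get().computeIfAbsent(path, tree::get);
    }
}
```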
Writer threads, on the other hand, need to acquire a lock before any writing can commence. We now use lock striping to improve the memory performance of the cache, and the size of the shared lock pool can be tuned using the concurrencyLevel attribute of the locking element (see the JBoss Cache configuration reference for details). After acquiring an exclusive lock on a node, the writer thread wraps the state to be modified in a container as well, just like reader threads do, and then copies this state for writing. When copying, a reference to the original version is still maintained in the container, for rollbacks. Changes are then made to the copy, and the copy is finally written to the data structure when the write completes.
This way, subsequent readers see the new version while existing readers still hold a reference to the original version in their context, achieving repeatable read semantics.
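Here is a sketch of that writer path - striped locks plus copy-on-write - again with hypothetical names rather than the actual interceptor and lock-manager code:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the MVCC writer path: lock striping + copy-on-write.
class StripedWriter {
    private final ReentrantLock[] stripes;

    StripedWriter(int concurrencyLevel) {
        // A shared, fixed-size pool of locks; nodes hash onto a stripe instead
        // of each node carrying its own lock - this is the lock striping.
        stripes = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();
    }

    private ReentrantLock lockFor(String path) {
        return stripes[(path.hashCode() & 0x7fffffff) % stripes.length];
    }

    void write(String path, AtomicReference<Node> nodeRef, Object newValue) {
        ReentrantLock lock = lockFor(path); // writers are exclusive and sequential
        lock.lock();
        try {
            Node original = nodeRef.get(); // reference kept for rollbacks
            Node copy = original.copy();   // all changes happen on the copy
            copy.value = newValue;
            nodeRef.set(copy);             // publish: subsequent readers see the new
                                           // version; existing readers keep the old one
        } finally {
            lock.unlock();
        }
    }

    static class Node {
        Object value;
        Node copy() { Node n = new Node(); n.value = this.value; return n; }
    }
}
```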
If a writer is unable to acquire the write lock after some time, a TimeoutException is thrown.
Although MVCC forces writers to obtain a write lock, a phenomenon known as write skew may occur when using repeatable read as your isolation level. This happens when concurrent transactions perform a read and then a write based on the value that was read. Since a read involves holding on to a reference to the state in the transaction context, a subsequent write works off that originally-read state, which may by then be stale.
The default behavior is to throw a DataVersioningException when a write skew is detected while copying state for writing. However, in most applications a write skew is not an issue (for example, if the state written has no relationship to the state originally read) and should be allowed. If your application does not care about write skews, you can allow them by setting the writeSkewCheck configuration attribute to false. See the JBoss Cache configuration reference for details.
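In the new configuration format, this would look something like the snippet below. The writeSkewCheck and concurrencyLevel attributes of the locking element are the ones named in the text above; the remaining attribute names and values are illustrative, so check the configuration reference for your version:

```xml
<!-- Illustrative locking configuration; consult the JBoss Cache 3.x reference. -->
<locking isolationLevel="REPEATABLE_READ"
         lockAcquisitionTimeout="10000"
         concurrencyLevel="500"
         writeSkewCheck="false"/>
```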
Note that write skews cannot happen when using READ_COMMITTED, since threads always work off committed state. Write skews were also seen with optimistic locking, likewise manifesting as a DataVersioningException - except that with optimistic locking this could happen under any isolation level.
Is there a tutorial on this stuff?
Of course. All you need to do is download the jbosscache-core-all.zip distribution of JBoss Cache. A tutorial is bundled in the distribution, complete with a GUI to visualize what's going on in your cache as you do stuff. Alternatively, there is also a GUI demo demonstrating the cache's capabilities.
Nice - so where do I get it?
So to sum things up, Naga is the latest and greatest of what JBoss Cache has to offer, with significantly faster access for both readers and writers, much better stability and predictability in performance, faster replication, a lower memory footprint - and it makes you coffee and walks your dog. Download Naga here [6]. A users' guide, FAQ, and tutorial are also available here.
[http://java.dzone.com/articles/a-look-inside-jboss-cache]