A Look Inside JBoss Cache
So I was asked a number of times why I didn't put in any shameless plugs for JBoss Cache - the project I lead - in my last DZone article on distributed caching and parallelism. Here is why.
This follow-up article focuses entirely on the brand-spanking-new JBoss Cache 3.0.0 - code-named Naga - and discusses in depth how our high-performance concurrency model works.
Life before Naga
Before I start talking about Naga specifically, I'd like to delve into a brief history of JBoss Cache. Around 2002, the JBoss Application Server (JBoss AS) needed a clustered cache solution (see my previous article for a definition of terms) for its HTTP and EJB session state replication, to maintain high availability in a cluster. JGroups - an open source group communication suite - shipped with a demo replicated hash map. Bela Ban, the founder and current maintainer of JGroups, expanded it to accommodate a tree structure - where data is organized into nodes arranged in a tree - and other cache-relevant features such as eviction and JTA transactions. Around early 2003, this was moved into JBoss AS's CVS repository and became a part of the JBoss AS code base.
Around March 2005, JBoss Cache was extracted from the JBoss AS repository and became its own standalone project. The rest, as they say, is history. Features such as cache loading and various cache loader implementations, eviction policies, and buddy replication were gradually added. TCP-based delegating cache servers allowed you to build tiers of caches. Custom marshalling frameworks provided a highly performant alternative to Java serialization when replicating state. Along the way, the cache went through one more major release - 2.0 - which involved a major API change and baselining on Java 5. Two other editions - a POJO edition and a Searchable edition - have evolved as well, building on the core cache to provide specific features.
Handing over the torch
Now, it is time for the 2.x series to hand over the torch to 3.0.0. Naga, as it is called internally, is well deserving of a major version change. In addition to evolutionary changes and improvements in the code base - better resource management, marshalling, overall performance enhancements, and a brand new and much simplified configuration file format - it also contains at least one revolutionary change:
MVCC has landed
Multi-versioned concurrency control - MVCC - was adopted as the default concurrency scheme in the cache.
When run in local mode, the most costly part of the cache in terms of memory and CPU cycles is the locking involved in maintaining the integrity of shared data. In clustered mode, this locking is the second most expensive thing, after the RPC calls made by the cache instance to remote instances.
Legacy locking schemes
In JBoss Cache 1.x and 2.x, we offered two different locking schemes - an optimistic one and a pessimistic one. Each had its pros and cons, but in the end both were still costly in terms of performance.
The pessimistic scheme used a lock per tree node. Reader threads would obtain non-exclusive read locks and writer threads would obtain exclusive write locks on these nodes before performing any operations. The locks we used were a custom extension of the JDK's ReentrantReadWriteLock, modified to support lock upgrading, where, within the scope of a transaction, a thread may start off reading a node and later attempt to write to it.
Overall, this scheme was simple and robust, but it didn't perform too well due to the memory overhead of maintaining a lock per node. More importantly, concurrency was reduced, since the existence of read locks prevented write locks from being obtained. The effect of readers blocking writers also introduced the possibility of deadlocks. Take, for example, transaction A, which performs a read on node /X and a write on node /Y before committing. Transaction B, which performs a read on node /Y and a write on node /X before committing, starts at the same time. With some unfortunate timing, we could end up in a situation where transaction A has a read lock on /X and is waiting for the write lock on /Y, while transaction B has a read lock on /Y and is waiting for the write lock on /X. Both transactions deadlock until one of them times out and rolls back. And typically, once one transaction has timed out, chances are the other would too, since both would have been waiting for almost the same amount of time.
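To make the deadlock concrete, here is a minimal sketch of the scenario just described, using plain JDK read-write locks (hypothetical demo code, not JBoss Cache internals):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of the deadlock described above; not JBoss Cache code.
public class DeadlockDemo {
    static final ReentrantReadWriteLock lockX = new ReentrantReadWriteLock(); // guards node /X
    static final ReentrantReadWriteLock lockY = new ReentrantReadWriteLock(); // guards node /Y

    public static void main(String[] args) {
        new Thread(() -> transact("A", lockX, lockY)).start(); // reads /X, writes /Y
        new Thread(() -> transact("B", lockY, lockX)).start(); // reads /Y, writes /X
    }

    static void transact(String tx, ReentrantReadWriteLock readNode, ReentrantReadWriteLock writeNode) {
        readNode.readLock().lock(); // the read lock is held until commit
        try {
            // With unfortunate timing, both transactions reach this point holding
            // their read locks, and each now waits for a write lock that the
            // other's read lock blocks.
            if (writeNode.writeLock().tryLock(5, TimeUnit.SECONDS)) {
                try {
                    System.out.println("Tx " + tx + ": committed");
                } finally {
                    writeNode.writeLock().unlock();
                }
            } else {
                System.out.println("Tx " + tx + ": lock acquisition timed out, rolling back");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            readNode.readLock().unlock();
        }
    }
}
```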
To overcome the deadlock potential, we offered an optimistic locking scheme. Optimistic locking used data versioning on each node. It copied any nodes accessed into a transaction workspace and allowed transactions to work off the copies. Nodes copied for reading provided repeatable read semantics, while nodes copied for writing allowed writer threads to proceed regardless of simultaneous readers. Modified nodes were then merged back into the main tree at transaction commit time, subject to version checking to ensure no concurrent writes had taken place.
Optimistic locking offered a much higher degree of concurrency, with concurrent readers and writers, and removed the risk of deadlock. But it had two main drawbacks. One was performance: the constant copying of state for each concurrent thread increased memory footprint significantly, and was also costly in terms of CPU cycles. The other was that while concurrent writers could exist, one of them would inevitably fail at commit time when the data version check failed. Writer transactions could happily go ahead and do a lot of costly processing and writing, only to fail at the very end, when attempting to commit.
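The commit-time check at the heart of this scheme can be sketched as follows (a simplified illustration with hypothetical names, not the actual workspace code):

```java
// Simplified sketch of optimistic locking's commit-time version check.
class VersionedNode {
    final Object data;
    final long version;
    VersionedNode(Object data, long version) { this.data = data; this.version = version; }
}

class TransactionWorkspace {
    private final VersionedNode snapshot; // copy taken when the tx first touched the node
    private Object pendingWrite;          // the transaction works off this copy

    TransactionWorkspace(VersionedNode snapshot) { this.snapshot = snapshot; }

    void write(Object newData) { this.pendingWrite = newData; }

    // Invoked at commit time, when the workspace is merged back into the main tree.
    VersionedNode merge(VersionedNode currentInTree) {
        if (currentInTree.version != snapshot.version) {
            // A concurrent write committed since we copied this node, so all the
            // work done in this transaction is thrown away - at the very end.
            throw new IllegalStateException("Data version check failed; rolling back");
        }
        return new VersionedNode(pendingWrite, snapshot.version + 1);
    }
}
```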
So how does MVCC help?
MVCC offers non-blocking readers, where readers do not block writer threads, providing a high degree of concurrency as well as removing the risk of deadlock. It is also fail-fast: writers work sequentially and don't overlap, and if they do time out in acquiring a write lock, this happens very early in the transaction - when the write occurs rather than when the transaction commits. Finally, MVCC is memory efficient in that it only maintains one copy of state for all readers, plus one version being modified by the single, sequential writer. Even better, our implementation of MVCC uses no locks at all for reader threads (very significant for a read-heavy system like a cache), and a custom exclusive lock implementation for writers. This custom lock is completely free of synchronized blocks and uses modern techniques like compare-and-swap and memory fencing via volatile variables to achieve synchronization. All this adds up to a highly performant and scalable locking scheme.
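To illustrate the compare-and-swap technique such a lock is built on, here is a minimal sketch (illustrative only - the actual JBoss Cache lock also handles reentrancy, queueing, and timeouts):

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal exclusive lock built purely on compare-and-swap; illustrative only.
class SimpleExclusiveLock {
    // AtomicReference wraps a volatile field, which provides the memory fence;
    // compareAndSet is the CAS primitive. No synchronized blocks anywhere.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread me = Thread.currentThread();
        while (!owner.compareAndSet(null, me)) {
            Thread.yield(); // spin politely; a real lock would park or time out
        }
    }

    void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```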
Let’s look at some details here.
The extremely high performance of JBoss Cache's MVCC implementation for reading threads is achieved by not requiring any synchronization or locking for readers. For each reader thread, the cache wraps state in a lightweight container object, which is placed in a container local to the thread (a ThreadLocal) or to an ongoing transaction. All subsequent operations made on the cache with regard to this state happen via this container object. This use of Java references allows for repeatable read semantics even if the actual state changes concurrently.
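A minimal sketch of this reader path, with hypothetical names (the real implementation wraps nodes in its invocation context and handles far more cases):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each reader thread pins the node state it first saw.
class ReaderContext {
    // Per-thread map of node path -> the state reference captured on first read.
    // In a transaction, this map would live in the transaction's context instead.
    private static final ThreadLocal<Map<String, Object>> ctx =
            ThreadLocal.withInitial(HashMap::new);

    static Object read(String path, Map<String, Object> tree) {
        // The first read captures a reference to the current version; later reads
        // return that same reference even if a writer has since swapped a new
        // version into the tree - repeatable read, without a single lock.
        return ctx.get().computeIfAbsent(path, tree::get);
    }
}
```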
Writer threads, on the other hand, need to acquire a lock before any writing can commence. We now use lock striping to improve the memory performance of the cache, and the size of the shared lock pool can be tuned using the concurrencyLevel attribute of the locking element (see the JBoss Cache configuration reference for details). After acquiring an exclusive lock on a node, the writer thread wraps the state to be modified in a container as well, just like reader threads do, and then copies this state for writing. When copying, a reference to the original version is still maintained in the container, for rollbacks. Changes are then made to the copy, and the copy is finally written to the data structure when the write completes.
This way, subsequent readers see the new version while existing readers still hold a reference to the original version in their context, achieving repeatable read semantics.
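Here is a sketch of that writer path - striped locks plus copy-on-write - again with hypothetical names rather than the actual interceptor and lock-manager code:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the MVCC writer path: lock striping + copy-on-write.
class StripedWriter {
    private final ReentrantLock[] stripes;

    StripedWriter(int concurrencyLevel) {
        // A shared, fixed-size pool of locks; nodes hash onto a stripe instead
        // of each node carrying its own lock - this is the lock striping.
        stripes = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();
    }

    private ReentrantLock lockFor(String path) {
        return stripes[(path.hashCode() & 0x7fffffff) % stripes.length];
    }

    void write(String path, AtomicReference<Node> nodeRef, Object newValue) {
        ReentrantLock lock = lockFor(path); // writers are exclusive and sequential
        lock.lock();
        try {
            Node original = nodeRef.get(); // reference kept for rollbacks
            Node copy = original.copy();   // all changes happen on the copy
            copy.value = newValue;
            nodeRef.set(copy);             // publish: subsequent readers see the new
                                           // version; existing readers keep the old one
        } finally {
            lock.unlock();
        }
    }

    static class Node {
        Object value;
        Node copy() { Node n = new Node(); n.value = this.value; return n; }
    }
}
```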
If a writer is unable to acquire the write lock after some time, a TimeoutException is thrown.
Although MVCC forces writers to obtain a write lock, a phenomenon known as write skew may occur when using repeatable read as your isolation level. This happens when concurrent transactions perform a read and then a write based on the value that was read. Since a read involves holding on to a reference to the state in the transaction context, a subsequent write works off that originally-read state, which may by then be stale.
The default behavior is to throw a DataVersioningException when a write skew is detected while copying state for writing. However, in most applications a write skew is not an issue (for example, if the state written has no relationship to the state originally read) and should be allowed. If your application does not care about write skews, you can allow them by setting the writeSkewCheck configuration attribute to false. See the JBoss Cache configuration reference for details.
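In the new configuration format, this would look something like the snippet below. The writeSkewCheck and concurrencyLevel attributes of the locking element are the ones named in the text above; the remaining attribute names and values are illustrative, so check the configuration reference for your version:

```xml
<!-- Illustrative locking configuration; consult the JBoss Cache 3.x reference. -->
<locking isolationLevel="REPEATABLE_READ"
         lockAcquisitionTimeout="10000"
         concurrencyLevel="500"
         writeSkewCheck="false"/>
```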
Note that write skews cannot happen when using READ_COMMITTED, since threads always work off committed state. Write skews were also seen with optimistic locking, likewise manifesting as a DataVersioningException - except that with optimistic locking this could happen under any isolation level.
Is there a tutorial on this stuff?
Of course. All you need to do is download the jbosscache-core-all.zip distribution of JBoss Cache. A tutorial is bundled in the distribution, complete with a GUI to visualize what's going on in your cache as you do stuff. Alternatively, there is also a GUI demo demonstrating the cache's capabilities.
Nice - so where do I get it?
So to sum things up, Naga is the latest and greatest of what JBoss Cache has to offer, with significantly faster access for both readers and writers, much better stability and predictability in performance, faster replication, a lower memory footprint - and it makes you coffee and walks your dog. Download Naga here [6]. A users' guide, FAQ, and tutorial are also available here.
[http://java.dzone.com/articles/a-look-inside-jboss-cache]