Solr scalability improvements

wutao8818

http://yonik.wordpress.com/

With CPU core counts constantly increasing, major work has been done in Lucene/Solr to improve scalability under multi-threaded load.

Read-only IndexReaders

One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader. Since another thread could delete a document at any time, the IndexReader.isDeleted() call was synchronized. It’s a very quick call, simply checking if a bit is set in a BitVector, but the problem was that it can be called millions of times in the process of satisfying a single query. The Read-only IndexReader feature allowed for the removal of this synchronization by prohibiting deletion.
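
To make the cost concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class names `MutableDeletes` and `ReadOnlyDeletes` are hypothetical, and `java.util.BitSet` stands in for Lucene's BitVector. It only illustrates why prohibiting deletion lets the per-document check drop its lock.

```java
import java.util.BitSet;

// Hypothetical stand-in for a reader that still permits deletions.
final class MutableDeletes {
    private final BitSet deleted = new BitSet();

    // Every check takes the lock, because delete() may run concurrently.
    synchronized boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    synchronized void delete(int doc) {
        deleted.set(doc);
    }

    // Copies the current deletions into an immutable, lock-free view.
    synchronized ReadOnlyDeletes snapshot() {
        return new ReadOnlyDeletes((BitSet) deleted.clone());
    }
}

// Hypothetical read-only view: the bits never change after construction,
// so isDeleted() needs no synchronization.
final class ReadOnlyDeletes {
    private final BitSet deleted; // never modified after the constructor returns

    ReadOnlyDeletes(BitSet deleted) {
        this.deleted = deleted;
    }

    boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    void delete(int doc) {
        throw new UnsupportedOperationException("read-only reader");
    }
}
```

When a query calls `isDeleted()` millions of times from many threads, the first class contends on a single monitor for each call, while the second does a plain bit test.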

Use of NIO to read index files

The standard method for Lucene to read index files is via Java’s RandomAccessFile. Reading a part of the file involves two calls: a seek() to position the file pointer, followed by a read() to get the data. For multiple threads to share the same RandomAccessFile instance, this requires synchronization so that one thread cannot change the file pointer before another thread has read at the position it set. If the data to be read isn’t in the operating system cache, it’s even worse news: the synchronization causes all other reads to block while the data is retrieved from disk, even if some of those reads could have been satisfied quickly.

The preferred solution would be to have a method on RandomAccessFile that accepted an offset to read from. This could easily be implemented by the JVM via a pread() system call. But since Sun has not provided this functionality, we need to use something else. NIO’s FileChannel does have the type of method we are looking for: FileChannel.read(ByteBuffer dst, long position)
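
The difference between the two approaches can be sketched with the JDK alone. The helper classes `SeekingReader`, `PositionalReader` and `PositionalReadDemo` below are hypothetical names for illustration, not Lucene's FSDirectory or NIOFSDirectory code; only `RandomAccessFile.seek()`, `readFully()` and `FileChannel.read(ByteBuffer, long)` are real JDK calls.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a shared RandomAccessFile has one file pointer,
// so seek() + read must happen as a single locked step.
final class SeekingReader {
    private final RandomAccessFile file;

    SeekingReader(RandomAccessFile file) { this.file = file; }

    byte[] readAt(long position, int length) {
        byte[] buf = new byte[length];
        synchronized (file) { // all readers queue here, even on cache hits
            try {
                file.seek(position);
                file.readFully(buf);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return buf;
    }
}

// Hypothetical helper: the positional read passes the offset with each call
// and leaves the channel's own position untouched, so no lock is taken here.
final class PositionalReader {
    private final FileChannel channel;

    PositionalReader(FileChannel channel) { this.channel = channel; }

    byte[] readAt(long position, int length) {
        ByteBuffer dst = ByteBuffer.allocate(length);
        try {
            while (dst.hasRemaining()) { // a single read may return fewer bytes
                int n = channel.read(dst, position + dst.position());
                if (n < 0) throw new EOFException("read past end of file");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dst.array();
    }
}

final class PositionalReadDemo {
    // Writes a small temp file and reads bytes 4..6 both ways.
    static String[] readBothWays() {
        try {
            Path tmp = Files.createTempFile("index", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                     FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String a = new String(new SeekingReader(raf).readAt(4, 3), StandardCharsets.US_ASCII);
                    String b = new String(new PositionalReader(ch).readAt(4, 3), StandardCharsets.US_ASCII);
                    return new String[] { a, b };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = readBothWays();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Both readers return the same bytes. The difference is that `PositionalReader` holds no shared mutable state between calls, so a slow disk read on one thread does not have to stall reads on others.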

Solr now uses the non-synchronizing NIO method of reading index files (via Lucene’s NIOFSDirectory) by default on non-Windows platforms. Windows systems default to the older method because it turns out to be faster there: a long-standing “bug” in Java still synchronizes internally even when using FileChannel.read().

Non-blocking caches

Solr’s standard LRU cache implementation uses a synchronized LinkedHashMap. A single cache could be checked hundreds or thousands of times during the course of a single request that involves faceting. A non-blocking ConcurrentLRUCache was developed as an alternative implementation, and is now the default for Solr’s filter cache. One user reported that this doubled their query throughput under ideal circumstances.
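
For readers unfamiliar with the pattern, a synchronized LinkedHashMap LRU cache can be sketched as follows. This is an illustration using only the JDK; the class name `SynchronizedLru` is hypothetical, and it is not Solr's LRUCache or ConcurrentLRUCache source.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing the classic synchronized LRU pattern.
final class SynchronizedLru {
    static <K, V> Map<K, V> create(final int maxSize) {
        // accessOrder = true: each get() moves the entry to the "most recent" end.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
        // One lock guards every call, lookups included.
        return Collections.synchronizedMap(lru);
    }
}
```

Because an access-ordered LinkedHashMap reorders its entries on every `get()`, even lookups modify the map and must hold the lock. That is why a cache consulted thousands of times per request serializes threads, and why a non-blocking design helps.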

Where to find this scalability goodness?

Solr 1.3 has read-only IndexReaders, but for the other scalability improvements, including the improved faceting, you’ll have to grab a nightly Solr build.

2009-04-13 10:28