Lucene has been updated to 3.3

leibnitz

3.3 - 2011-07

Highlights of the Lucene release include:

The spellchecker module now includes suggest/auto-complete functionality, with three implementations: Jaspell, Ternary Trie, and Finite State.
Support for merging results from multiple shards, for both "normal" search results (TopDocs.merge) as well as grouped results using the grouping module (SearchGroup.merge, TopGroups.merge).
An optimized implementation of KStem, a less aggressive stemmer for English.
Single-pass grouping implementation based on block document indexing.
Improvements to MMapDirectory (now also the default implementation returned by FSDirectory.open on 64-bit Linux).
NRTManager simplifies handling near-real-time search with multiple search threads, allowing the application to control which indexing changes must be visible to which search requests.
TwoPhaseCommitTool facilitates performing a multi-resource two-phase commit, including IndexWriter.
The default merge policy, TieredMergePolicy, has a new method (set/getReclaimDeletesWeight) to control how aggressively it targets segments with deletions, and is now more aggressive than before by default.
PKIndexSplitter tool splits an index by a mid-point term.
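
To make the shard-merging item concrete, here is a minimal plain-Java sketch of the general idea: each shard returns its own hits sorted by descending score, and a priority queue picks the global top n. This is not Lucene's TopDocs.merge implementation; the class ShardMergeSketch, the Hit type, and the tie-breaking rule (lower shard index first) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMergeSketch {
    /** One hit: the shard it came from, its shard-local doc ID, and its score (hypothetical type). */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merges per-shard hit lists, each already sorted by descending score, into the global top n. */
    static List<Hit> merge(int n, List<List<Hit>> shards) {
        // Queue entries are {shardIndex, positionInShard}. The entry whose current
        // hit has the highest score comes first; ties go to the lower shard index.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
            Hit ha = shards.get(a[0]).get(a[1]);
            Hit hb = shards.get(b[0]).get(b[1]);
            int byScore = Float.compare(hb.score, ha.score);
            return byScore != 0 ? byScore : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !queue.isEmpty()) {
            int[] top = queue.poll();
            List<Hit> shardHits = shards.get(top[0]);
            merged.add(shardHits.get(top[1]));
            // Advance this shard's cursor so its next-best hit can compete.
            if (top[1] + 1 < shardHits.size()) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = new ArrayList<>();
        List<Hit> shard0 = new ArrayList<>();
        shard0.add(new Hit(0, 3, 0.9f));
        shard0.add(new Hit(0, 7, 0.4f));
        List<Hit> shard1 = new ArrayList<>();
        shard1.add(new Hit(1, 1, 0.8f));
        shard1.add(new Hit(1, 5, 0.6f));
        shards.add(shard0);
        shards.add(shard1);
        for (Hit h : merge(3, shards)) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Keeping the shard index on every hit matters because doc IDs are only meaningful within the shard that produced them.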

3.2 - 2011-06

A new grouping module, under lucene/contrib/grouping, enables search results to be grouped by a single-valued indexed field. (So grouping only officially arrived in this version.)
A new IndexUpgrader tool fully converts an old index to the current format.
A new Directory implementation, NRTCachingDirectory, caches small segments in RAM, to reduce the I/O load for applications with fast NRT reopen rates.
A new Collector implementation, CachingCollector, is able to gather search hits (document IDs and optionally also scores) and then replay them. This is useful for Collectors that require two or more passes to produce results.
Index a document block using IndexWriter's new addDocuments or updateDocuments methods. These experimental APIs ensure that the block of documents will forever remain contiguous in the index, enabling interesting future features like grouping and joins.
A new default merge policy, TieredMergePolicy, is more efficient because it can merge non-contiguous (that is, non-adjacent) segments. See http://s.apache.org/merging for details.
NumericField is now returned correctly when you load a stored document (previously you received a normal Field back, with the numeric value converted to a string).
Deleted terms are now applied during flushing to the newly flushed segment, which is more efficient than having to later initialize a reader for that segment.
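
The gather-then-replay idea behind CachingCollector can be illustrated with a small plain-Java sketch. It does not use Lucene's Collector API: the HitConsumer interface and the CachingConsumer, CountingConsumer, and MaxScoreConsumer classes are hypothetical stand-ins, and the sketch always caches scores and ignores any memory limit.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplaySketch {
    /** Minimal hit consumer; a hypothetical stand-in for a Lucene Collector. */
    interface HitConsumer {
        void collect(int doc, float score);
    }

    /** Records every hit on the first pass so later passes need not rerun the search. */
    static final class CachingConsumer implements HitConsumer {
        private final List<Integer> docs = new ArrayList<>();
        private final List<Float> scores = new ArrayList<>();
        private final HitConsumer delegate;

        CachingConsumer(HitConsumer delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc, float score) {
            docs.add(doc);
            scores.add(score);
            delegate.collect(doc, score); // the first pass still sees each hit
        }

        /** Feeds the cached hits, in their original order, to another consumer. */
        void replay(HitConsumer other) {
            for (int i = 0; i < docs.size(); i++) {
                other.collect(docs.get(i), scores.get(i));
            }
        }
    }

    /** First-pass consumer: counts hits and ignores scores. */
    static final class CountingConsumer implements HitConsumer {
        int count;

        @Override
        public void collect(int doc, float score) {
            count++;
        }
    }

    /** Second-pass consumer: remembers the best-scoring document. */
    static final class MaxScoreConsumer implements HitConsumer {
        int bestDoc = -1;
        float maxScore = Float.NEGATIVE_INFINITY;

        @Override
        public void collect(int doc, float score) {
            if (score > maxScore) {
                maxScore = score;
                bestDoc = doc;
            }
        }
    }

    public static void main(String[] args) {
        CountingConsumer counter = new CountingConsumer();
        CachingConsumer cache = new CachingConsumer(counter);
        // Pretend these hits arrive from a single search.
        cache.collect(4, 0.5f);
        cache.collect(9, 1.25f);
        cache.collect(11, 0.75f);

        MaxScoreConsumer best = new MaxScoreConsumer();
        cache.replay(best); // second pass without searching again
        System.out.println(counter.count + " hits, best doc " + best.bestDoc);
    }
}
```

The trade-off is memory for time: the cache holds one doc ID and one score per hit, so a real implementation would want a cap on how much it is allowed to buffer.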

3.1 - 2011-03

ConcurrentMergeScheduler is more careful about setting priority of merge threads.

ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly.

ConstantScoreQuery now allows directly wrapping a Query.

IndexWriter is now configured with a new separate builder API, IndexWriterConfig. You can now control IndexWriter's previously fixed internal thread limit by calling setMaxThreadStates.

IndexWriter.getReader is replaced by IndexReader.open(IndexWriter).

MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher.

New TotalHitCountCollector just counts total number of hits.
ReaderFinishedListener API enables external caches to evict entries once a segment is finished.

Grouping has reportedly already been implemented, but the release notes still do not mention it.

3.0.3 - 2010-12

Fixed: a memory leak in IndexWriter exacerbated by frequent commits.

This also suggests that it is not very stable yet.

Fixed: NumericRangeQuery / NumericRangeFilter sometimes returning incorrect results with bounds near Long.MIN_VALUE and Long.MAX_VALUE.

Fixed: various thread-safety issues.

3.0.2 - 2010-06

Fixed memory leaks in IndexWriter when large documents are indexed. It now also uses shared memory pools for term vectors and stored fields. IndexWriter now releases Fieldables and Readers on close.

Performance improvements in ParallelMultiSearcher (3.0.2 only).

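Note: the 2.x and 3.x series differ only in the Java version they are compiled for: the former targets JDK 1.4, the latter Java 5.

Releases have been frequent lately, and I am not that confident about their stability.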

2011-07-20 13:56