Highlights of the Lucene release include:
- The spellchecker module now includes suggest/auto-complete functionality,
with three implementations: Jaspell, Ternary Trie, and Finite State.
- Support for merging results from multiple shards, for both "normal"
search results (TopDocs.merge) and grouped results using the
grouping module (SearchGroup.merge, TopGroups.merge).
- An optimized implementation of KStem, a less aggressive stemmer
for English.
- Single-pass grouping implementation based on block document indexing.
- Improvements to MMapDirectory (now also the default implementation
returned by FSDirectory.open on 64-bit
- simplifies handling near-real-time search with multiple
search threads, allowing the application to control which indexing
changes must be visible to which search requests.
- facilitates performing a multi-resource
two-phase commit, including IndexWriter.
- The default merge policy, TieredMergePolicy, has a new method
(set/getReclaimDeletesWeight) to control how aggressively it
targets segments with deletions, and is now more aggressive than
before by default.
- tool splits an index by a mid-point term.
- A new grouping module, under lucene/contrib/grouping, enables
search results to be grouped by a single-valued indexed field
(so grouping only arrived in this release).
- A new IndexUpgrader tool fully converts an old index to the
current format.
- A new Directory implementation, NRTCachingDirectory, caches small
segments in RAM, to reduce the I/O load for applications with fast
NRT reopen rates.
- A new Collector implementation, CachingCollector, is able to
gather search hits (document IDs and optionally also scores) and
then replay them. This is useful for Collectors that require two
or more passes to produce results.
- Index a document block using IndexWriter's new addDocuments
methods. These experimental APIs ensure that the
block of documents will forever remain contiguous in the index,
enabling interesting future features like grouping and joins.
- A new default merge policy, TieredMergePolicy, which is more
efficient due to being able to merge non-contiguous (non-adjacent) segments.
See http://s.apache.org/merging for details.
- NumericField is now returned correctly when you load a stored
document (previously you received a normal Field back, with the
numeric value converted to a string).
- Deleted terms are now applied during flushing to the newly flushed
segment, which is more efficient than having to later initialize a
reader for that segment.
- is more careful about setting priority of
merge threads.
- makes it easier to reuse TokenStreams
- now allows directly wrapping a Query.
- IndexWriter is now configured with a new separate builder API,
IndexWriterConfig. You can now control IndexWriter's previously
fixed internal thread limit by calling setMaxThreadStates.
- IndexWriter.getReader is replaced by IndexReader
- MultiSearcher is deprecated; ParallelMultiSearcher has been
absorbed directly into IndexSearcher.
- New TotalHitCountCollector counts only the total number of hits.
- API enables external caches to evict entries
once a segment is finished.
- a memory leak in IndexWriter exacerbated by frequent commits
- / NumericRangeFilter sometimes returning incorrect results
with bounds near Long.MIN_VALUE and Long.MAX_VALUE
- various thread safety issues
- Fixed memory leaks in IndexWriter when large documents are indexed.
It now also uses shared memory pools for term vectors and stored fields.
- now releases Fieldables and s on close
- Performance improvements in ParallelMultiSearcher (3.0.2 only).
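
The shard-merging highlight above (TopDocs.merge) can be illustrated with a small sketch of the general idea: each shard returns its own hits already sorted by descending score, and a priority queue picks the best remaining hit across shards until the requested number is reached. This is plain Java and does not use Lucene; the class ShardMergeSketch, the Hit type, and the mergeTopN method are hypothetical names chosen for illustration, and the tie-break (lower shard index first) is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ShardMergeSketch {
    /** Hypothetical stand-in for one search hit (not a Lucene class). */
    static final class Hit {
        final int shard;   // index of the shard that produced the hit
        final int doc;     // shard-local document ID
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merges per-shard hit lists (each sorted by descending score) into one top-N list. */
    static List<Hit> mergeTopN(int topN, List<List<Hit>> shardHits) {
        // Queue entries are {shardIndex, positionInThatShard}.
        // Order: higher score first; on equal scores, lower shard index first.
        Comparator<int[]> bestFirst = (a, b) -> {
            Hit ha = shardHits.get(a[0]).get(a[1]);
            Hit hb = shardHits.get(b[0]).get(b[1]);
            int byScore = Float.compare(hb.score, ha.score);
            return byScore != 0 ? byScore : Integer.compare(a[0], b[0]);
        };
        PriorityQueue<int[]> queue = new PriorityQueue<>(bestFirst);
        for (int s = 0; s < shardHits.size(); s++) {
            if (!shardHits.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] head = queue.poll();
            List<Hit> hits = shardHits.get(head[0]);
            merged.add(hits.get(head[1]));
            // Advance to the next hit of the shard we just consumed from.
            if (head[1] + 1 < hits.size()) {
                queue.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = Arrays.asList(new Hit(0, 3, 2.0f), new Hit(0, 7, 0.5f));
        List<Hit> shard1 = Arrays.asList(new Hit(1, 1, 1.5f), new Hit(1, 4, 1.0f));
        for (Hit h : mergeTopN(3, Arrays.asList(shard0, shard1))) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Keeping the shard index on every merged hit matters because document IDs are only meaningful within the shard that produced them.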

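
The CachingCollector highlight above describes gathering hits once and replaying them for later passes. A minimal sketch of that pattern, in plain Java without Lucene, might look like the following; HitSink, CachingSink, and CountingSink are hypothetical names invented here, not Lucene's Collector API, and the sketch always caches scores, whereas the real class makes that optional.

```java
import java.util.ArrayList;
import java.util.List;

class CachingCollectorSketch {
    /** Hypothetical minimal hit consumer (not Lucene's Collector). */
    interface HitSink {
        void collect(int doc, float score);
    }

    /** Forwards each hit to a first-pass sink while recording it for later replay. */
    static final class CachingSink implements HitSink {
        private final HitSink firstPass;
        private final List<Integer> docs = new ArrayList<>();
        private final List<Float> scores = new ArrayList<>();

        CachingSink(HitSink firstPass) {
            this.firstPass = firstPass;
        }

        @Override
        public void collect(int doc, float score) {
            docs.add(doc);
            scores.add(score);
            firstPass.collect(doc, score);
        }

        /** Feeds the recorded hits, in their original order, to another sink. */
        void replay(HitSink other) {
            for (int i = 0; i < docs.size(); i++) {
                other.collect(docs.get(i), scores.get(i));
            }
        }
    }

    /** Example sink that only counts hits. */
    static final class CountingSink implements HitSink {
        int count;

        @Override
        public void collect(int doc, float score) {
            count++;
        }
    }

    public static void main(String[] args) {
        CountingSink firstPass = new CountingSink();
        CachingSink cache = new CachingSink(firstPass);
        cache.collect(5, 1.0f);   // in a real search these calls come from the searcher
        cache.collect(9, 0.25f);

        CountingSink secondPass = new CountingSink();
        cache.replay(secondPass); // second pass without re-running the query
        System.out.println(firstPass.count + " hits, replayed " + secondPass.count);
    }
}
```

The trade-off is memory for time: the second pass avoids executing the query again, at the cost of holding every hit in RAM.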