`

lucene has updated to 3.3

阅读更多

3.3-2011.7

Highlights of the Lucene release include:

  • The spellchecker module now includes suggest/auto-complete functionality, with three implementations: Jaspell, Ternary Trie, and Finite State.
  • Support for merging results from multiple shards , for both "normal" search results (TopDocs.merge) as well as grouped results using the grouping module (SearchGroup.merge, TopGroups.merge).
  • An optimized implementation of KStem, a less aggressive stemmer for English
  • Single-pass grouping implementation based on block document indexing.
  • Improvements to MMapDirectory (now also the default implementation returned by FSDirectory.open on 64-bit Linux ).
  • NRTManager simplifies handling near-real-time search with multiple search threads, allowing the application to control which indexing changes must be visible to which search requests.
  • TwoPhaseCommitTool facilitates performing a multi-resource two-phased commit, including IndexWriter.
  • The default merge policy, TieredMergePolicy, has a new method (set/getReclaimDeletesWeight) to control how aggressively it targets segments with deletions, and is now more aggressive than before by default.
  • PKIndexSplitter tool splits an index by a mid-point term.

 

3.2-2011-6

  • A new grouping module, under lucene/contrib/grouping, enables search results to be grouped by a single-valued indexed field 原来这版本才出来
  • A new IndexUpgrader tool fully converts an old index to the current format.
  • A new Directory implementation, NRTCachingDirectory , caches small segments in RAM, to reduce the I/O load for applications with fast NRT reopen rates.
  • A new Collector implementation, CachingCollector , is able to gather search hits (document IDs and optionally also scores) and then replay them. This is useful for Collectors that require two or more passes to produce results.
  • Index a document block using IndexWriter's new addDocuments or updateDocuments methods. These experimental APIs ensure that the block of documents will forever remain contiguous in the index, enabling interesting future features like grouping and joins.
  • A new default merge policy, TieredMergePolicy , which is more efficient due to being able to merge non-contiguous(邻近的,连续) segments. See http://s.apache.org/merging for details.
  • NumericField is now returned correctly when you load a stored document (previously you received a normal Field back, with the numeric value converted string).
  • Deleted terms are now applied during flushing to the newly flushed segment, which is more efficient than having to later initialize a reader for that segment.

 

 

3.1-2011.3

ConcurrentMergeScheduler is more careful about setting priority of merge threads.

 

ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly.

 

ConstantScoreQuery now allows directly wrapping a Query.

 

IndexWriter is now configured with a new separate builder API, IndexWriterConfig. You can now control IndexWriter's previously fixed internal thread limit by calling setMaxThreadStates.

 

IndexWriter.getReader is replaced by IndexReader .open(IndexWriter)

 

MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher.

 

  • New TotalHitCountCollector just counts total number of hits.
  • ReaderFinishedListener API enables external caches to evict entries once a segment is finished.

据说是已经实现了grouping,但还是没说出来。。。

 

 

3.0.3-2010-12

a memory leak in IndexWriter exacerbated by frequent commits

这也说明还不是很稳定

 

fixed:NumericRangeQuery / NumericRangeFilter sometimes returning incorrect results with bounds near Long.MIN_VALUE and Long.MAX_VALUE

 

various thread safety issues

 

 

3.0.2-2010-6

Fixed memory leaks in IndexWriter when large documents are indexed. It also uses now shared memory pools for term vectors and stored fields. IndexWriter now releases Fieldable s and Reader s on close .

 

Performance improvements in ParallelMultiSearcher (3.0.2 only).

 

注意:2.x与3.x系列只是编译版本不同而已,前者是jdk1.4,后才是5.0

最近的版本更新得频繁,感觉稳定性上不是那么有信心。。

 

 

分享到:
评论

相关推荐

    lucene最新版本3.3的基本功能用法(IK分词是3.2.8)

    《Lucene 3.3基础功能与IK分词器3.2.8使用详解》 Apache Lucene是一款高性能、全文本检索库,被广泛应用于各种搜索引擎的开发中。本文将重点探讨Lucene 3.3版本的基础功能及其与IK分词器3.2.8的集成使用方法。 一...

    lucene3.3的全部jar包

    lucene3.3的全部jar包ant-1.7.1.jar ant-junit-1.7.1.jar commons-beanutils-1.7.0.jar commons-collections-3.1.jar commons-compress-1.1.jar commons-digester-1.7.jar commons-logging-1.0.4.jar icu4j-4_8.jar ...

    lucene3.3全部jar包

    在"lucene3.3+全部jar包"这个压缩文件中,通常包含以下 JAR 文件: - `lucene-core-3.3.0.jar`:Lucene 的核心库,包含了索引和搜索的基本功能。 - `lucene-analyzers-3.3.0.jar`:提供了多种语言的文本分析器,...

    Lucene3.3_API文档

    本篇将详细解析Lucene 3.3 API,帮助开发者深入理解并有效地利用这个强大的搜索引擎库。 ### 1. Lucene的核心组件 Lucene的核心组件包括索引(Indexing)、查询(Querying)和搜索(Searching)等部分。 - **索引...

    lucene 3.3 core的源码包 lucence3.3_src.zip

    lucene 3.3 core的源码包 lucence3.3_src.zip 使用eclipse展开引用的core核心包 然后随便点进去一个方法里click attatch source然后选择这个zip包,即可/

    支持Lucene3.3、3.4的庖丁解牛分词法的源码和jar包

    资源为庖丁解牛分词法的最新源码以及生成的jar包,支持最新的Lucene3.4以及Lucene3.0以上版本。Jar包为本地生成,大家也可以到SVN上检出自己生成,另外庖丁解牛分词法的使用Demo我会接下来上传一份,欢迎分享。

    lucene,lucene教程,lucene讲解

    lucene,lucene教程,lucene讲解。 为了对文档进行索引,Lucene 提供了五个基础的类 public class IndexWriter org.apache.lucene.index.IndexWriter public abstract class Directory org.apache.lucene.store....

    lucene-analyzers-common-5.1.0.jar

    * Snowball BSD license header has been added to the Java classes to avoid having RAT adding new ASL headers. IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY! An index created using the Snowball module...

    lucene3.0 lucene3.0

    lucene3.0 lucene3.0 lucene3.0 lucene3.0 lucene3.0

    lucene7.3常用jar包

    First, you should download the latest Lucene distribution and then extract it to a working directory. You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene ...

    lucene-4.7.0全套jar包

    【Lucene 4.7.0 全套JAR包详解】 Lucene是一个开源全文搜索引擎库,由Apache软件基金会开发并维护。它提供了一个高级、灵活的文本搜索API,允许开发者轻松地在应用程序中实现复杂的搜索功能。这次提供的“lucene-...

    Lucene3.5源码jar包

    本压缩包包含的是Lucene 3.5.0版本的全部源码,对于想要深入理解Lucene工作原理、进行二次开发或者进行搜索引擎相关研究的开发者来说,是一份非常宝贵的学习资源。 Lucene 3.5.0是Lucene的一个重要版本,它在3.x...

    lucene in action英文版 lucene 3.30包

    《Lucene in Action》是关于Apache Lucene的权威指南,这本书深入浅出地介绍了全文搜索引擎的构建和优化。Lucene是一个高性能、全文本搜索库,它允许开发人员在应用程序中轻松实现复杂的搜索功能。这本书主要面向...

    Lucene时间区间搜索

    Lucene是一款强大的全文搜索引擎库,广泛应用于各种数据检索场景。在C#环境下,利用Lucene进行时间区间搜索是提高数据检索效率和精确度的重要手段。本篇将深入探讨如何在C#中实现Lucene的时间区间查询匹配,以及涉及...

    Annotated Lucene 中文版 Lucene源码剖析

    《Annotated Lucene 中文版 Lucene源码剖析》是一本深入探讨Apache Lucene的书籍,专注于源码解析,帮助读者理解这个强大的全文搜索引擎库的工作原理。Lucene是一款开源的Java库,它提供了高效的文本搜索功能,被...

    lucene-codecs-4.4.0.zip

    《深入理解Lucene 4.4.0代码库与Java核心技术》 在IT领域,Lucene是一个非常重要的开源全文搜索引擎库,它为开发者提供了强大的文本分析、索引和搜索功能。这里我们关注的是Lucene的4.4.0版本,通过解压"lucene-...

    Lucene简介.介绍

    【Lucene 简介】 Lucene 是一个强大的开源全文搜索库,由 Java 编写,主要用于为应用程序添加全文检索功能。它不是一个完整的全文搜索引擎应用,而是一个工具包,允许开发者将其集成到自己的软件中,以实现高效、...

    Lucene示例 BM25相似度计算

    在IT领域,搜索引擎技术是至关重要的,而Lucene作为一个开源全文搜索引擎库,广泛应用于各种文本检索系统中。本文将深入探讨Lucene示例中的BM25相似度计算,旨在帮助初学者理解如何利用Lucene 4.7.1版本构建索引、...

    lucene3源码分析

    ### Lucene3源码分析知识点概述 #### 一、全文检索的基本原理 ##### 1. 总论 全文检索系统是一种高效的信息检索技术,能够帮助用户在海量文档中快速找到包含特定关键词的信息。Lucene是Java领域内最受欢迎的全文...

    lucene 2.0 api以及lucene 3.0 api

    **Lucene 2.0 API 和 Lucene 3.0 API 深度解析** Lucene 是一个由 Apache 软件基金会开发的全文搜索引擎库,它为开发者提供了在 Java 应用程序中实现高性能、可扩展的全文搜索功能的能力。Lucene 的 API 设计得相当...

Global site tag (gtag.js) - Google Analytics