- Changes in backwards compatibility policy
(1)
- LUCENE-1340
: In a minor change to Lucene's backward compatibility
policy, we are now allowing the Fieldable interface to have
changes, within reason, and made on a case-by-case basis. If an
application implements its own Fieldable, please be aware of
this. Otherwise, no need to be concerned. This is in effect for
all 2.X releases, starting with 2.4. Also note that, in all
likelihood, Fieldable will be changed in 3.0.
- Changes in runtime behavior
(4)
- LUCENE-1151
: Fix StandardAnalyzer to not mis-identify host names
(e.g. lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
backwards compatible, but buggy, behavior, you can either call
StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
method), or, set system property
org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
to "false" on JVM startup. All StandardAnalyzer instances created
after that will then show the pre-2.4 behavior. Alternatively,
you can call setReplaceInvalidAcronym(false) to change the
behavior per instance of StandardAnalyzer. This backwards
compatibility will be removed in 3.0 (hardwiring the value to
true).
(Mike McCandless)
- LUCENE-1044
: IndexWriter with autoCommit=true now commits (such
that a reader can see the changes) far less often than it used to.
Previously, every flush was also a commit. You can always force a
commit by calling IndexWriter.commit(). Furthermore, in 3.0,
autoCommit will be hardwired to false (IndexWriter constructors
that take an autoCommit argument have been deprecated).
(Mike
McCandless)
- LUCENE-1335
: IndexWriter.addIndexes(Directory[]) and
addIndexesNoOptimize no longer allow the same Directory instance
to be passed in more than once. Internally, IndexWriter uses
Directory and segment name to uniquely identify segments, so
adding the same Directory more than once was causing duplicates
which led to problems.
(Mike McCandless)
- LUCENE-1396
: Improve PhraseQuery.toString() so that gaps in the
positions are indicated with a ? and multiple terms at the same
position are joined with a |.
(Andrzej Bialecki via Mike
McCandless)
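The PhraseQuery.toString() format described in LUCENE-1396 can be illustrated with a small standalone sketch. It does not use Lucene classes: `PhraseToStringSketch` and `render` are hypothetical names, and the code only mimics the rule stated above (a `?` for each empty position, a `|` between terms sharing a position), not Lucene's actual implementation.

```java
class PhraseToStringSketch {
    // Renders terms by position: terms at the same position are joined with '|',
    // positions holding no term are shown as '?'. Assumes non-negative positions.
    static String render(String[] terms, int[] positions) {
        int maxPosition = -1;
        for (int p : positions) {
            maxPosition = Math.max(maxPosition, p);
        }
        String[] slots = new String[maxPosition + 1];
        for (int i = 0; i < terms.length; i++) {
            int p = positions[i];
            slots[p] = (slots[p] == null) ? terms[i] : slots[p] + "|" + terms[i];
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < slots.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(slots[i] == null ? "?" : slots[i]);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // "quick" and "fast" share position 0, position 1 is a gap, "fox" is at 2.
        System.out.println(render(new String[] {"quick", "fast", "fox"}, new int[] {0, 0, 2}));
        // prints "quick|fast ? fox" (including the double quotes)
    }
}
```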
- API Changes
(26)
- LUCENE-1084
: Changed all IndexWriter constructors to take an
explicit parameter for maximum field size. Deprecated all the
pre-existing constructors; these will be removed in release 3.0.
NOTE: these new constructors set autoCommit to false.
(Steven
Rowe via Mike McCandless)
- LUCENE-584
: Changed Filter API to return a DocIdSet instead of a
java.util.BitSet. This allows using more efficient data structures
for Filters and makes them more flexible. This deprecates
Filter.bits(), so all filters that implement this outside
the Lucene code base will need to be adapted. See also the javadocs
of the Filter class.
(Paul Elschot, Michael Busch)
- LUCENE-1044
: Added IndexWriter.commit() which flushes any buffered
adds/deletes and then commits a new segments file so readers will
see the changes. Deprecate IndexWriter.flush() in favor of
IndexWriter.commit().
(Mike McCandless)
- LUCENE-325
: Added IndexWriter.expungeDeletes methods, which
consult the MergePolicy to find merges necessary to merge away all
deletes from the index. This should be a somewhat lower cost
operation than optimize.
(John Wang via Mike McCandless)
- LUCENE-1233
: Return empty array instead of null when no fields
match the specified name in these methods in Document:
getFieldables, getFields, getValues, getBinaryValues.
(Stefan
Trcek via Mike McCandless)
- LUCENE-1234
: Make BoostingSpanScorer protected.
(Andi Vajda via Grant Ingersoll)
- LUCENE-510
: The index now stores strings as true UTF-8 bytes
(previously it was Java's modified UTF-8). If any text, either
stored fields or a token, has illegal UTF-16 surrogate characters,
these characters are now silently replaced with the Unicode
replacement character U+FFFD. This is a change to the index file
format.
(Marvin Humphrey via Mike McCandless)
- LUCENE-852
: Let the SpellChecker caller specify IndexWriter mergeFactor
and RAM buffer size.
(Otis Gospodnetic)
- LUCENE-1290
: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
and remove all references to these classes from the core. Also update demos
and tutorials.
(Michael Busch)
- LUCENE-1288
: Add getVersion() and getGeneration() to IndexCommit.
getVersion() returns the same value that IndexReader.getVersion()
returns when the reader is opened on the same commit.
(Jason
Rutherglen via Mike McCandless)
- LUCENE-1311
: Added IndexReader.listCommits(Directory) static
method to list all commits in a Directory, plus IndexReader.open
methods that accept an IndexCommit and open the index as of that
commit. These methods are only useful if you implement a custom
DeletionPolicy that keeps more than the last commit around.
(Jason Rutherglen via Mike McCandless)
- LUCENE-1325
: Added IndexCommit.isOptimized().
(Shalin Shekhar
Mangar via Mike McCandless)
- LUCENE-1324
: Added TokenFilter.reset().
(Shai Erera via Mike
McCandless)
- LUCENE-1340
: Added Fieldable.omitTf() method to skip indexing term
frequency, positions and payloads. This saves index space, and
indexing/searching time.
(Eks Dev via Mike McCandless)
- LUCENE-1219
: Add basic reuse API to Fieldable for binary fields:
getBinaryValue/Offset/Length(); currently only lazy fields reuse
the provided byte[] result to getBinaryValue.
(Eks Dev via Mike
McCandless)
- LUCENE-1334
: Add new constructor for Term: Term(String fieldName)
which defaults term text to "".
(DM Smith via Mike McCandless)
- LUCENE-1333
: Added Token.reinit(*) APIs to re-initialize (reuse) a
Token. Also added term() method to return a String, with a
performance penalty clearly documented. Also implemented
hashCode() and equals() in Token, and fixed all core and contrib
analyzers to use the re-use APIs.
(DM Smith via Mike McCandless)
- LUCENE-1329
: Add optional readOnly boolean when opening an
IndexReader. A readOnly reader is not allowed to make changes
(deletions, norms) to the index; in exchange, the isDeleted
method, often a bottleneck when searching with many threads, is
not synchronized. The default for readOnly is still false, but in
3.0 the default will become true.
(Jason Rutherglen via Mike
McCandless)
- LUCENE-1367
: Add IndexCommit.isDeleted().
(Shalin Shekhar Mangar
via Mike McCandless)
- LUCENE-1061
: Factored out all "new XXXQuery(...)" in
QueryParser.java into protected methods newXXXQuery(...) so that
subclasses can create their own subclasses of each Query type.
(John Wang via Mike McCandless)
- LUCENE-753
: Added new Directory implementation
org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
FileChannel to do file reads. On most non-Windows platforms, with
many threads sharing a single searcher, this may yield sizable
improvement to query throughput when compared to FSDirectory,
which only allows a single thread to read from an open file at a
time.
(Jason Rutherglen via Mike McCandless)
- LUCENE-1371
: Added convenience method TopDocs Searcher.search(Query query, int n).
(Mike McCandless)
- LUCENE-1356
: Allow easy extensions of TopDocCollector by turning
constructor and fields from package to protected.
(Shai Erera
via Doron Cohen)
- LUCENE-1375
: Added convenience method IndexCommit.getTimestamp,
which is equivalent to
getDirectory().fileModified(getSegmentsFileName()).
(Mike McCandless)
- LUCENE-1366
: Rename Field.Index options to be more accurate:
TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
is added.
(Mike McCandless)
- LUCENE-1131
: Added numDeletedDocs method to IndexReader
(Otis Gospodnetic)
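The difference between true UTF-8 and Java's modified UTF-8 mentioned in LUCENE-510 can be seen with the JDK alone. The sketch below is only an illustration of the two encodings, not of Lucene's index format: `Utf8Comparison` and its methods are hypothetical names, `DataOutputStream.writeUTF` stands in for modified UTF-8, and `String.getBytes(StandardCharsets.UTF_8)` for standard UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class Utf8Comparison {
    // Byte length of s in Java's modified UTF-8, as written by DataOutputStream.writeUTF.
    static int modifiedUtf8Length(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(s);
        out.flush();
        return buffer.size() - 2; // writeUTF prefixes the data with a 2-byte length
    }

    // Byte length of s in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws IOException {
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair in UTF-16
        String nul = "\u0000";
        // Modified UTF-8 encodes each surrogate separately (3 bytes each);
        // standard UTF-8 encodes the code point in 4 bytes.
        System.out.println(modifiedUtf8Length(supplementary) + " vs " + standardUtf8Length(supplementary));
        // Modified UTF-8 encodes U+0000 in 2 bytes; standard UTF-8 uses 1 byte.
        System.out.println(modifiedUtf8Length(nul) + " vs " + standardUtf8Length(nul));
    }
}
```

Plain ASCII text has the same length in both encodings, so only text with supplementary characters or U+0000 is encoded differently.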
- Bug fixes
(16)
- LUCENE-1134
: Fixed BooleanQuery.rewrite to only optimize a single
clause query if minNumShouldMatch<=0.
(Shai Erera via Michael Busch)
- LUCENE-1169
: Fixed bug in IndexSearcher.search(): searching with
a filter might miss some hits because scorer.skipTo() is called
without checking if the scorer is already at the right position.
scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
scorer.next().
(Eks Dev, Michael Busch)
- LUCENE-1182
: Added scorePayload to SimilarityDelegator
(Andi Vajda via Grant Ingersoll)
- LUCENE-1213
: MultiFieldQueryParser was ignoring slop in case
of a single field phrase.
(Trejkaz via Doron Cohen)
- LUCENE-1228
: IndexWriter.commit() was not updating the index version and as
a result IndexReader.reopen() failed to sense index changes.
(Doron Cohen)
- LUCENE-1267
: Added numDocs() and maxDoc() to IndexWriter;
deprecated docCount().
(Mike McCandless)
- LUCENE-1274
: Added new prepareCommit() method to IndexWriter,
which does phase 1 of a 2-phase commit (commit() does phase 2).
This is needed when you want to update an index as part of a
transaction involving external resources (e.g. a database). Also
deprecated abort(), renaming it to rollback().
(Mike McCandless)
- LUCENE-1003
: Stop RussianAnalyzer from removing numbers.
(TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
- LUCENE-1152
: SpellChecker fix around clearIndex and indexDictionary
methods, plus removal of IndexReader reference.
(Naveen Belkale via Otis Gospodnetic)
- LUCENE-1046
: Removed dead code in SpellChecker
(Daniel Naber via Otis Gospodnetic)
- LUCENE-1189
: Fixed the QueryParser to handle escaped characters within
quoted terms correctly.
(Tomer Gabel via Michael Busch)
- LUCENE-1299
: Fixed NPE in SpellChecker when IndexReader is not null and field is
(Grant Ingersoll)
- LUCENE-1303
: Fixed BoostingTermQuery's explanation to be marked as a Match
depending only upon the non-payload score part, regardless of the effect of
the payload on the score. Prior to this, score of a query containing a BTQ
differed from its explanation.
(Doron Cohen)
- LUCENE-1310
: Fixed SloppyPhraseScorer to work also for terms repeating more
than twice in the query.
(Doron Cohen)
- LUCENE-1351
: ISOLatin1AccentFilter now cleans additional ligatures
(Cedrik Lime via Grant Ingersoll)
- LUCENE-1383
: Workaround a nasty "leak" in Java's builtin
ThreadLocal, to prevent Lucene from causing unexpected
OutOfMemoryError in certain situations (notably J2EE
applications).
(Chris Lu via Mike McCandless)
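The two-phase commit pattern behind LUCENE-1274 (prepareCommit() as phase 1, commit() as phase 2, rollback() on failure) can be sketched without Lucene. Everything below is an assumption made for illustration: the `Participant` interface, the `commitAll` coordinator and the `Recorder` test double are hypothetical and are not part of the Lucene API. In a real application an IndexWriter and a database connection would each play the participant role.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoPhaseCommitSketch {
    /** Hypothetical participant in a transaction spanning several resources. */
    interface Participant {
        void prepareCommit() throws Exception; // phase 1: do everything that can fail
        void commit();                         // phase 2: make the prepared changes visible
        void rollback();                       // discard pending changes
    }

    /** Returns true if every participant committed, false if all were rolled back. */
    static boolean commitAll(List<? extends Participant> participants) {
        for (Participant p : participants) {
            try {
                p.prepareCommit();
            } catch (Exception e) {
                // One participant could not prepare: roll back all of them.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    /** Records the calls it receives; optionally fails during phase 1. */
    static class Recorder implements Participant {
        final List<String> calls = new ArrayList<String>();
        final boolean failOnPrepare;

        Recorder(boolean failOnPrepare) {
            this.failOnPrepare = failOnPrepare;
        }

        public void prepareCommit() throws Exception {
            calls.add("prepare");
            if (failOnPrepare) {
                throw new Exception("prepare failed");
            }
        }

        public void commit() {
            calls.add("commit");
        }

        public void rollback() {
            calls.add("rollback");
        }
    }

    public static void main(String[] args) {
        Recorder index = new Recorder(false);
        Recorder database = new Recorder(true); // simulates a failing external resource
        boolean committed = commitAll(Arrays.asList(index, database));
        System.out.println(committed + " " + index.calls + " " + database.calls);
    }
}
```

The point of the split is that phase 2 is reached only after every participant has completed the work that is likely to fail.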
- New features
(20)
- LUCENE-1137
: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
process. The flag is not indexed/stored and is thus only used by analysis.
- LUCENE-1147
: Add -segment option to CheckIndex tool so you can
check only a specific segment or segments in your index.
(Mike
McCandless)
- LUCENE-1045
: Reopened this issue to add support for short and bytes.
- LUCENE-584
: Added new data structures to o.a.l.util, such as
OpenBitSet and SortedVIntList. These extend DocIdSet and can
directly be used for Filters with the new Filter API. Also changed
the core Filters to use OpenBitSet instead of java.util.BitSet.
(Paul Elschot, Michael Busch)
- LUCENE-494
: Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query.
This Analyzer is not intended for use during indexing.
(Mark Harwood via Grant Ingersoll)
- LUCENE-1044
: Change Lucene to properly "sync" files after
committing, to ensure on a machine or OS crash or power cut, even
with cached writes, the index remains consistent. Also added
explicit commit() method to IndexWriter to force a commit without
having to close.
(Mike McCandless)
- LUCENE-997
: Add search timeout (partial) support.
A TimeLimitedCollector was added to allow limiting search time.
It is a partial solution since timeout is checked only when
collecting a hit, and therefore a search for rare words in a
huge index might not stop within the specified time.
(Sean Timm via Doron Cohen)
- LUCENE-1184
: Allow SnapshotDeletionPolicy to be re-used across
close/re-open of IndexWriter while still protecting an open
snapshot
(Tim Brennan via Mike McCandless)
- LUCENE-1194
: Added IndexWriter.deleteDocuments(Query) to delete
documents matching the specified query. Also added static unlock
and isLocked methods (deprecating the ones in IndexReader).
(Mike
McCandless)
- LUCENE-1201
: Add IndexReader.getIndexCommit() method.
(Tim Brennan
via Mike McCandless)
- LUCENE-550
: Added InstantiatedIndex implementation. Experimental
Index store similar to MemoryIndex but allows for multiple documents
in memory.
(Karl Wettin via Grant Ingersoll)
- LUCENE-400
: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
that wraps another Analyzer's token stream with a ShingleFilter
(Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
- LUCENE-1166
: Decomposition tokenfilter for languages like German and Swedish
(Thomas Peuss via Grant Ingersoll)
- LUCENE-1187
: ChainedFilter and BooleanFilter now work with new Filter API
and DocIdSetIterator-based filters. Backwards-compatibility with old
BitSet-based filters is ensured.
(Paul Elschot via Michael Busch)
- LUCENE-1295
: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public.
(Grant Ingersoll)
- LUCENE-1298
: MoreLikeThis can now accept a custom Similarity
(Grant Ingersoll)
- LUCENE-1297
: Allow other string distance measures for the SpellChecker
(Thomas Morton via Otis Gospodnetic)
- LUCENE-1001
: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement.
(Mark Miller, Grant Ingersoll)
- LUCENE-1354
: Provide programmatic access to CheckIndex
(Grant Ingersoll, Mike McCandless)
- LUCENE-1279
: Add support for Collators to RangeFilter/Query and Query Parser.
(Steve Rowe via Grant Ingersoll)
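Why the timeout support in LUCENE-997 is only partial can be shown with a standalone sketch of a collector wrapper that checks the clock only when a hit arrives. The names `HitSink`, `TimeLimited` and `TimeExceeded` are hypothetical and the clock is injected to keep the example deterministic. This is not the TimeLimitedCollector source, only an illustration of the behavior described above.

```java
import java.util.function.LongSupplier;

class TimeLimitedCollectSketch {
    /** Hypothetical stand-in for a hit collector callback. */
    interface HitSink {
        void collect(int doc, float score);
    }

    /** Thrown when a hit arrives after the allowed time has elapsed. */
    static class TimeExceeded extends RuntimeException {
        TimeExceeded(int doc) {
            super("time limit exceeded while collecting doc " + doc);
        }
    }

    /** Wraps another sink and checks the deadline only inside collect(). */
    static class TimeLimited implements HitSink {
        private final HitSink delegate;
        private final LongSupplier clockMillis;
        private final long deadlineMillis;

        TimeLimited(HitSink delegate, long timeAllowedMillis, LongSupplier clockMillis) {
            this.delegate = delegate;
            this.clockMillis = clockMillis;
            this.deadlineMillis = clockMillis.getAsLong() + timeAllowedMillis;
        }

        public void collect(int doc, float score) {
            // No hit, no check: a search that rarely matches can run past the deadline.
            if (clockMillis.getAsLong() > deadlineMillis) {
                throw new TimeExceeded(doc);
            }
            delegate.collect(doc, score);
        }
    }

    public static void main(String[] args) {
        HitSink printer = (doc, score) -> System.out.println("hit " + doc);
        HitSink limited = new TimeLimited(printer, 1000L, System::currentTimeMillis);
        limited.collect(42, 1.0f); // collected unless more than a second has already passed
    }
}
```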
- Optimizations
(6)
- LUCENE-705
: When building a compound file, use
RandomAccessFile.setLength() to tell the OS/filesystem to
pre-allocate space for the file. This may improve fragmentation
in how the CFS file is stored, and allows us to detect an upcoming
disk full situation before actually filling up the disk.
(Mike
McCandless)
- LUCENE-1120
: Speed up merging of term vectors by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
(Mike McCandless)
- LUCENE-1185
: Avoid checking if the TermBuffer 'scratch' in
SegmentTermEnum is null for every call of scanTo().
(Christian Kohlschuetter via Michael Busch)
- LUCENE-1217
: Internal to Field.java, use isBinary instead of
runtime type checking for possible speedup of binaryValue().
(Eks Dev via Mike McCandless)
- LUCENE-1183
: Optimized TRStringDistance class (in contrib/spell) that uses
less memory than the previous version.
(Cédrik LIME via Otis Gospodnetic)
- LUCENE-1195
: Improve term lookup performance by adding a LRU cache to the
TermInfosReader. In performance experiments the speedup was about 25% on
average on mid-size indexes with ~500,000 documents for queries with 3
terms and about 7% on larger indexes with ~4.3M documents.
(Michael Busch)
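The idea behind the LRU cache in LUCENE-1195 can be sketched with the JDK's LinkedHashMap in access order. This is an assumption for illustration only: `LruCacheSketch` is a hypothetical class, and the changelog does not say how the cache inside TermInfosReader is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder=true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evicts the least recently used entry when over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<String, Integer>(2);
        cache.put("apache", 1);
        cache.put("lucene", 2);
        cache.get("apache");    // "apache" is now the most recently used entry
        cache.put("search", 3); // evicts "lucene", the least recently used entry
        System.out.println(cache.keySet()); // prints [apache, search]
    }
}
```

A cache like this helps when the same terms are looked up repeatedly, which is the case the performance figures above describe.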
- Documentation
(3)
- LUCENE-1236
: Added some clarifying remarks to EdgeNGram*.java
(Hiroaki Kawai via Grant Ingersoll)
- LUCENE-1157
and LUCENE-1256
: HTML changes log, created automatically
from CHANGES.txt. This HTML file is currently visible only via developers page.
(Steven Rowe via Doron Cohen)
- LUCENE-1349
: Fieldable can now be changed without breaking backward compatibility rules (within reason; see the note at
the top of this file and also on Fieldable.java).
(Grant Ingersoll)
- Build
(3)
- LUCENE-1153
: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
- LUCENE-1202
: Small fixes to the way Clover is used to work better
with contribs. Of particular note: a single clover db is used
regardless of whether tests are run globally or in the specific
contrib directories.
- LUCENE-1353
: Javacc target in contrib/miscellaneous for
generating the precedence query parser.
- Test Cases
(2)
- LUCENE-1238
: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
As part of this fix, a "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
collector to also collect the last doc after the allowed time has passed.
(Doron Cohen)
- LUCENE-1348
: Relax TestTimeLimitedCollector to not fail due to
timeout exceeded (just because test machine is very busy).