lucene-wiki translation: How to Improve Indexing Speed - 2

cfan_haifeng

Blog category: Translation, Lucene

Tags: lucene, improving indexing speed, java

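Original: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.1 BasicsOfPerformance -> 1.1.1.4 ImproveIndexingSpeed
Note: "red" text marks passages I didn't know or wasn't sure how to translate; "blue" text is my own commentary.
Status: complete
Continued from: lucene-wiki translation: How to Improve Indexing Speed - 1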

5. Use as much RAM as you can

The original says:

Use as much RAM as you can afford.
More RAM before flushing means Lucene writes larger segments to begin with which means less merging later. Testing in LUCENE-843 found that around 48 MB is the sweet spot for that content set, but, your application could have a different sweet spot.

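The more RAM used before a flush, the larger the segments; the larger the segments, the fewer merges are needed later. Testing in LUCENE-843 found that, for that content set, a 48 MB buffer gave the best performance. Your application, though, may well have a different sweet spot, heh.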

Now let's look at an expert's translation:

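Use as much memory as you can afford.
Using more memory before flushing means Lucene produces larger segments while indexing, which in turn means fewer merges. Testing in LUCENE-843 found that about 48 MB is a suitable value, but your program may have a different one. It also depends on the machine, so test for yourself and pick a balanced value.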
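Why do larger initial segments mean less merging later? Below is a minimal sketch under a simplified model, which is an assumption and not Lucene's actual merge policy: every flush writes one segment of docsPerFlush documents (a stand-in for the RAM buffer size), and whenever mergeFactor segments accumulate at the same level they are merged into one segment a level up. The class FlushMergeModel and its method are hypothetical. It counts how many documents merges rewrite; since each document is copied once per level it climbs, bigger flushed segments remove levels. In Lucene itself the buffer is set with IndexWriter.setRAMBufferSizeMB(double).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified flush/merge model; NOT Lucene's real merge policy. */
public class FlushMergeModel {

    /** Documents rewritten by merges while indexing totalDocs documents,
     *  flushing a new segment every docsPerFlush documents. */
    static long docsRewrittenByMerges(long totalDocs, long docsPerFlush, int mergeFactor) {
        if (mergeFactor < 2 || docsPerFlush < 1) throw new IllegalArgumentException();
        List<Integer> count = new ArrayList<>(); // segments waiting at each level
        List<Long> docs = new ArrayList<>();     // documents held by those segments
        long rewritten = 0;
        for (long indexed = 0; indexed < totalDocs; indexed += docsPerFlush) {
            long segDocs = Math.min(docsPerFlush, totalDocs - indexed); // one flushed segment
            int level = 0;
            while (true) {
                if (count.size() == level) { count.add(0); docs.add(0L); }
                count.set(level, count.get(level) + 1);
                docs.set(level, docs.get(level) + segDocs);
                if (count.get(level) < mergeFactor) break;
                // mergeFactor segments at this level: merge them into one segment a level up,
                // which rewrites every document they contain
                segDocs = docs.get(level);
                rewritten += segDocs;
                count.set(level, 0);
                docs.set(level, 0L);
                level++;
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        // Same 1,000,000 documents and mergeFactor 10; only the flush size differs
        System.out.println(docsRewrittenByMerges(1_000_000, 1_000, 10));  // smaller RAM buffer
        System.out.println(docsRewrittenByMerges(1_000_000, 10_000, 10)); // 10x larger RAM buffer
    }
}
```

In this model, flushing every 1,000 documents makes merges rewrite 3,000,000 documents, while flushing every 10,000 rewrites 2,000,000: one fewer merge level. The model ignores byte sizes, deletions and merge scheduling, so it only illustrates the trend, not real timings.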

6. Turn off the compound file format

Turn off compound file format.

Call setUseCompoundFile(false). Building the compound file format takes time during indexing (7-33% in testing for LUCENE-888). However, note that doing this will greatly increase the number of file descriptors used by indexing and by searching, so you could run out of file descriptors if mergeFactor is also large.

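Calling setUseCompoundFile(false) turns off the compound file format. The experiments in LUCENE-888 show that building compound files costs roughly 7-33% extra indexing time. However, the consequence is a large increase in the number of file descriptors used by indexing and searching, ……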

Now look at the expert's translation; this time the two are worlds apart:

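Calling setUseCompoundFile(false) turns off the compound file option. Generating compound files takes extra time (in the LUCENE-888 tests, about 7%-33% more). Note, however, that this greatly increases the number of file handles used by searching and indexing, and if the merge factor is also large, you may run out of file handles.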

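mergeFactor (the merge factor) controls how often segments are merged: it sets how many segments of similar size may accumulate on disk before they are merged into one larger segment. A larger mergeFactor speeds up indexing because merges happen less often, but it leaves more segments in the index, which slows searching and, as noted above, uses more file handles. The default mergeFactor is 10; raising it before a bulk indexing run is recommended.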
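To see why turning off compound files together with a large mergeFactor can exhaust file handles, here is a rough sketch using a simplified merge model (an assumption, not Lucene's real merge policy). The hypothetical peakSegments simulates flushes and cascading merges and reports the largest number of segments alive at once; multiplying by files per segment approximates the open-file count. The figure of 8 files per segment is an assumed placeholder: with the compound format a segment is roughly one .cfs file, without it each segment is several files, the exact set depending on the Lucene version and options such as term vectors.

```java
/** Hypothetical estimate under a simplified merge model; NOT Lucene's real merge policy. */
public class OpenFilesEstimate {

    /** Peak number of live segments when each flush adds one level-0 segment and
     *  every mergeFactor segments at the same level are merged into one. */
    static int peakSegments(long flushes, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        int[] waiting = new int[64]; // segments waiting at each merge level
        int live = 0;
        int peak = 0;
        for (long f = 0; f < flushes; f++) {
            int level = 0;
            waiting[level]++;         // the flush adds one level-0 segment
            live++;
            peak = Math.max(peak, live);
            while (waiting[level] == mergeFactor) {
                waiting[level] = 0;   // mergeFactor segments are replaced...
                live -= mergeFactor - 1;
                waiting[++level]++;   // ...by one segment at the next level
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int assumedFilesPerSegment = 8; // hypothetical figure for the non-compound format
        long flushes = 1000;
        for (int mergeFactor : new int[] {10, 50}) {
            int segments = peakSegments(flushes, mergeFactor);
            System.out.printf("mergeFactor=%d: peak segments=%d, ~%d files (compound) vs ~%d (non-compound)%n",
                    mergeFactor, segments, segments, segments * assumedFilesPerSegment);
        }
    }
}
```

With 1,000 flushes the model peaks at 28 segments for mergeFactor 10 and 69 for mergeFactor 50, so at the assumed 8 files per segment that is about 224 versus 552 files without the compound format. Real merge policies, open searchers and files being written during a merge all change the actual count, so treat these as order-of-magnitude figures.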

7. Reuse Document and Field instances

The original says:

Re-use Document and Field instances As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.

Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.

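Reuse Document and Field instances as much as possible. Lucene 2.3 added setValue(...) methods that let you change a Field's value. With them, a single Field instance can be reused across many added documents, which lowers the garbage-collection cost. It is also best to create just one Document instance and add multiple Field instances to it, but these Field objects ……

For example, you might have an idField, bodyField, nameField, storedField1, and so on. After a document has been added, you can change the Field values directly (for example, by calling idField.setValue(...), ……) and then add your Document instance again.

Please note that you cannot reuse a single Field instance within one Document, and you should not change a Field's value until the Document containing that Field has been added to the index. For more, see the Field documentation.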

Now, the expert's translation:

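Lucene 2.3 added a setValue method that lets you change a field's value. The benefit is that you can reuse one Field instance throughout the whole indexing process, which can greatly reduce the GC burden.
It is best to create a single Document instance and add the fields you want to it. Keep reusing the Field instances added to the document, changing each field's value by calling the corresponding setValue method, and then add the Document to the index again.
Note: several fields in one document cannot share a single Field instance, and a Field's value must not change before the document has been added to the index. In other words, if you have 3 fields, you must create 3 Field instances and then reuse them as you add the subsequent Documents.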

Below is a code sample for illustration. Note that it reuses the Field instances but does not reuse the Document.

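    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*; // Document, Field, Field.Store, Field.Index
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class FieldReuseDemo {
        public static void main(String[] args) throws Exception {
            // Lucene 3.0 API; "index" is a placeholder directory path
            IndexWriter writer = new IndexWriter(FSDirectory.open(new File("index")),
                    new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
            // Create each Field once; only its value changes per document
            Field f1 = new Field("f1", "", Field.Store.YES, Field.Index.ANALYZED);
            Field f2 = new Field("f2", "", Field.Store.YES, Field.Index.ANALYZED);
            for (int i = 0; i < 1000000; i++) {
                Document doc = new Document(); // a new Document each time; only the Fields are reused
                f1.setValue("f1 hello doc" + i);
                doc.add(f1);
                f2.setValue("f2 world doc" + i);
                doc.add(f2);
                writer.addDocument(doc); // change Field values only after this call returns
            }
            writer.close(); // commits the buffered documents
        }
    }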
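The Lucene sample reuses the Field objects but still creates a new Document per iteration; the wiki recommends also reusing one Document. Because the exact API varies across Lucene versions, here is a self-contained sketch of the pattern itself. MyField, MyDocument and RecordingWriter are hypothetical stand-ins, not Lucene classes: RecordingWriter copies the values out synchronously inside addDocument, which is why calling setValue again after addDocument returns is safe, and MyDocument rejects adding the same Field instance twice, mirroring the rule quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins for Field, Document and IndexWriter; NOT Lucene classes. */
public class ReuseSketch {

    static final class MyField {
        final String name;
        String value = "";
        MyField(String name) { this.name = name; }
        void setValue(String value) { this.value = value; }
    }

    static final class MyDocument {
        final List<MyField> fields = new ArrayList<>();
        void add(MyField f) {
            // mirrors the rule that one Field instance may appear only once per Document
            if (fields.contains(f)) throw new IllegalArgumentException("field already added: " + f.name);
            fields.add(f);
        }
    }

    static final class RecordingWriter {
        final List<Map<String, String>> indexed = new ArrayList<>();
        /** Copies the current field values before returning, like a synchronous addDocument. */
        void addDocument(MyDocument doc) {
            Map<String, String> copy = new LinkedHashMap<>();
            for (MyField f : doc.fields) copy.put(f.name, f.value);
            indexed.add(copy);
        }
    }

    static RecordingWriter indexAll(List<String[]> rows) {
        RecordingWriter writer = new RecordingWriter();
        // One Document and one Field per field name, created once...
        MyField id = new MyField("id");
        MyField body = new MyField("body");
        MyDocument doc = new MyDocument();
        doc.add(id);
        doc.add(body);
        for (String[] row : rows) {
            // ...then only the values change for each added document
            id.setValue(row[0]);
            body.setValue(row[1]);
            writer.addDocument(doc); // values may be changed again once this returns
        }
        return writer;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "hello"});
        rows.add(new String[] {"2", "world"});
        System.out.println(indexAll(rows).indexed); // [{id=1, body=hello}, {id=2, body=world}]
    }
}
```

Only two Field objects and one Document are ever allocated, no matter how many rows are indexed; that reduced allocation is where the GC savings described by the wiki come from.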


2011-12-30 15:42