cfan_haifeng

Lucene wiki translation: How to improve indexing speed, part 1

Blog categories:

Translation
Lucene

lucene, improving indexing speed

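Original: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.1 BasicsOfPerformance -> 1.1.1.4 ImproveIndexingSpeed
Note: "red" text marks passages I did not know or was not sure how to translate. "Blue" text is my own commentary.
Status: complete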

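How to speed up indexing

Original text:

Here are some things to try to speed up the indexing speed of your Lucene application. Please see ImproveSearchingSpeed for how to speed up searching.

Sometimes we need to speed up indexing. The suggestions below are places to start (for speeding up searching, see ImproveSearchingSpeed).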

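Original text:

Be sure you really need to speed things up. Many of the ideas here are simple to try, but others will necessarily add some complexity to your application. So be sure your indexing speed is indeed too slow and the slowness is indeed within Lucene.

First, determine whether you really need to speed up indexing. Some of the methods here are simple to try (though they may not fit your case), while others will add complexity to your application. So your first task is to confirm that your indexing really is too slow, and that the slowness really comes from Lucene rather than from database operations, IO operations, and so on.

If you are sure you need faster indexing, the suggestions below can help.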

1. Make sure you are using the latest version of Lucene.

2. Use a local filesystem.

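Original text:

Use a local filesystem. Remote filesystems are typically quite a bit slower for indexing. If your index needs to be on the remote filesystem, consider building it first on the local filesystem and then copying it up to the remote filesystem.

Use a local filesystem. Building an index on a remote filesystem is typically much slower. If the index has to live on a remote filesystem, consider building it on the local filesystem first and then copying it to the remote one.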
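The "build locally, then copy" step can be sketched with plain Java file APIs. This is a minimal sketch, not something the wiki prescribes: the class name `IndexCopy` and both directory parameters are hypothetical, and it assumes the writer has already been closed so that no index files change during the copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class IndexCopy {
    /** Copies every file under localIndexDir to remoteIndexDir, keeping relative paths. */
    static void copyIndex(Path localIndexDir, Path remoteIndexDir) throws IOException {
        Files.createDirectories(remoteIndexDir);
        // Files.walk visits a directory before its contents, so targets exist in time.
        try (Stream<Path> paths = Files.walk(localIndexDir)) {
            paths.forEach(src -> {
                Path dst = remoteIndexDir.resolve(localIndexDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Here `remoteIndexDir` would be a path on the mounted remote filesystem; the slow remote IO then happens once, as a sequential copy, instead of during indexing.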

(Sorry, the number 2 is used twice.)

2. Get faster hardware, especially for the IO system.

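Original text:

Get faster hardware, especially a faster IO system. If possible, use a solid-state disk (SSD). These devices have come down substantially in price recently, and much lower cost of seeking can be a very sizable speedup in cases where the index cannot fit entirely in the OS's IO cache.

Get faster hardware, especially for the IO system. If possible, use a solid-state disk (SSD). These devices keep getting cheaper, and their much lower seek cost can give a sizable speedup when the index cannot fit entirely in the OS's IO cache.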

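3. Open only one writer (IndexWriter) and reuse it for the duration of the indexing session.

Original text:

Open a single writer and re-use it for the duration of your indexing session.

Open only one writer (IndexWriter) and reuse it for the duration of the indexing session.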
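The shape of "one writer per session" can be illustrated without Lucene itself. In the sketch below, `StubWriter` is a hypothetical stand-in for `IndexWriter` (it is not a Lucene class); it only counts how many times a writer is opened, which is the step the advice tries to avoid repeating.

```java
import java.util.ArrayList;
import java.util.List;

class SingleWriterSession {
    /** Hypothetical stand-in for an index writer; counts how often one is opened. */
    static class StubWriter implements AutoCloseable {
        static int opened = 0;
        final List<String> docs = new ArrayList<>();

        StubWriter() { opened++; }                 // opening is the expensive step

        void addDocument(String doc) { docs.add(doc); }

        @Override
        public void close() { /* a real writer would write buffered data out here */ }
    }

    /** Indexes all docs through one writer that lives for the whole session. */
    static int indexAll(List<String> docs) {
        try (StubWriter writer = new StubWriter()) {   // opened once
            for (String doc : docs) {
                writer.addDocument(doc);               // reused for every document
            }
            return writer.docs.size();
        }
    }
}
```

The pattern to avoid is the opposite one: constructing and closing a writer inside the loop, once per document.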

4. Flush (write to disk) according to RAM usage rather than the number of documents indexed so far.

Original text:

Flush by RAM usage instead of document count.
For Lucene <= 2.2: call writer.ramSizeInBytes() after every added doc then call flush() when it's using too much RAM. This is especially good if you have small docs or highly variable doc sizes. You need to first set maxBufferedDocs large enough to prevent the writer from flushing based on document count. However, don't set it too large otherwise you may hit LUCENE-845. Somewhere around 2-3X your "typical" flush count should be OK.
For Lucene >= 2.3: IndexWriter can flush according to RAM usage itself. Call writer.setRAMBufferSizeMB() to set the buffer size. Be sure you don't also have any leftover calls to setMaxBufferedDocs since the writer will flush "either or" (whichever comes first).

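As we all know, IO is very slow and RAM is far faster. So when building an index, Lucene usually buffers the index data in memory first and writes it to disk once a certain amount has accumulated, which speeds up indexing. The question then becomes: when, or under what conditions, should the buffer be written to disk?

There are two approaches. The first counts documents: for example, once 100 documents have been indexed in RAM, the in-memory index is written to disk.

The second goes by RAM usage: while the buffer is still within its RAM budget, keep buffering in RAM; otherwise write to disk.

Summary: the second is clearly smarter. The first is hard to tune: on some machines 10,000 documents is not too many, while on others 1,000 is already too much.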

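My own clumsy translation:

For Lucene <= 2.2: call writer.ramSizeInBytes() after every added doc, then call flush() when Lucene is using too much memory (RAM). This works especially well if the documents you index are small or highly variable in size. To keep the writer from flushing to disk when it reaches some document count, first set maxBufferedDocs large enough. But do not set it too large either, or you may hit LUCENE-845. A value around 2-3X your "typical" flush count should be OK.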

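Now look at an expert's translation. What a gap: 99% ordinary, 100% expert.

In Lucene 2.2 and earlier, you can call writer.ramSizeInBytes() after each added document and then call flush() when the index is using too much memory. This is especially effective when indexing many small documents or documents of varying size. You must first set the maxBufferedDocs parameter large enough to prevent the writer from flushing based on document count. But take care not to set it too large, or you will run into bug LUCENE-845. That bug was fixed in version 2.3.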
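The manual "check RAM after every document" loop for Lucene <= 2.2 can be sketched as follows. The method names `addDocument`, `ramSizeInBytes`, and `flush` mirror the calls named in the text, but the `BufferingWriter` interface and `ToyWriter` class are hypothetical stand-ins, not Lucene classes, and the toy "RAM usage" is simply the total length of the buffered strings.

```java
class RamFlushLoop {
    /** Hypothetical stand-in exposing the three calls the text names. */
    interface BufferingWriter {
        void addDocument(String doc);
        long ramSizeInBytes();
        void flush();
    }

    /** Toy writer: "RAM used" is just the total length of buffered documents. */
    static class ToyWriter implements BufferingWriter {
        long buffered = 0;
        int flushes = 0;

        public void addDocument(String doc) { buffered += doc.length(); }

        public long ramSizeInBytes() { return buffered; }

        public void flush() {
            if (buffered > 0) {   // count only flushes that actually write something
                flushes++;
                buffered = 0;
            }
        }
    }

    /** Adds every document, flushing whenever the buffer reaches maxRamBytes. */
    static void indexAll(BufferingWriter writer, Iterable<String> docs, long maxRamBytes) {
        for (String doc : docs) {
            writer.addDocument(doc);
            if (writer.ramSizeInBytes() >= maxRamBytes) {
                writer.flush();
            }
        }
        writer.flush(); // write out whatever is still buffered at the end
    }
}
```

Because the check runs after every added document, many small documents share one flush while a single large one triggers a flush immediately, which is why the approach suits small or highly variable document sizes.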

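PS: a lot has been written online about maxBufferedDocs and related settings. See, for example, the article "lucene3.0_和IndexWriter有关的几个参数设置及重建索引注意事项" (Lucene 3.0: several IndexWriter-related parameter settings and notes on rebuilding an index).

For Lucene >= 2.3, IndexWriter can flush (write to disk) according to RAM usage by itself. Call writer.setRAMBufferSizeMB() to set the buffer size. Once you flush by RAM usage, make sure no leftover call to setMaxBufferedDocs remains anywhere else. Otherwise the writer flushes on an "either or" basis: whichever limit is reached first triggers the flush.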
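The "either or" rule is easy to see in a toy model. The sketch below is not Lucene's actual accounting: `countFlushes` and its parameters are hypothetical, and document sizes stand in for RAM usage. It only shows how a leftover document-count limit can dominate a generous RAM limit.

```java
class EitherOrFlush {
    /** Returns how many flushes happen when both a RAM limit and a doc-count limit are set. */
    static int countFlushes(int[] docSizesBytes, long ramLimitBytes, int maxBufferedDocs) {
        long ram = 0;
        int docs = 0;
        int flushes = 0;
        for (int size : docSizesBytes) {
            ram += size;
            docs++;
            // "either or": whichever limit is reached first triggers the flush
            if (ram >= ramLimitBytes || docs >= maxBufferedDocs) {
                flushes++;
                ram = 0;
                docs = 0;
            }
        }
        return flushes;
    }
}
```

For 1,000 documents of 100 bytes each and a 16,000-byte RAM limit, passing `Integer.MAX_VALUE` as the document limit gives 6 flushes in this model, while a leftover limit of 10 documents gives 100 flushes, with the RAM limit never reached.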

Continued in: Lucene wiki translation: How to improve indexing speed, part 2

2011-12-26 17:05