[Repost] lucene3.0_IndexSearcher sorting

badwing


Blog category: Lucene

IndexSearcher sorting

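This article covers:

1. Sort-related methods in IndexSearcher, and the Sort and SortField classes (API level);

2. Sorting by document score;

3. Sorting by internal document id;

4. Caveats for numeric and date sorting;

5. Sorting on multiple fields;

6. Changing a document's score by changing its boost.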

----------------------------------------------------------------------

1. Sort-related methods in IndexSearcher, and the Sort and SortField classes (API level)

The IndexSearcher method that performs a sorted search is:

abstract TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort)
Expert: Low-level search implementation with arbitrary sorting.

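In everyday code you normally call the convenience overload search(Query query, Filter filter, int n, Sort sort), which builds the Weight from the Query and delegates to this method. Either way, all you need to supply for sorting is a Sort instance.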
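As a rough mental model of what a call with `n` and a `Sort` returns, the plain-Java sketch below (no Lucene involved) keeps only the `n` best items under a comparator by using a bounded heap. This is similar in spirit to Lucene's internal hit queue, but the class `TopN` and its method are hypothetical illustrations, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopN {
    // Returns the n "best" items according to 'order' (best first),
    // never keeping more than n + 1 items in the heap.
    static <T> List<T> topN(List<T> hits, int n, Comparator<T> order) {
        // Reversed order makes the heap head the worst item currently kept.
        PriorityQueue<T> heap = new PriorityQueue<>(order.reversed());
        for (T hit : hits) {
            heap.offer(hit);
            if (heap.size() > n) {
                heap.poll(); // drop the current worst hit
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order); // best first
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(10, 10, 5, -2, 0);
        // Ascending order: the "best" hits are the smallest values.
        System.out.println(topN(values, 3, Comparator.naturalOrder())); // [-2, 0, 5]
    }
}
```

The heap never grows beyond n + 1 entries, which is why asking for a small `n` stays cheap even when many documents match.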

Constructor Summary

Sort()
Sorts by computed relevance.

Sort(SortField... fields)
Sorts in succession by the criteria in each SortField.

Sort(SortField field)
Sorts by the criteria in the given SortField.

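Within a Sort, each SortField decides which field to sort on, which data type its values are interpreted as, and whether the order is ascending or descending.

Its two most basic constructors are: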

SortField(String field, int type)
Creates a sort by terms in the given field with the type of term values explicitly given.

SortField(String field, int type, boolean reverse)
Creates a sort, possibly in reverse, by terms in the given field with the type of term values explicitly given.

With these classes, sorting search results is straightforward.
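To make the roles of `field`, `type` and `reverse` concrete, here is a plain-Java model (not Lucene code) that turns such a triple into a `Comparator` over documents stored as `Map<String, String>`. The `Type` enum and the `sortField` helper are hypothetical, and only INT and STRING are modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortFieldModel {
    enum Type { INT, STRING }

    // Builds a comparator that reads 'field' from each document, compares
    // the values as the given type, and flips the order when 'reverse' is true.
    static Comparator<Map<String, String>> sortField(String field, Type type, boolean reverse) {
        Comparator<Map<String, String>> cmp;
        if (type == Type.INT) {
            cmp = Comparator.comparingInt(d -> Integer.parseInt(d.get(field)));
        } else {
            cmp = Comparator.comparing(d -> d.get(field));
        }
        return reverse ? cmp.reversed() : cmp;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>(List.of(
                Map.of("f", "10"), Map.of("f", "5"), Map.of("f", "-2")));
        docs.sort(sortField("f", Type.INT, true)); // INT, descending
        System.out.println(docs); // [{f=10}, {f=5}, {f=-2}]
    }
}
```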

A simple example:

SortField sortF = new SortField("f", SortField.INT);
Sort sort = new Sort(sortF);
TopFieldDocs docs = searcher.search(query, null, 10, sort);
// iterate over the hits in docs
for (ScoreDoc sd : docs.scoreDocs) {
    Document d = searcher.doc(sd.doc);
    System.out.println(d);
}

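2. Sorting by document score

IndexSearcher's default search already sorts hits by document score.

To request this explicitly, set the SortField type to SCORE (higher scores come first without setting reverse):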

static int SCORE
Sort by document score (relevancy).
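Relevance order means descending score; when two hits score the same, Lucene's hit queue puts the one with the lower internal document number first. In Lucene 3.0 the predefined `SortField.FIELD_SCORE` (and `Sort.RELEVANCE`) express the same score ordering. The plain-Java sketch below models only that ordering rule; the `Hit` class is a hypothetical stand-in for Lucene's `ScoreDoc`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RelevanceOrder {
    // Hypothetical stand-in for a hit: internal doc id plus score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    // Higher score first; equal scores fall back to ascending doc id.
    static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(List.of(
                new Hit(0, 0.5f), new Hit(1, 0.9f), new Hit(2, 0.5f)));
        hits.sort(BY_SCORE);
        System.out.println(hits); // [1:0.9, 0:0.5, 2:0.5]
    }
}
```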

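3. Sorting by internal document id

Every document is assigned an internal id (its document number) when it is added to the index. If you need to sort by this id, set the SortField type to DOC. Keep in mind that these numbers reflect index order and can change when segments containing deleted documents are merged.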

static int DOC
Sort by document number (index order).
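A DOC sort ignores scores and field values and simply orders hits by internal document number, i.e. the order in which documents were added; with `reverse` set to true, the most recently added documents come first. A minimal plain-Java model of that behavior follows (the class and method names are hypothetical, not Lucene API).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DocOrder {
    // Orders internal document numbers the way a DOC sort would:
    // ascending (index order), or newest first when reverse is true.
    static List<Integer> sortByDoc(List<Integer> docIds, boolean reverse) {
        List<Integer> sorted = new ArrayList<>(docIds);
        Comparator<Integer> byDoc = Comparator.naturalOrder();
        sorted.sort(reverse ? byDoc.reversed() : byDoc);
        return sorted;
    }

    public static void main(String[] args) {
        // Doc ids as they might arrive in relevance order.
        List<Integer> hitsInScoreOrder = List.of(4, 0, 2, 1, 3);
        System.out.println(sortByDoc(hitsInScoreOrder, false)); // [0, 1, 2, 3, 4]
        System.out.println(sortByDoc(hitsInScoreOrder, true));  // [4, 3, 2, 1, 0]
    }
}
```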

4. Caveats for numeric and date sorting

Suppose an index contains five documents, returned in the following default order:

Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>

Note that field f holds int values, while field f1 holds 8-digit dates (yyyyMMdd). In the default order, neither field is sorted.

Sorting on field f with type INT:

Result:

Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>

This is the expected result.
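The two f:10 documents keep their relative order because ties on a sort field fall back to internal document number. The plain-Java sketch below reproduces the INT ordering of the five example documents, assuming (the article does not say) that they have internal ids 0 to 4 in the order of the default listing. `Doc` and `INT_ON_F` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IntSortOnF {
    // One example document: assumed internal doc id plus raw field values.
    static final class Doc {
        final int id;
        final String f;
        final String f1;
        Doc(int id, String f, String f1) { this.id = id; this.f = f; this.f1 = f1; }
        @Override public String toString() { return "f:" + f + " f1:" + f1; }
    }

    // INT sort on f, ties broken by ascending doc id.
    static final Comparator<Doc> INT_ON_F =
            Comparator.comparingInt((Doc d) -> Integer.parseInt(d.f))
                      .thenComparingInt(d -> d.id);

    static List<Doc> exampleDocs() {
        // Assumed index order: same as the default listing in the article.
        return new ArrayList<>(List.of(
                new Doc(0, "10", "20100215"), new Doc(1, "10", "20090512"),
                new Doc(2, "5", "20101019"), new Doc(3, "-2", "20000128"),
                new Doc(4, "0", "20050719")));
    }

    public static void main(String[] args) {
        List<Doc> docs = exampleDocs();
        docs.sort(INT_ON_F);
        docs.forEach(System.out::println);
    }
}
```

Running `main` prints the documents in the same order as the INT result shown above.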

Sorting on field f with type STRING:

Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>

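The fifth document is out of place, so this is not the expected result. A STRING sort compares values as text, character by character: "10" comes before "5" because '1' sorts before '5', and "-2" comes first because '-' sorts before every digit.

So pay close attention to the sort type you choose.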

Sorting on field f1 with type INT:

Result:

Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>

This is the expected result.
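An 8-digit yyyyMMdd date sorts correctly as INT because the encoding year * 10000 + month * 100 + day preserves chronological order (month * 100 + day is at most 1231, and day is at most 31), and even 99991231 fits easily in a 32-bit int. The plain-Java sketch below checks that ordering the example f1 values as ints matches ordering them as `java.time.LocalDate`; `toSortableInt` and `parse` are hypothetical helpers.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DateAsInt {
    // Encodes a date as yyyyMMdd, e.g. 2010-02-15 -> 20100215.
    static int toSortableInt(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Parses an 8-digit yyyyMMdd string.
    static LocalDate parse(String yyyyMMdd) {
        return LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(toSortableInt(LocalDate.of(2010, 2, 15))); // 20100215
        List<String> f1 = List.of("20100215", "20090512", "20101019", "20000128", "20050719");
        List<String> byInt = new ArrayList<>(f1);
        byInt.sort(Comparator.comparingInt(Integer::parseInt));
        List<String> byDate = new ArrayList<>(f1);
        byDate.sort(Comparator.comparing(DateAsInt::parse));
        System.out.println(byInt.equals(byDate)); // true
    }
}
```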

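Notes:

When sorting dates, prices and similar data, choose a suitable sort type. This is not only about meeting business requirements: numeric sort types such as INT and FLOAT are also more efficient than STRING.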

5. Sorting on multiple fields

Example code:

SortField sortF = new SortField("f", SortField.INT);
SortField sortF2 = new SortField("f1", SortField.INT);
Sort sort = new Sort(new SortField[]{sortF, sortF2});
TopFieldDocs docs = searcher.search(query, null, 10, sort);

Result:

Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>

Notes:

Results are sorted by f first; documents with equal f values are then sorted by f1.

This precedence follows the order of the SortField instances in the array.
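In plain Java terms, a Sort built from several SortFields behaves like a chained comparator: the second key is consulted only when the first one ties. The sketch below (not Lucene code; `Doc` and `F_THEN_F1` are hypothetical names) applies that idea to the example data, storing f and f1 as ints for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiFieldSort {
    // Hypothetical example document holding the two sort fields as ints.
    static final class Doc {
        final int f;
        final int f1;
        Doc(int f, int f1) { this.f = f; this.f1 = f1; }
        @Override public String toString() { return "f:" + f + " f1:" + f1; }
    }

    // Plays the role of new Sort(new SortField[]{sortF, sortF2}): f first, then f1.
    static final Comparator<Doc> F_THEN_F1 =
            Comparator.comparingInt((Doc d) -> d.f).thenComparingInt(d -> d.f1);

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(List.of(
                new Doc(10, 20100215), new Doc(10, 20090512), new Doc(5, 20101019),
                new Doc(-2, 20000128), new Doc(0, 20050719)));
        docs.sort(F_THEN_F1);
        docs.forEach(System.out::println);
    }
}
```

Swapping the two keys in the chain changes which field wins, just as reordering the SortField array does.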

6. Changing a document's score by changing its boost

With the default (relevance) sort, the original order is:

Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>

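Change the boost of the fifth document (doc5). Document.setBoost is an index-time boost, so it must be set before the document is added to the index: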

doc5.setBoost(5);

Then check the ranking again:

Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>

As you can see, it went from the very bottom straight to the top!

All I can say is that this feature has great commercial value...
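One way to see why a boost of 5 moves the last document to the top: assuming the query is a single term query on a:fox (the article does not show it), every document matches identically, so all five start with equal scores and ties fall back to index order, which would explain the original ranking. The plain-Java model below treats the document boost as a simple multiplier on that common base score. This is a simplification: Lucene 3.0 actually folds the index-time boost into the field norm, which is stored with one-byte precision. `BoostModel.rank` and the base score value are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostModel {
    // Ranks doc ids by baseScore * boosts[doc], highest first,
    // breaking ties by ascending doc id (index order).
    static List<Integer> rank(float baseScore, float[] boosts) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < boosts.length; i++) {
            docs.add(i);
        }
        docs.sort(Comparator.comparingDouble((Integer doc) -> baseScore * boosts[doc])
                            .reversed()
                            .thenComparingInt(doc -> doc));
        return docs;
    }

    public static void main(String[] args) {
        float[] boosts = {1f, 1f, 1f, 1f, 1f};
        System.out.println(rank(0.3f, boosts)); // [0, 1, 2, 3, 4]
        boosts[4] = 5f; // models doc5.setBoost(5) on the fifth document (doc id 4)
        System.out.println(rank(0.3f, boosts)); // [4, 0, 1, 2, 3]
    }
}
```

The second ranking matches the boosted result above: doc id 4 (f:0) first, the rest still in index order.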


2011-04-08 16:03
Category: Programming languages