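When you search content with Lucene, the order in which results are displayed matters a great deal. Lucene's built-in sort options often do not fit a particular application, and to match your own scenario you have to implement custom sorting. This section shows how to do that in Lucene.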
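Custom sorting in Lucene works much like custom sorting for Java collections: in both cases you implement a comparison interface. In plain Java it is enough to implement Comparable (or to supply a Comparator). In Lucene you implement two interfaces, SortComparatorSource and ScoreDocComparator; both belong to the Lucene 2.x API that the code in this post uses. Before getting to the implementation, let's look at how the two interfaces are defined.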
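For comparison, the plain-Java side of this analogy might look like the sketch below. The `Place` class, its `rating` field and the method names are hypothetical, invented only for illustration; none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class JavaSortSketch {

    // Hypothetical value class: its natural order (by name) comes from Comparable.
    static class Place implements Comparable<Place> {
        final String name;
        final int rating;

        Place(String name, int rating) {
            this.name = name;
            this.rating = rating;
        }

        @Override
        public int compareTo(Place other) {
            return name.compareTo(other.name);
        }
    }

    // Builds the list of places from parallel arrays of names and ratings.
    private static List<Place> build(String[] names, int[] ratings) {
        List<Place> places = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            places.add(new Place(names[i], ratings[i]));
        }
        return places;
    }

    private static List<String> names(List<Place> places) {
        List<String> out = new ArrayList<>();
        for (Place p : places) {
            out.add(p.name);
        }
        return out;
    }

    // Sorts with the natural order defined by Comparable.compareTo.
    static List<String> namesByNaturalOrder(String[] names, int[] ratings) {
        List<Place> places = build(names, ratings);
        Collections.sort(places);
        return names(places);
    }

    // Sorts with an external Comparator: highest rating first.
    static List<String> namesByRating(String[] names, int[] ratings) {
        List<Place> places = build(names, ratings);
        places.sort(Comparator.comparingInt((Place p) -> p.rating).reversed());
        return names(places);
    }

    public static void main(String[] args) {
        String[] names = {"b", "c", "a"};
        int[] ratings = {3, 5, 4};
        System.out.println(namesByNaturalOrder(names, ratings));
        System.out.println(namesByRating(names, ratings));
    }
}
```

The Lucene interfaces split the same idea in two: one object is asked to create a comparator for a given index and field, and the comparator it returns does the actual comparison.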
The SortComparatorSource interface returns a comparator used to sort ScoreDocs (in the words of its Javadoc, "Expert: returns a comparator for sorting ScoreDocs"). The interface defines a single method:
    /**
     * Creates a comparator for the field in the given index.
     * @param reader - Index to create comparator for.
     * @param fieldname - Field to create comparator for.
     * @return Comparator of ScoreDoc objects.
     * @throws IOException - If an error occurs reading the index.
     */
    public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException
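This method simply creates the ScoreDocComparator instance that performs the sort, so we also have to implement the ScoreDocComparator interface. Its job is to compare two ScoreDoc objects for sorting ("Compares two ScoreDoc objects for sorting"). The interface also declares two static instances that Lucene implements itself: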
    // Special comparator for sorting hits according to computed relevance (document score).
    public static final ScoreDocComparator RELEVANCE;

    // Special comparator for sorting hits according to index order (document number).
    public static final ScoreDocComparator INDEXORDER;
It declares three sorting-related methods that we have to implement:
    /**
     * Compares two ScoreDoc objects and returns a result indicating their sort order.
     * @param i First ScoreDoc
     * @param j Second ScoreDoc
     * @return -1 if i should come before j;
     *          1 if i should come after j;
     *          0 if they are equal
     */
    public int compare(ScoreDoc i, ScoreDoc j);

    /**
     * Returns the value used to sort the given document. The object returned
     * must implement the java.io.Serializable interface. This is used by
     * multisearchers to determine how to collate results from their searchers.
     * @param i Document
     * @return Serializable object
     */
    public Comparable sortValue(ScoreDoc i);

    /**
     * Returns the type of sort. Should return SortField.SCORE, SortField.DOC,
     * SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM.
     * It is not valid to return SortField.AUTO. This is used by multisearchers
     * to determine how to collate results from their searchers.
     * @return One of the constants in SortField.
     */
    public int sortType();
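The Javadoc above says that the value returned by sortValue is used when results from several searchers are collated. As a conceptual illustration only, and not Lucene's actual merging code, the sketch below merges two result lists that are each already sorted by a Comparable sort value. The class name, method name and the parallel-array representation of hits are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeBySortValue {

    // Merges two hit lists, each already sorted ascending by its sort value.
    // labelsX[k] is a hypothetical hit label and valuesX[k] the value that a
    // sortValue-style method would have reported for that hit.
    static List<String> merge(String[] labelsA, float[] valuesA,
                              String[] labelsB, float[] valuesB) {
        List<String> merged = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < labelsA.length && j < labelsB.length) {
            // Only Comparable.compareTo is needed to decide which hit comes first.
            Comparable<Float> left = Float.valueOf(valuesA[i]);
            if (left.compareTo(Float.valueOf(valuesB[j])) <= 0) {
                merged.add(labelsA[i++]);
            } else {
                merged.add(labelsB[j++]);
            }
        }
        while (i < labelsA.length) {
            merged.add(labelsA[i++]);
        }
        while (j < labelsB.length) {
            merged.add(labelsB[j++]);
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(
                new String[]{"a1", "a2"}, new float[]{1f, 4f},
                new String[]{"b1", "b2"}, new float[]{2f, 3f}));
    }
}
```

The point of the sketch is that a merger never needs to call compare on documents from different indexes; comparable sort values are enough, which is why the value type has to be Comparable.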
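Let's look at an example.
It is an implementation from Lucene in Action that finds the names of the restaurants nearest to you. Each restaurant's coordinates are stored as a string of the form "x,y".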
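Before bringing Lucene into it, the core calculation can be sketched on its own: parse an "x,y" string and compute the straight-line distance to a reference point. The class and method names here are hypothetical, and the sketch assumes well-formed integer coordinates.

```java
public class LocationDistance {

    // Parses a location stored as "x,y" and returns its Euclidean distance
    // from the reference point (x, y). Assumes both parts are valid integers.
    static float distance(String location, int x, int y) {
        String[] xy = location.split(",");
        int deltaX = Integer.parseInt(xy[0].trim()) - x;
        int deltaY = Integer.parseInt(xy[1].trim()) - y;
        return (float) Math.sqrt(deltaX * deltaX + deltaY * deltaY);
    }

    public static void main(String[] args) {
        // Distance from (10,10) to the restaurant stored as "9,6".
        System.out.println(distance("9,6", 10, 10));
        // Distance from (0,0) to the restaurant stored as "1,2".
        System.out.println(distance("1,2", 0, 0));
    }
}
```

Storing the coordinates in a single untokenized field keeps each location as one term, so a comparator can read the term text and derive the distance from it.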
    package com.nikee.lucene;

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.ScoreDocComparator;
    import org.apache.lucene.search.SortComparatorSource;
    import org.apache.lucene.search.SortField;

    // Sorts restaurants by their distance from a given point.
    // Restaurant coordinates are stored as the string "x,y".
    // DistanceComparatorSource implements the SortComparatorSource interface.
    public class DistanceComparatorSource implements SortComparatorSource {
        private static final long serialVersionUID = 1L;

        // x and y hold the coordinates of the reference point
        private int x;
        private int y;

        public DistanceComparatorSource(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Returns the ScoreDocComparator that performs the sort
        public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
            return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
        }

        // DistanceScoreDocLookupComparator implements ScoreDocComparator and does the sorting
        private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
            private float[] distances; // distance from each restaurant to the reference point, indexed by document number

            // The constructor does almost all of the preparatory work.
            public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
                System.out.println("fieldName2=" + fieldname);
                final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
                System.out.println("maxDoc=" + reader.maxDoc());
                distances = new float[reader.maxDoc()]; // one slot per document
                if (distances.length > 0) {
                    TermDocs termDocs = reader.termDocs();
                    try {
                        if (enumerator.term() == null) {
                            throw new RuntimeException("no terms in field " + fieldname);
                        }
                        int i = 0, j = 0;
                        do {
                            System.out.println("in do-while :" + i++);
                            Term term = enumerator.term(); // current term
                            // Stop once the enumeration moves past the requested field.
                            // equals() is used so the check does not depend on field names being interned.
                            if (!term.field().equals(fieldname))
                                break;
                            // Position termDocs on the documents containing the current term
                            // (see the TermDocs Javadoc).
                            termDocs.seek(enumerator);
                            while (termDocs.next()) {
                                System.out.println(" in while :" + j++);
                                System.out.println(" in while ,Term :" + term.toString());
                                String[] xy = term.text().split(","); // extract x and y
                                int deltax = Integer.parseInt(xy[0]) - x;
                                int deltay = Integer.parseInt(xy[1]) - y;
                                // compute the distance
                                distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
                            }
                        } while (enumerator.next());
                    } finally {
                        termDocs.close();
                        enumerator.close();
                    }
                }
            }

            // With the distances prepared by the constructor, comparing is a simple lookup
            public int compare(ScoreDoc i, ScoreDoc j) {
                if (distances[i.doc] < distances[j.doc])
                    return -1;
                if (distances[i.doc] > distances[j.doc])
                    return 1;
                return 0;
            }

            // Returns the distance as the sort value
            public Comparable sortValue(ScoreDoc i) {
                return new Float(distances[i.doc]);
            }

            // Declares the sort type
            public int sortType() {
                return SortField.FLOAT;
            }
        }

        public String toString() {
            return "Distance from (" + x + "," + y + ")";
        }
    }
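These two classes implement the two interfaces above, and the comments explain each step; custom sorting turns out not to be very hard. To see whether the implementation is correct, let's check whether the test code passes.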
    package com.nikee.lucene.test;

    import java.io.IOException;

    import junit.framework.TestCase;

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FieldDoc;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopFieldDocs;
    import org.apache.lucene.store.RAMDirectory;

    import com.nikee.lucene.DistanceComparatorSource;

    public class DistanceComparatorSourceTest extends TestCase {
        private RAMDirectory directory;
        private IndexSearcher searcher;
        private Query query;

        // Set up the test environment
        protected void setUp() throws Exception {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
            addPoint(writer, "El Charro", "restaurant", 1, 2);
            addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
            addPoint(writer, "Los Betos", "restaurant", 9, 6);
            addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
            writer.close();
            searcher = new IndexSearcher(directory);
            query = new TermQuery(new Term("type", "restaurant"));
        }

        private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
            Document doc = new Document();
            doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);
        }

        public void testNearestRestaurantToHome() throws Exception {
            // Build a SortField from DistanceComparatorSource; home is at (0,0)
            Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
            Hits hits = searcher.search(query, sort); // run the search
            // Check the order
            assertEquals("closest", "El Charro", hits.doc(0).get("name"));
            assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
        }

        public void testNearestRestaurantToWork() throws Exception {
            Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // work is at (10,10)
            // The test above exercises the custom sort but cannot see the values it sorted by.
            // TopFieldDocs gives access to that information.
            TopFieldDocs docs = searcher.search(query, null, 3, sort);
            assertEquals(4, docs.totalHits);
            assertEquals(3, docs.scoreDocs.length);
            // FieldDoc exposes the sort values of a hit (see the FieldDoc Javadoc)
            FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
            assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
            Document document = searcher.doc(fieldDoc.doc);
            assertEquals("Los Betos", document.get("name"));
            dumpDocs(sort, docs); // print the details
        }

        // Prints information about the sort
        private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
            System.out.println("Sorted by: " + sort);
            ScoreDoc[] scoreDocs = docs.scoreDocs;
            for (int i = 0; i < scoreDocs.length; i++) {
                FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
                Float distance = (Float) fieldDoc.fields[0];
                Document doc = searcher.doc(fieldDoc.doc);
                System.out.println(" " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
            }
        }
    }
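The orderings that the assertions expect can also be checked without Lucene. The sketch below reuses the four locations from setUp() and mimics the comparator's design: distances are precomputed into an array indexed by document number, and comparing two documents is then just an array lookup. The class and method names are hypothetical, and the sketch assumes that document numbers follow insertion order, which is an assumption of this example rather than something the test states.

```java
import java.util.Arrays;

public class DistanceOrderCheck {

    // The four restaurants from setUp(), in the order they were added, so the
    // array index stands in for the document number (an assumption of this sketch).
    static final String[] NAMES = {"El Charro", "Cafe Poca Cosa", "Los Betos", "Nico's Taco Shop"};
    static final String[] LOCATIONS = {"1,2", "5,9", "9,6", "3,8"};

    // Precomputes one distance per document, as the comparator's constructor does.
    static float[] distancesFrom(int x, int y) {
        float[] distances = new float[LOCATIONS.length];
        for (int doc = 0; doc < LOCATIONS.length; doc++) {
            String[] xy = LOCATIONS[doc].split(",");
            int deltaX = Integer.parseInt(xy[0]) - x;
            int deltaY = Integer.parseInt(xy[1]) - y;
            distances[doc] = (float) Math.sqrt(deltaX * deltaX + deltaY * deltaY);
        }
        return distances;
    }

    // Returns the restaurant names ordered nearest-first from (x, y).
    static String[] namesNearestFirst(int x, int y) {
        final float[] distances = distancesFrom(x, y);
        Integer[] docs = new Integer[NAMES.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }
        // Comparing two documents is a lookup into the precomputed array.
        Arrays.sort(docs, (a, b) -> Float.compare(distances[a], distances[b]));
        String[] ordered = new String[docs.length];
        for (int rank = 0; rank < docs.length; rank++) {
            ordered[rank] = NAMES[docs[rank]];
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println("From (0,0):   " + Arrays.toString(namesNearestFirst(0, 0)));
        System.out.println("From (10,10): " + Arrays.toString(namesNearestFirst(10, 10)));
    }
}
```

From (0,0) the squared distances are 5, 106, 117 and 73, so El Charro is nearest and Los Betos is furthest; from (10,10) they are 145, 26, 17 and 53, so Los Betos is nearest at sqrt(17). Both results agree with the assertions in the test.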