lxwt909

Learning Lucene 5: Sorting with Sort


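This time we'll look at sorting in Lucene. Sharp-eyed readers will already have noticed that the search method of the IndexSearcher class has quite a few overloads: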

      

/** Finds the top <code>n</code>
   * hits for <code>query</code>.
   *
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public TopDocs search(Query query, int n)
    throws IOException {
    return search(query, null, n);
  }


  /** Finds the top <code>n</code>
   * hits for <code>query</code>, applying <code>filter</code> if non-null.
   *
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public TopDocs search(Query query, Filter filter, int n)
    throws IOException {
    return search(createNormalizedWeight(wrapFilter(query, filter)), null, n);
  }

  /** Lower-level search API.
   *
   * <p>{@link LeafCollector#collect(int)} is called for every matching
   * document.
   *
   * @param query to match documents
   * @param filter if non-null, used to permit documents to be collected.
   * @param results to receive hits
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public void search(Query query, Filter filter, Collector results)
    throws IOException {
    search(leafContexts, createNormalizedWeight(wrapFilter(query, filter)), results);
  }

  /** Lower-level search API.
   *
   * <p>{@link LeafCollector#collect(int)} is called for every matching document.
   *
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public void search(Query query, Collector results)
    throws IOException {
    search(leafContexts, createNormalizedWeight(query), results);
  }
  
  /** Search implementation with arbitrary sorting.  Finds
   * the top <code>n</code> hits for <code>query</code>, applying
   * <code>filter</code> if non-null, and sorting the hits by the criteria in
   * <code>sort</code>.
   * 
   * <p>NOTE: this does not compute scores by default; use
   * {@link IndexSearcher#search(Query,Filter,int,Sort,boolean,boolean)} to
   * control scoring.
   *
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort) throws IOException {
    return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, false, false);
  }

  /** Search implementation with arbitrary sorting, plus
   * control over whether hit scores and max score
   * should be computed.  Finds
   * the top <code>n</code> hits for <code>query</code>, applying
   * <code>filter</code> if non-null, and sorting the hits by the criteria in
   * <code>sort</code>.  If <code>doDocScores</code> is <code>true</code>
   * then the score of each hit will be computed and
   * returned.  If <code>doMaxScore</code> is
   * <code>true</code> then the maximum score over all
   * collected hits will be computed.
   * 
   * @throws BooleanQuery.TooManyClauses If a query would exceed 
   *         {@link BooleanQuery#getMaxClauseCount()} clauses.
   */
  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort, boolean doDocScores, boolean doMaxScore) throws IOException {
    return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, doDocScores, doMaxScore);
  }

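The query parameter needs no explanation. filter applies an additional filter to the matches, int n means only the top N hits are returned, and sort is the Sort object that describes the ordering.

doDocScores is the important one: it controls whether a relevance score is computed for each hit. If you set it to false, the score of every returned document is NaN.

doMaxScore controls whether the maximum score over all collected hits is computed, which is the value TopDocs.getMaxScore() reports. It does not change how an individual document is scored; how the clauses of a BooleanQuery are combined is decided by the query itself. Both flags are false in the overloads that do not take them, as the four-argument search(query, filter, n, sort) above shows.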
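Going by the Javadoc quoted above, doDocScores decides whether each hit carries a score, and doMaxScore decides whether the maximum score over all collected hits is computed. The toy model below shows that in plain Java. It is only a sketch, not Lucene code: the class ScoreFlagsSketch and both method names are made up for illustration, and reporting an untracked maximum as NaN is an assumption made by analogy with the per-hit scores.

```java
/** Toy model of the doDocScores / doMaxScore flags; hypothetical, not part of Lucene. */
final class ScoreFlagsSketch {

    /** Scores as the caller would see them: real values if doDocScores is true, NaN otherwise. */
    static float[] reportedScores(float[] computed, boolean doDocScores) {
        float[] out = new float[computed.length];
        for (int i = 0; i < computed.length; i++) {
            out[i] = doDocScores ? computed[i] : Float.NaN;
        }
        return out;
    }

    /** Maximum over all collected hits if doMaxScore is true; NaN (an assumption) if it is not tracked. */
    static float reportedMaxScore(float[] computed, boolean doMaxScore) {
        if (!doMaxScore || computed.length == 0) {
            return Float.NaN;
        }
        float max = Float.NEGATIVE_INFINITY;
        for (float s : computed) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        float[] scores = {1.052735f, 0.447534f, 0.151398f};
        System.out.println(reportedScores(scores, false)[0]); // per-hit score is not tracked
        System.out.println(reportedMaxScore(scores, true));   // largest of the three scores
    }
}
```

The point of the model is that neither flag touches the per-document scoring formula: one decides whether scores are reported at all, the other whether their maximum is tracked.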

Note: in older Lucene releases (3.x, to the best of my knowledge), the doDocScores and doMaxScore flags could be set on the IndexSearcher itself, like this:

searcher.setDefaultFieldSortScoring(true, false);

In Lucene 5.x, by contrast, you can only pass these two flags when you call the search method, like this:

searcher.search(query, filter, n, sort, doDocScores, doMaxScore);

 

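From the method signatures we can see that changing the default behavior of sorting by score in descending order requires passing in a Sort object. So let's take a look at the source of the Sort class: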

    

package org.apache.lucene.search;

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.Arrays;


/**
 * Encapsulates sort criteria for returned hits.
 *
 * <p>The fields used to determine sort order must be carefully chosen.
 * Documents must contain a single term in such a field,
 * and the value of the term should indicate the document's relative position in
 * a given sort order.  The field must be indexed, but should not be tokenized,
 * and does not need to be stored (unless you happen to want it back with the
 * rest of your document data).  In other words:
 *
 * <p><code>document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));</code></p>
 * 
 *
 * <p><h3>Valid Types of Values</h3>
 *
 * <p>There are four possible kinds of term values which may be put into
 * sorting fields: Integers, Longs, Floats, or Strings.  Unless
 * {@link SortField SortField} objects are specified, the type of value
 * in the field is determined by parsing the first term in the field.
 *
 * <p>Integer term values should contain only digits and an optional
 * preceding negative sign.  Values must be base 10 and in the range
 * <code>Integer.MIN_VALUE</code> and <code>Integer.MAX_VALUE</code> inclusive.
 * Documents which should appear first in the sort
 * should have low value integers, later documents high values
 * (i.e. the documents should be numbered <code>1..n</code> where
 * <code>1</code> is the first and <code>n</code> the last).
 *
 * <p>Long term values should contain only digits and an optional
 * preceding negative sign.  Values must be base 10 and in the range
 * <code>Long.MIN_VALUE</code> and <code>Long.MAX_VALUE</code> inclusive.
 * Documents which should appear first in the sort
 * should have low value integers, later documents high values.
 * 
 * <p>Float term values should conform to values accepted by
 * {@link Float Float.valueOf(String)} (except that <code>NaN</code>
 * and <code>Infinity</code> are not supported).
 * Documents which should appear first in the sort
 * should have low values, later documents high values.
 *
 * <p>String term values can contain any valid String, but should
 * not be tokenized.  The values are sorted according to their
 * {@link Comparable natural order}.  Note that using this type
 * of term value has higher memory requirements than the other
 * two types.
 *
 * <p><h3>Object Reuse</h3>
 *
 * <p>One of these objects can be
 * used multiple times and the sort order changed between usages.
 *
 * <p>This class is thread safe.
 *
 * <p><h3>Memory Usage</h3>
 *
 * <p>Sorting uses of caches of term values maintained by the
 * internal HitQueue(s).  The cache is static and contains an integer
 * or float array of length <code>IndexReader.maxDoc()</code> for each field
 * name for which a sort is performed.  In other words, the size of the
 * cache in bytes is:
 *
 * <p><code>4 * IndexReader.maxDoc() * (# of different fields actually used to sort)</code>
 *
 * <p>For String fields, the cache is larger: in addition to the
 * above array, the value of every term in the field is kept in memory.
 * If there are many unique terms in the field, this could
 * be quite large.
 *
 * <p>Note that the size of the cache is not affected by how many
 * fields are in the index and <i>might</i> be used to sort - only by
 * the ones actually used to sort a result set.
 *
 * <p>Created: Feb 12, 2004 10:53:57 AM
 *
 * @since   lucene 1.4
 */
public class Sort {

  /**
   * Represents sorting by computed relevance. Using this sort criteria returns
   * the same results as calling
   * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
   * only with slightly more overhead.
   */
  public static final Sort RELEVANCE = new Sort();

  /** Represents sorting by index order. */
  public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);

  // internal representation of the sort criteria
  SortField[] fields;

  /**
   * Sorts by computed relevance. This is the same sort criteria as calling
   * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
   * only with slightly more overhead.
   */
  public Sort() {
    this(SortField.FIELD_SCORE);
  }

  /** Sorts by the criteria in the given SortField. */
  public Sort(SortField field) {
    setSort(field);
  }

  /** Sets the sort to the given criteria in succession: the
   *  first SortField is checked first, but if it produces a
   *  tie, then the second SortField is used to break the tie,
   *  etc.  Finally, if there is still a tie after all SortFields
   *  are checked, the internal Lucene docid is used to break it. */
  public Sort(SortField... fields) {
    setSort(fields);
  }

  /** Sets the sort to the given criteria. */
  public void setSort(SortField field) {
    this.fields = new SortField[] { field };
  }

  /** Sets the sort to the given criteria in succession: the
   *  first SortField is checked first, but if it produces a
   *  tie, then the second SortField is used to break the tie,
   *  etc.  Finally, if there is still a tie after all SortFields
   *  are checked, the internal Lucene docid is used to break it. */
  public void setSort(SortField... fields) {
    this.fields = fields;
  }
  
  /**
   * Representation of the sort criteria.
   * @return Array of SortField objects used in this sort criteria
   */
  public SortField[] getSort() {
    return fields;
  }

  /**
   * Rewrites the SortFields in this Sort, returning a new Sort if any of the fields
   * changes during their rewriting.
   *
   * @param searcher IndexSearcher to use in the rewriting
   * @return {@code this} if the Sort/Fields have not changed, or a new Sort if there
   *        is a change
   * @throws IOException Can be thrown by the rewriting
   * @lucene.experimental
   */
  public Sort rewrite(IndexSearcher searcher) throws IOException {
    boolean changed = false;
    
    SortField[] rewrittenSortFields = new SortField[fields.length];
    for (int i = 0; i < fields.length; i++) {
      rewrittenSortFields[i] = fields[i].rewrite(searcher);
      if (fields[i] != rewrittenSortFields[i]) {
        changed = true;
      }
    }

    return (changed) ? new Sort(rewrittenSortFields) : this;
  }

  @Override
  public String toString() {
    StringBuilder buffer = new StringBuilder();

    for (int i = 0; i < fields.length; i++) {
      buffer.append(fields[i].toString());
      if ((i+1) < fields.length)
        buffer.append(',');
    }

    return buffer.toString();
  }

  /** Returns true if <code>o</code> is equal to this. */
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Sort)) return false;
    final Sort other = (Sort)o;
    return Arrays.equals(this.fields, other.fields);
  }

  /** Returns a hash code value for this object. */
  @Override
  public int hashCode() {
    return 0x45aaf665 + Arrays.hashCode(fields);
  }

  /** Returns true if the relevance score is needed to sort documents. */
  public boolean needsScores() {
    for (SortField sortField : fields) {
      if (sortField.needsScores()) {
        return true;
      }
    }
    return false;
  }

}

First, it defines two static constants:

   public static final Sort RELEVANCE = new Sort();

   public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);

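RELEVANCE sorts by relevance score.

INDEXORDER sorts by index order. What does that mean? It means sorting by the docID of each document. A docID is not a field you add yourself: Lucene assigns every document an internal number as it is indexed. If you do not change the default sort behavior, hits are ordered by relevance score in descending order first (provided scoring is enabled), and documents with equal scores are then ordered by docID in ascending order.

Next come the Sort constructors, which take SortField objects. One of them deserves your attention: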

public Sort(SortField... fields) {
    setSort(fields);
  }

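The SortField... fields form is the varargs syntax introduced in Java 5. It is similar to the older SortField[] fields form, with one difference: the caller can pass field1, field2, field3, ..., fieldN directly, although passing an array works as well. What I really want to say is that Sort accepts multiple SortField objects, which means it supports sorting on several fields, just like order by age, id in SQL. But I digress. Let's move on to the SortField source and see what is inside: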
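As a quick illustration of the varargs point, here is a minimal plain-Java sketch. VarargsDemo and describe are hypothetical names, not part of Lucene; the method simply reports how many arguments it received so that both calling styles can be compared.

```java
/** Hypothetical demo class for varargs; not part of Lucene. */
final class VarargsDemo {

    /** Accepts zero or more field names; inside the method, fields is an ordinary String[]. */
    static String describe(String... fields) {
        return fields.length + ":" + String.join(",", fields);
    }

    public static void main(String[] args) {
        // Comma-separated arguments are packed into an array by the compiler.
        System.out.println(describe("age", "id"));
        // An explicit array is accepted as well and is passed through unchanged.
        System.out.println(describe(new String[] {"age", "id"}));
    }
}
```

Both calls print the same line, which is why new Sort(a, b, c) and new Sort(new SortField[] { a, b, c }) are interchangeable.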

public class SortField {

  /**
   * Specifies the type of the terms to be sorted, or special types such as CUSTOM
   */
  public static enum Type {

    /** Sort by document score (relevance).  Sort values are Float and higher
     * values are at the front. */
    SCORE,

    /** Sort by document number (index order).  Sort values are Integer and lower
     * values are at the front. */
    DOC,

    /** Sort using term values as Strings.  Sort values are String and lower
     * values are at the front. */
    STRING,

    /** Sort using term values as encoded Integers.  Sort values are Integer and
     * lower values are at the front. */
    INT,

    /** Sort using term values as encoded Floats.  Sort values are Float and
     * lower values are at the front. */
    FLOAT,

    /** Sort using term values as encoded Longs.  Sort values are Long and
     * lower values are at the front. */
    LONG,

    /** Sort using term values as encoded Doubles.  Sort values are Double and
     * lower values are at the front. */
    DOUBLE,

    /** Sort using a custom Comparator.  Sort values are any Comparable and
     * sorting is done according to natural order. */
    CUSTOM,

    /** Sort using term values as Strings, but comparing by
     * value (using String.compareTo) for all comparisons.
     * This is typically slower than {@link #STRING}, which
     * uses ordinals to do the sorting. */
    STRING_VAL,

    /** Sort use byte[] index values. */
    BYTES,

    /** Force rewriting of SortField using {@link SortField#rewrite(IndexSearcher)}
     * before it can be used for sorting */
    REWRITEABLE
  }

  /** Represents sorting by document score (relevance). */
  public static final SortField FIELD_SCORE = new SortField(null, Type.SCORE);

  /** Represents sorting by document number (index order). */
  public static final SortField FIELD_DOC = new SortField(null, Type.DOC);

  private String field;
  private Type type;  // defaults to determining type dynamically
  boolean reverse = false;  // defaults to natural order

  // Used for CUSTOM sort
  private FieldComparatorSource comparatorSource;

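The first thing you see is an enum, Type, that defines the kind of sort:

SCORE: sort by relevance score; descending by default.

DOC: sort by document ID. Apart from SCORE, which is descending by default, every type is ascending by default.

STRING: sort on the field's value as a string.

STRING_VAL: also sorts on the field's value as a string, but every comparison is done by value with String.compareTo.

STRING_VAL is typically slower than STRING because, as the Javadoc above notes, STRING sorts by ordinals instead of comparing the strings themselves.

INT, FLOAT, DOUBLE and LONG do the same for numeric values, so I won't go through them.

CUSTOM: a custom sort. It is used together with the member variable

private FieldComparatorSource comparatorSource;

that is, you supply your own comparator and it decides the sort order.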
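The Javadoc of STRING_VAL quoted above says that STRING "uses ordinals to do the sorting". The sketch below shows the general idea in plain Java: each distinct value gets its rank in sorted order once, and hits are then compared by that integer rather than by the string. It is a simplified model under my own assumptions, not Lucene's implementation; OrdinalSortSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Simplified model of sorting by ordinals; hypothetical, not Lucene code. */
final class OrdinalSortSketch {

    /** For each document, the rank of its value among the distinct values in sorted order. */
    static int[] ordinals(String[] valuePerDoc) {
        // The sorted set of distinct values plays the role of a term dictionary.
        String[] dictionary = new TreeSet<>(Arrays.asList(valuePerDoc)).toArray(new String[0]);
        int[] ords = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            ords[doc] = Arrays.binarySearch(dictionary, valuePerDoc[doc]);
        }
        return ords;
    }

    /** Document ids ordered by ordinal, with ties broken by document id. */
    static List<Integer> sortByOrdinal(String[] valuePerDoc) {
        int[] ords = ordinals(valuePerDoc);
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            docs.add(doc);
        }
        // Only int comparisons happen here; no String.compareTo during the sort itself.
        docs.sort(Comparator.<Integer>comparingInt(d -> ords[d]).thenComparingInt(d -> d));
        return docs;
    }

    public static void main(String[] args) {
        String[] categories = {"/health", "/education/pedagogy", "/health", "/technology/computers/ai"};
        System.out.println(Arrays.toString(ordinals(categories)));
        System.out.println(sortByOrdinal(categories));
    }
}
```

The string comparisons are paid once, when the ordinals are built; every comparison during the sort is then a cheap integer comparison, which is the intuition behind STRING being faster than STRING_VAL.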

Besides the custom comparator just mentioned, SortField has three more important member variables:

  private String field;
  private Type type;  // defaults to determining type dynamically
  boolean reverse = false;  // defaults to natural order

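field is the field you want to sort on, i.e. the name of the sort field.

type is the sort type explained above, i.e. what to sort by: score, docID and so on.

reverse says whether to invert the default order, turning ascending into descending and descending into ascending. For example, score is descending by default, so with reverse set to true it sorts ascending, while the other types, which are ascending by default, sort descending. reverse defaults to false.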
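The effect of reverse can be modeled with java.util.Comparator. In this minimal sketch the "natural" direction depends on the kind of key (descending for scores, ascending for everything else), and reverse simply flips whatever that direction is. ReverseFlagSketch, Kind and the method names are hypothetical; this is not how Lucene implements it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of the reverse flag; hypothetical, not Lucene code. */
final class ReverseFlagSketch {

    /** SCORE keys are descending by default, VALUE keys ascending. */
    enum Kind { SCORE, VALUE }

    static Comparator<Float> comparator(Kind kind, boolean reverse) {
        Comparator<Float> ascending = Comparator.naturalOrder();
        Comparator<Float> natural = (kind == Kind.SCORE) ? ascending.reversed() : ascending;
        // reverse inverts the default direction, whatever it was.
        return reverse ? natural.reversed() : natural;
    }

    static List<Float> sorted(List<Float> values, Kind kind, boolean reverse) {
        List<Float> copy = new ArrayList<>(values);
        copy.sort(comparator(kind, reverse));
        return copy;
    }

    public static void main(String[] args) {
        List<Float> values = Arrays.asList(0.4f, 1.0f, 0.1f);
        System.out.println(sorted(values, Kind.SCORE, false)); // highest first
        System.out.println(sorted(values, Kind.SCORE, true));  // lowest first
        System.out.println(sorted(values, Kind.VALUE, true));  // highest first
    }
}
```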

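OK, with the above in mind, I expect you now have a clear idea of how to apply your own custom sort to indexed documents. Below is the test demo I wrote, for your reference:

First, create the index documents used for testing: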

   

package com.yida.framework.lucene5.sort;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
/**
 * Builds the test index
 * @author Lanxiaowei
 *
 */
public class CreateTestIndex {
	public static void main(String[] args) throws IOException {
		String dataDir = "C:/data";
		String indexDir = "C:/lucenedir";

		Directory dir = FSDirectory.open(Paths.get(indexDir));
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
		indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
		IndexWriter writer = new IndexWriter(dir, indexWriterConfig);

		List<File> results = new ArrayList<File>();
		findFiles(results, new File(dataDir));
		System.out.println(results.size() + " books to index");

		for (File file : results) {
			Document doc = getDocument(dataDir, file);
			writer.addDocument(doc);
		}
		writer.close();
		dir.close();

	}

	/**
	 * Finds all properties files under the given directory
	 * 
	 * @param result
	 * @param dir
	 */
	private static void findFiles(List<File> result, File dir) {
		for (File file : dir.listFiles()) {
			if (file.getName().endsWith(".properties")) {
				result.add(file);
			} else if (file.isDirectory()) {
				findFiles(result, file);
			}
		}
	}

	/**
	 * Reads a properties file and builds a Document from it
	 * 
	 * @param rootDir
	 * @param file
	 * @return
	 * @throws IOException
	 */
	public static Document getDocument(String rootDir, File file)
			throws IOException {
		Properties props = new Properties();
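		// try-with-resources closes the stream once the properties are loaded
		try (FileInputStream in = new FileInputStream(file)) {
			props.load(in);
		}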

		Document doc = new Document();

		String category = file.getParent().substring(rootDir.length());
		category = category.replace(File.separatorChar, '/');

		String isbn = props.getProperty("isbn");
		String title = props.getProperty("title");
		String author = props.getProperty("author");
		String url = props.getProperty("url");
		String subject = props.getProperty("subject");

		String pubmonth = props.getProperty("pubmonth");

		System.out.println("title:" + title + "\n" + "author:" + author + "\n" + "subject:" + subject + "\n"
				+ "pubmonth:" + pubmonth + "\n" + "category:" + category + "\n---------");

		doc.add(new StringField("isbn", isbn, Field.Store.YES));
		doc.add(new StringField("category", category, Field.Store.YES));
		doc.add(new SortedDocValuesField("category", new BytesRef(category)));
		doc.add(new TextField("title", title, Field.Store.YES));
		doc.add(new Field("title2", title.toLowerCase(), Field.Store.YES,
				Field.Index.NOT_ANALYZED_NO_NORMS,
				Field.TermVector.WITH_POSITIONS_OFFSETS));

		String[] authors = author.split(",");
		for (String a : authors) {
			doc.add(new Field("author", a, Field.Store.YES,
					Field.Index.NOT_ANALYZED,
					Field.TermVector.WITH_POSITIONS_OFFSETS));
		}

		doc.add(new Field("url", url, Field.Store.YES,
				Field.Index.NOT_ANALYZED_NO_NORMS));
		doc.add(new Field("subject", subject, Field.Store.YES,
				Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

		doc.add(new IntField("pubmonth", Integer.parseInt(pubmonth),
				Field.Store.YES));
		doc.add(new NumericDocValuesField("pubmonth", Integer.parseInt(pubmonth)));
		Date d = null;
		try {
			d = DateTools.stringToDate(pubmonth);
		} catch (ParseException pe) {
			throw new RuntimeException(pe);
		}
		doc.add(new IntField("pubmonthAsDay",
				(int) (d.getTime() / (1000 * 3600 * 24)), Field.Store.YES));
		
		for (String text : new String[] { title, subject, author, category }) {
			doc.add(new Field("contents", text, Field.Store.NO,
					Field.Index.ANALYZED,
					Field.TermVector.WITH_POSITIONS_OFFSETS));
		}
		return doc;
	}

}

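Don't ask why the index-building code above still uses the Field constructors that are already flagged as deprecated. I'll tell you: because I felt like it! Never mind those details; I just wanted to mix things up. All the code does is find every properties file under the data folder, read the data in each one and write it to the index. Note that the two fields we sort on later, category and pubmonth, are also added as doc-values fields (SortedDocValuesField and NumericDocValuesField), which is what Lucene 5 sorts on. I'll upload the properties data files used for testing in the attachment at the bottom.

Then write a test class to run the searches: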

package com.yida.framework.lucene5.sort;

import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Paths;
import java.text.DecimalFormat;

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortField.Type;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SortingExample {
	private Directory directory;

	public SortingExample(Directory directory) {
		this.directory = directory;
	}
	
	public void displayResults(Query query, Sort sort)
			throws IOException {
		IndexReader reader = DirectoryReader.open(directory);
		IndexSearcher searcher = new IndexSearcher(reader);

		//searcher.setDefaultFieldSortScoring(true, false);
		
		// Lucene 5.x moves the two scoring flags into the search method's parameters
		//searcher.search(query, filter, n, sort, doDocScores, doMaxScore);
		TopDocs results = searcher.search(query, null, 
				20, sort,true,false); 

		System.out.println("\nResults for: " + 
				query.toString() + " sorted by " + sort);

		System.out
				.println(StringUtils.rightPad("Title", 30)
						+ StringUtils.rightPad("pubmonth", 10)
						+ StringUtils.center("id", 4)
						+ StringUtils.center("score", 15));
		PrintStream out = new PrintStream(System.out, true, "UTF-8");

		DecimalFormat scoreFormatter = new DecimalFormat("0.######");
		for (ScoreDoc sd : results.scoreDocs) {
			int docID = sd.doc;
			float score = sd.score;
			Document doc = searcher.doc(docID);
			out.println(StringUtils.rightPad( 
					StringUtils.abbreviate(doc.get("title"), 29), 30) + 
					StringUtils.rightPad(doc.get("pubmonth"), 10) + 
					StringUtils.center("" + docID, 4) + 
					StringUtils.leftPad( 
							scoreFormatter.format(score), 12)); 
			out.println("   " + doc.get("category"));
			// out.println(searcher.explain(query, docID)); 
		}
		System.out.println("\n**************************************\n");
		reader.close();
	}

	public static void main(String[] args) throws Exception {
		String indexdir = "C:/lucenedir";
		Query allBooks = new MatchAllDocsQuery();

		QueryParser parser = new QueryParser("contents",new StandardAnalyzer()); 
		BooleanQuery query = new BooleanQuery(); 
		query.add(allBooks, BooleanClause.Occur.SHOULD); 
		query.add(parser.parse("java OR action"), BooleanClause.Occur.SHOULD); 

		Directory directory = FSDirectory.open(Paths.get(indexdir));
		SortingExample example = new SortingExample(directory); 

		example.displayResults(query, Sort.RELEVANCE);

		example.displayResults(query, Sort.INDEXORDER);

		example.displayResults(query, new Sort(new SortField("category",
				Type.STRING)));

		example.displayResults(query, new Sort(new SortField("pubmonth",
				Type.INT, true)));

		example.displayResults(query, new Sort(new SortField("category",
				Type.STRING), SortField.FIELD_SCORE, new SortField(
				"pubmonth", Type.INT, true)));

		example.displayResults(query, new Sort(new SortField[] {
				SortField.FIELD_SCORE,
				new SortField("category", Type.STRING) }));
		directory.close();
	}
}

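If you understood the points above, you should be able to follow this test code. Still, one reminder: when constructing a Sort you can pass several SortField objects to sort on multiple fields, for example: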

    

new Sort(new SortField("category",
				Type.STRING), SortField.FIELD_SCORE, new SortField(
				"pubmonth", Type.INT, true))

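This sorts by the category field as a string in ascending order first, then by score in descending order, then by the pubmonth field compared numerically in descending order. In short, the sort keys are applied in the same order in which you pass the SortField objects. Keep Sort's default ordering behavior in mind.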
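The ordering rule of that three-key Sort can be modeled with plain java.util.Comparator chaining. This is only a sketch of the rule, not Lucene code: MultiKeySortSketch, Hit and sample() are hypothetical names, the seven rows are copied from the printed results in this post, and the final docId tie-break follows the Sort Javadoc quoted earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of a three-key sort; hypothetical, not Lucene code. */
final class MultiKeySortSketch {

    /** Minimal stand-in for one search hit. */
    static final class Hit {
        final int docId;
        final String category;
        final float score;
        final int pubmonth;

        Hit(int docId, String category, float score, int pubmonth) {
            this.docId = docId;
            this.category = category;
            this.score = score;
            this.pubmonth = pubmonth;
        }
    }

    /** category ascending, then score descending, then pubmonth descending, then docId ascending. */
    static Comparator<Hit> order() {
        Comparator<Hit> byCategory = Comparator.comparing(h -> h.category);
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        Comparator<Hit> byPubmonthDesc = (a, b) -> Integer.compare(b.pubmonth, a.pubmonth);
        Comparator<Hit> byDocId = Comparator.comparingInt(h -> h.docId);
        // Each later comparator is consulted only when all earlier ones report a tie.
        return byCategory.thenComparing(byScoreDesc)
                .thenComparing(byPubmonthDesc)
                .thenComparing(byDocId);
    }

    static List<Integer> sortedDocIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(order());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.docId);
        }
        return ids;
    }

    /** Seven rows taken from the post's printed results, listed in docId order. */
    static List<Hit> sample() {
        String programming = "/technology/computers/programming";
        return Arrays.asList(
                new Hit(2, "/health", 0.151398f, 200611),
                new Hit(3, "/health", 0.151398f, 200804),
                new Hit(6, programming, 1.052735f, 200707),
                new Hit(8, programming, 0.429442f, 201005),
                new Hit(9, programming, 1.052735f, 201005),
                new Hit(11, programming, 0.447534f, 200403),
                new Hit(12, programming, 0.151398f, 199910));
    }

    public static void main(String[] args) {
        System.out.println(sortedDocIds(sample()));
    }
}
```

Within /health the scores tie, so pubmonth decides and doc 3 comes before doc 2; within the programming category docs 9 and 6 tie on score and are separated by pubmonth in the same way.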

Below is the output printed by a run. Compare it against the code and take some time to digest it:

Results for: *:* (contents:java contents:action) sorted by <score>
Title                         pubmonth   id      score     
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming

**************************************


Results for: *:* (contents:java contents:action) sorted by <doc>
Title                         pubmonth   id      score     
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming

**************************************


Results for: *:* (contents:java contents:action) sorted by <string: "category">
Title                         pubmonth   id      score     
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology

**************************************


Results for: *:* (contents:java contents:action) sorted by <int: "pubmonth">!
Title                         pubmonth   id      score     
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education

**************************************


Results for: *:* (contents:java contents:action) sorted by <string: "category">,<score>,<int: "pubmonth">!
Title                         pubmonth   id      score     
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology

**************************************


Results for: *:* (contents:java contents:action) sorted by <score>,<string: "category">
Title                         pubmonth   id      score     
Ant in Action                 200707     6      1.052735
   /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
   /technology/computers/programming
Tapestry in Action            200403     11     0.447534
   /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
   /technology/computers/programming
A Modern Art of Education     200403     0      0.151398
   /education/pedagogy
Lipitor, Thief of Memory      200611     2      0.151398
   /health
Nudge: Improving Decisions... 200804     3      0.151398
   /health
Imperial Secrets of Health... 199903     1      0.151398
   /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
   /philosophy/eastern
Gödel, Escher, Bach: an Et... 199905     5      0.151398
   /technology/computers/ai
The Pragmatic Programmer      199910     12     0.151398
   /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
   /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
   /technology/computers/programming/methodology

**************************************

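I wrote this in a bit of a hurry, so if anything is unclear or wrong, please don't hold back in telling me. Thanks!

I'll also upload the demo source in the attachment at the bottom. Before running the test classes, remember to copy the test data files to the C: drive.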


 

OK, that's a wrap!

   


   

  

 
