`
lxwt909
  • 浏览: 572246 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

Lucene5学习之Facet(续)

阅读更多

          默认Facet是统计落入某一组域值的总数的,然后按照总数从大到小排序,判定规则是域值是否相同,其实还可以根据域值是否在某个范围内来判定是否落入某一个分组。这里说的范围就是通过Range定义的,比如:

        /**1小时之前的毫秒数*/
	final LongRange PAST_HOUR = new LongRange("Past hour", this.nowSec - 3600L,
			true, this.nowSec, true);
	/**6小时之前的毫秒数*/
	final LongRange PAST_SIX_HOURS = new LongRange("Past six hours",
			this.nowSec - 21600L, true, this.nowSec, true);
	/**24小时之前的毫秒数*/
	final LongRange PAST_DAY = new LongRange("Past day", this.nowSec - 86400L,
			true, this.nowSec, true);

    这里定义了3个数字范围,比如1个小时之前的毫秒数至当前时间毫秒数之间的范围,剩余两个类似,然后就可以根据定义的范围来创建Facet:

         //定义3个Facet: 统计 
	//[过去一小时之前-->当前时间]  [过去6小时之前-->当前时间]  [过去24小时之前-->当前时间]
	Facets facets = new LongRangeFacetCounts("timestamp", fc,
	    new LongRange[] { this.PAST_HOUR, this.PAST_SIX_HOURS,this.PAST_DAY });

   这样就能根据自定义范围去统计了,只要当前域的域值落在这个范围内,则这个facet分组内计数加1.完整示例代码如下:

package com.yida.framework.lucene5.facet;

import java.io.IOException;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.facet.DrillDownQuery;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.range.LongRange;
import org.apache.lucene.facet.range.LongRangeFacetCounts;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class RangeFacetsExample {
	private final Directory indexDir = new RAMDirectory();
	private IndexSearcher searcher;
	/**当前时间的毫秒数*/
	private final long nowSec = System.currentTimeMillis();

	/**1小时之前的毫秒数*/
	final LongRange PAST_HOUR = new LongRange("Past hour", this.nowSec - 3600L,
			true, this.nowSec, true);
	/**6小时之前的毫秒数*/
	final LongRange PAST_SIX_HOURS = new LongRange("Past six hours",
			this.nowSec - 21600L, true, this.nowSec, true);
	/**24小时之前的毫秒数*/
	final LongRange PAST_DAY = new LongRange("Past day", this.nowSec - 86400L,
			true, this.nowSec, true);

	/**
	 * 创建测试索引
	 * @throws IOException
	 */
	public void index() throws IOException {
		IndexWriter indexWriter = new IndexWriter(this.indexDir,
				new IndexWriterConfig(new WhitespaceAnalyzer())
						.setOpenMode(IndexWriterConfig.OpenMode.CREATE));

		/**
		 * 每次按[1000*i]这个斜率递减创建一个索引
		 */
		for (int i = 0; i < 100; i++) {
			Document doc = new Document();
			long then = this.nowSec - i * 1000;

			doc.add(new NumericDocValuesField("timestamp", then));

			doc.add(new LongField("timestamp", then, Field.Store.YES));
			indexWriter.addDocument(doc);
		}
		
		this.searcher = new IndexSearcher(DirectoryReader.open(indexWriter,
				true));
		indexWriter.close();
	}

	/**
	 * 获取FacetConfig配置对象
	 * @return
	 */
	private FacetsConfig getConfig() {
		return new FacetsConfig();
	}

	public FacetResult search() throws IOException {
		/**创建Facet结果收集器*/
		FacetsCollector fc = new FacetsCollector();
		TopDocs topDocs = FacetsCollector.search(this.searcher, new MatchAllDocsQuery(), 20, fc);
		ScoreDoc[] scoreDocs = topDocs.scoreDocs;
		for(ScoreDoc scoreDoc : scoreDocs) {
			int docId = scoreDoc.doc;
			Document doc = searcher.doc(docId);
			System.out.println(scoreDoc.doc + "\t" + doc.get("timestamp"));
		}
		//定义3个Facet: 统计 
		//[过去一小时之前-->当前时间]  [过去6小时之前-->当前时间]  [过去24小时之前-->当前时间]
		Facets facets = new LongRangeFacetCounts("timestamp", fc,
				new LongRange[] { this.PAST_HOUR, this.PAST_SIX_HOURS,
						this.PAST_DAY });
		
		return facets.getTopChildren(10, "timestamp", new String[0]);
	}

	/**
	 * 使用DrillDownQuery进行Facet统计
	 * @param range
	 * @return
	 * @throws IOException
	 */
	public TopDocs drillDown(LongRange range) throws IOException {
		DrillDownQuery q = new DrillDownQuery(getConfig());

		q.add("timestamp", NumericRangeQuery.newLongRange("timestamp",
				Long.valueOf(range.min), Long.valueOf(range.max),
				range.minInclusive, range.maxInclusive));

		return this.searcher.search(q, 10);
	}

	public void close() throws IOException {
		this.searcher.getIndexReader().close();
		this.indexDir.close();
	}

	public static void main(String[] args) throws Exception {
		RangeFacetsExample example = new RangeFacetsExample();
		example.index();

		System.out.println("Facet counting example:");
		System.out.println("-----------------------");
		System.out.println(example.search());

		System.out.println("\n");
		
		//只统计6个小时之前的Facet
		System.out
				.println("Facet drill-down example (timestamp/Past six hours):");
		System.out.println("---------------------------------------------");
		TopDocs hits = example.drillDown(example.PAST_SIX_HOURS);
		System.out.println(hits.totalHits + " totalHits");

		example.close();
	}
}

   我们在通过FacetsCollector.search进行搜索时,其实是可以传入排序器的,这时我们在创建索引时就需要使用SortedSetDocValuesFacetField,这样FacetsCollector.search返回的TopDocs就是经过排序的了,这一点需要注意。具体示例代码如下:

package com.yida.framework.lucene5.facet;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.DrillDownQuery;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortField.Type;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
public class SimpleSortedSetFacetsExample {
	private final Directory indexDir = new RAMDirectory();
	private final FacetsConfig config = new FacetsConfig();

	private void index() throws IOException {
		IndexWriter indexWriter = new IndexWriter(this.indexDir,
				new IndexWriterConfig(new WhitespaceAnalyzer())
						.setOpenMode(IndexWriterConfig.OpenMode.CREATE));

		Document doc = new Document();
		doc.add(new SortedSetDocValuesFacetField("Author", "Bob"));
		doc.add(new SortedSetDocValuesFacetField("Publish Year", "2010"));
		indexWriter.addDocument(this.config.build(doc));

		doc = new Document();
		doc.add(new SortedSetDocValuesFacetField("Author", "Lisa"));
		doc.add(new SortedSetDocValuesFacetField("Publish Year", "2010"));
		indexWriter.addDocument(this.config.build(doc));

		doc = new Document();
		doc.add(new SortedSetDocValuesFacetField("Author", "Lisa"));
		doc.add(new SortedSetDocValuesFacetField("Publish Year", "2012"));
		indexWriter.addDocument(this.config.build(doc));

		doc = new Document();
		doc.add(new SortedSetDocValuesFacetField("Author", "Susan"));
		doc.add(new SortedSetDocValuesFacetField("Publish Year", "2012"));
		indexWriter.addDocument(this.config.build(doc));

		doc = new Document();
		doc.add(new SortedSetDocValuesFacetField("Author", "Frank"));
		doc.add(new SortedSetDocValuesFacetField("Publish Year", "1999"));
		indexWriter.addDocument(this.config.build(doc));

		indexWriter.close();
	}

	private List<FacetResult> search() throws IOException {
		DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
		IndexSearcher searcher = new IndexSearcher(indexReader);
		SortedSetDocValuesReaderState state = new DefaultSortedSetDocValuesReaderState(
				indexReader);

		FacetsCollector fc = new FacetsCollector();
		
		Sort sort = new Sort(new SortField("Author", Type.INT, false));
		//FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
		TopDocs topDocs = FacetsCollector.search(searcher,new MatchAllDocsQuery(),null,10,sort,true,false,fc);

		Facets facets = new SortedSetDocValuesFacetCounts(state, fc);

		List<FacetResult> results = new ArrayList<FacetResult>();
		results.add(facets.getTopChildren(10, "Author", new String[0]));
		results.add(facets.getTopChildren(10, "Publish Year", new String[0]));
		indexReader.close();

		return results;
	}

	private FacetResult drillDown() throws IOException {
		DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
		IndexSearcher searcher = new IndexSearcher(indexReader);
		SortedSetDocValuesReaderState state = new DefaultSortedSetDocValuesReaderState(
				indexReader);

		DrillDownQuery q = new DrillDownQuery(this.config);
		q.add("Publish Year", new String[] { "2010" });
		FacetsCollector fc = new FacetsCollector();
		FacetsCollector.search(searcher, q, 10, fc);

		Facets facets = new SortedSetDocValuesFacetCounts(state, fc);
		FacetResult result = facets.getTopChildren(10, "Author", new String[0]);
		indexReader.close();

		return result;
	}

	public List<FacetResult> runSearch() throws IOException {
		index();
		return search();
	}

	public FacetResult runDrillDown() throws IOException {
		index();
		return drillDown();
	}

	public static void main(String[] args) throws Exception {
		System.out.println("Facet counting example:");
		System.out.println("-----------------------");
		SimpleSortedSetFacetsExample example = new SimpleSortedSetFacetsExample();
		List<FacetResult> results = example.runSearch();
		System.out.println("Author: " + results.get(0));
		System.out.println("Publish Year: " + results.get(0));

		System.out.println("\n");
		System.out.println("Facet drill-down example (Publish Year/2010):");
		System.out.println("---------------------------------------------");
		System.out.println("Author: " + example.runDrillDown());
	}
}

    前面我们在定义Facet时,即可以不指定域名,比如:

Facets facets = new FastTaxonomyFacetCounts(taxoReader, this.config, fc);

    这时默认会根据我们定义的FacetField去决定定义几个Facet进行统计,你可以指定只对指定域定义Facet,这样就可以值返回你关心的Facet统计数据,比如:

Facets author = new FastTaxonomyFacetCounts("author", taxoReader,
				this.config, fc);

    具体完整示例代码如下:

package com.yida.framework.lucene5.facet;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
/**
 * 根据指定域加载特定Facet Count
 * @author Lanxiaowei
 *
 */
public class MultiCategoryListsFacetsExample {
	private final Directory indexDir = new RAMDirectory();
	private final Directory taxoDir = new RAMDirectory();
	private final FacetsConfig config = new FacetsConfig();

	public MultiCategoryListsFacetsExample() {
		//定义  域别名
		this.config.setIndexFieldName("Author", "author");
		this.config.setIndexFieldName("Publish Date", "pubdate");
		
		// 设置Publish Date为多值域
		this.config.setHierarchical("Publish Date", true);
	}

	/**
	 * 创建测试索引
	 * @throws IOException
	 */
	private void index() throws IOException {
		IndexWriter indexWriter = new IndexWriter(this.indexDir,
				new IndexWriterConfig(new WhitespaceAnalyzer())
						.setOpenMode(IndexWriterConfig.OpenMode.CREATE));

		DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(
				this.taxoDir);

		Document doc = new Document();
		doc.add(new FacetField("Author", new String[] { "Bob" }));
		doc.add(new FacetField("Publish Date", new String[] { "2010", "10",
				"15" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		doc = new Document();
		doc.add(new FacetField("Author", new String[] { "Lisa" }));
		doc.add(new FacetField("Publish Date", new String[] { "2010", "10",
				"20" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		doc = new Document();
		doc.add(new FacetField("Author", new String[] { "Lisa" }));
		doc.add(new FacetField("Publish Date",
				new String[] { "2012", "1", "1" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		doc = new Document();
		doc.add(new FacetField("Author", new String[] { "Susan" }));
		doc.add(new FacetField("Publish Date",
				new String[] { "2012", "1", "7" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		doc = new Document();
		doc.add(new FacetField("Author", new String[] { "Frank" }));
		doc.add(new FacetField("Publish Date",
				new String[] { "1999", "5", "5" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		indexWriter.close();
		taxoWriter.close();
	}

	private List<FacetResult> search() throws IOException {
		DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
		IndexSearcher searcher = new IndexSearcher(indexReader);
		TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

		FacetsCollector fc = new FacetsCollector();

		FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);

		List<FacetResult> results = new ArrayList<FacetResult>();

		//定义author域的Facet,
		//FastTaxonomyFacetCounts第一个构造参数指定域名称,
		//则只统计指定域的Facet总数,不指定域名称,则默认会统计所有FacetField域的总数
		//这就跟SQL中的select * from...和 select name from ...差不多
		Facets author = new FastTaxonomyFacetCounts("author", taxoReader,
				this.config, fc);
		results.add(author.getTopChildren(10, "Author", new String[0]));

		//定义pubdate域的Facet
		Facets pubDate = new FastTaxonomyFacetCounts("pubdate", taxoReader,
				this.config, fc);
		results.add(pubDate.getTopChildren(10, "Publish Date", new String[0]));

		indexReader.close();
		taxoReader.close();

		return results;
	}

	public List<FacetResult> runSearch() throws IOException {
		index();
		return search();
	}

	public static void main(String[] args) throws Exception {
		System.out
				.println("Facet counting over multiple category lists example:");
		System.out.println("-----------------------");
		List<FacetResult> results = new MultiCategoryListsFacetsExample().runSearch();
		System.out.println("Author: " + results.get(0));
		System.out.println("Publish Date: " + results.get(1));
	}
}

    上面说的Facet数字统计方式都是按照出现次数计算总和,其实lucene还提供了js表达式方式自定义Facet数据统计,比如: 

/**当前索引文档评分乘以popularity域值的平方根 作为 Facet的统计值,默认是统计命中索引文档总数*/
Expression expr = JavascriptCompiler.compile("_score * sqrt(popularity)");

    即表示每个域的域值相同的落入一组,但此时每一组的数字总计不再是计算总个数,而是按照指定的js表达式去计算然后求和,比如上述表达式即表示当前索引文档评分乘以popularity域值的平方根 作为 Facet的统计值,_score是内置变量代指当前索引文档的评分,sqrt是内置函数表示开平方根,lucene expression表达式是通过JavascriptCompiler进行编译的,而JavascriptCompiler是借助大名鼎鼎的开源语法分析器antlr实现的,与antlr类似的还有JFlex,自己google了解去,关于语法分析这个话题就超出本篇主题了。下面说说lucene expression都支持哪些函数和操作符:

     大意就是它是基于Javascript语法的特定数字表达式,它支持:

     1,Integer,float数字,十六进制和八进制字符

     2.支持常用的数学操作符,加减乘除取模等

     3.支持位运算符

     4.支持布尔运算符

     5.支持比较运算符

     6.支持常用的数学函数,比如abs取绝对值,log对数,max取最大值,sqrt开平方根等

     7.支持三角函数,比如sin,cos等等

     8.支持地理函数,如haversin公式即求地球上任意两个点之间的距离,具体请自己去Google haversin公式

     9.辅助函数,貌似跟第6条重复,搞什么飞机,我只是翻译下而已,跟我没关系

     10.支持任意的外部变量,这一条有点费解,还好我在源码里找到了相关说明

 

     查阅org.apache.lucene.expressions.js包下面的package.html,看到里面有这样一句提示:

       打开Bindings类源码,没看到什么有用的信息,然后我无意在JavascriptCompiler类源码中找到如下一段说明,让我茅塞顿开:

/**
 * An expression compiler for javascript expressions.
 * <p>
 * Example:
 * <pre class="prettyprint">
 *   Expression foo = JavascriptCompiler.compile("((0.3*popularity)/10.0)+(0.7*score)");
 * </pre>
 * <p>
 * See the {@link org.apache.lucene.expressions.js package documentation} for 
 * the supported syntax and default functions.
 * <p>
 * You can compile with an alternate set of functions via {@link #compile(String, Map, ClassLoader)}.
 * For example:
 * <pre class="prettyprint">
 *   Map&lt;String,Method&gt; functions = new HashMap&lt;&gt;();
 *   // add all the default functions
 *   functions.putAll(JavascriptCompiler.DEFAULT_FUNCTIONS);
 *   // add cbrt()
 *   functions.put("cbrt", Math.class.getMethod("cbrt", double.class));
 *   // call compile with customized function map
 *   Expression foo = JavascriptCompiler.compile("cbrt(score)+ln(popularity)", 
 *                                               functions, 
 *                                               getClass().getClassLoader());
 * </pre>
 * 
 * @lucene.experimental
 */

    重点在这里:

Map<String,Method> functions = new HashMap<String,Method>();
 // add all the default functions
functions.putAll(JavascriptCompiler.DEFAULT_FUNCTIONS);
// add cbrt()
functions.put("cbrt", Math.class.getMethod("cbrt", double.class));
// call compile with customized function map
Expression foo = JavascriptCompiler.compile("cbrt(score)+ln(popularity)", 
                                             functions, 
                                             getClass().getClassLoader());

    通过示例代码,不难理解,意思就是我们可以把我们Java里类的某个方法通过JavascriptCompilter进行编译,就能像直接调用js内置函数一样,毕竟js内置函数是有限的,但内置的js函数不能满足你的要求,你就可以把java里自定义方法进行编译在js里执行。示例的代码就是把Math类里的cbrt方法(求一个double数值的立方根)编译为js内置函数,然后你就可以像使用js内置函数一样使用java里任意类的方法了。下面是一个使用lucene expressions完成Facet统计的完整示例:

package com.yida.framework.lucene5.facet;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.TaxonomyFacetSumValueSource;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ExpressionAggregationFacetsExample {
	private final Directory indexDir = new RAMDirectory();
	private final Directory taxoDir = new RAMDirectory();
	private final FacetsConfig config = new FacetsConfig();

	private void index() throws IOException {
		IndexWriter indexWriter = new IndexWriter(this.indexDir,
				new IndexWriterConfig(new WhitespaceAnalyzer())
						.setOpenMode(IndexWriterConfig.OpenMode.CREATE));

		DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(
				this.taxoDir);

		Document doc = new Document();
		doc.add(new TextField("c", "foo bar", Field.Store.NO));
		doc.add(new NumericDocValuesField("popularity", 5L));
		doc.add(new FacetField("A", new String[] { "B" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		doc = new Document();
		doc.add(new TextField("c", "foo foo bar", Field.Store.NO));
		doc.add(new NumericDocValuesField("popularity", 3L));
		doc.add(new FacetField("A", new String[] { "C" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));
		
		doc = new Document();
		doc.add(new TextField("c", "foo foo bar", Field.Store.NO));
		doc.add(new NumericDocValuesField("popularity", 8L));
		doc.add(new FacetField("A", new String[] { "B" }));
		indexWriter.addDocument(this.config.build(taxoWriter, doc));

		indexWriter.close();
		taxoWriter.close();
	}

	private FacetResult search() throws IOException, ParseException {
		DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
		IndexSearcher searcher = new IndexSearcher(indexReader);
		TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

		/**当前索引文档评分乘以popularity域值的平方根 作为 Facet的统计值,默认是统计命中索引文档总数*/
		Expression expr = JavascriptCompiler.compile("_score * sqrt(popularity)");
		SimpleBindings bindings = new SimpleBindings();
		bindings.add(new SortField("_score", SortField.Type.SCORE));
		bindings.add(new SortField("popularity", SortField.Type.LONG));

		FacetsCollector fc = new FacetsCollector(true);

		FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);

		//以Expression表达式的计算值定义一个Facet
		Facets facets = new TaxonomyFacetSumValueSource(taxoReader,
				this.config, fc, expr.getValueSource(bindings));
		FacetResult result = facets.getTopChildren(10, "A", new String[0]);

		indexReader.close();
		taxoReader.close();

		return result;
	}

	public FacetResult runSearch() throws IOException, ParseException {
		index();
		return search();
	}

	public static void main(String[] args) throws Exception {
		System.out.println("Facet counting example:");
		System.out.println("-----------------------");
		FacetResult result = new ExpressionAggregationFacetsExample()
				.runSearch();
		System.out.println(result);
	}
}

    前面说到的Lucene Expression是通过一个表达式来计算统计值然后求和,我们也可以使用AssociationFacetField域来关联一个任意值,然后根据这个值来求和统计,类似这样:

//3 --> lucene[不再是统计lucene这个域值的出现总次数,而是统计IntAssociationFacetField的第一个构造参数assoc总和]
doc.add(new IntAssociationFacetField(3, "tags", new String[] { "lucene" }));
doc.add(new FloatAssociationFacetField(0.87F, "genre", new String[] { "computing" }));

   其实创建一个AssociationFacetField等价于创建了一个StringField和一个BinaryDocValuesField(创建这个域是为了辅助统计Facet Count) ,有关AssociationFacetField的完整示例代码如下:

package com.yida.framework.lucene5.facet;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.DrillDownQuery;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.FloatAssociationFacetField;
import org.apache.lucene.facet.taxonomy.IntAssociationFacetField;
import org.apache.lucene.facet.taxonomy.TaxonomyFacetSumFloatAssociations;
import org.apache.lucene.facet.taxonomy.TaxonomyFacetSumIntAssociations;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class AssociationsFacetsExample {
	private final Directory indexDir = new RAMDirectory();
	  private final Directory taxoDir = new RAMDirectory();
	  private final FacetsConfig config;

	  public AssociationsFacetsExample()
	  {
	    this.config = new FacetsConfig();
	    this.config.setMultiValued("tags", true);
	    this.config.setIndexFieldName("tags", "$tags");
	    this.config.setMultiValued("genre", true);
	    this.config.setIndexFieldName("genre", "$genre");
	  }

	  private void index() throws IOException
	  {
	    IndexWriterConfig iwc = new IndexWriterConfig(new WhitespaceAnalyzer()).setOpenMode(IndexWriterConfig.OpenMode.CREATE);
	    IndexWriter indexWriter = new IndexWriter(this.indexDir, iwc);

	    DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(this.taxoDir);

	    Document doc = new Document();

	    //3 --> lucene[不再是统计lucene这个域值的出现总次数,而是统计IntAssociationFacetField的第一个构造参数assoc总和]
	    doc.add(new IntAssociationFacetField(3, "tags", new String[] { "lucene" }));

	    doc.add(new FloatAssociationFacetField(0.87F, "genre", new String[] { "computing" }));
	    indexWriter.addDocument(this.config.build(taxoWriter, doc));

	    doc = new Document();

	    //1 --> lucene
	    doc.add(new IntAssociationFacetField(1, "tags", new String[] { "lucene" }));

	    doc.add(new IntAssociationFacetField(2, "tags", new String[] { "solr" }));

	    doc.add(new FloatAssociationFacetField(0.75F, "genre", new String[] { "computing" }));

	    doc.add(new FloatAssociationFacetField(0.34F, "genre", new String[] { "software" }));
	    indexWriter.addDocument(this.config.build(taxoWriter, doc));

	    indexWriter.close();
	    taxoWriter.close();
	  }

	  private List<FacetResult> sumAssociations() throws IOException
	  {
	    DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
	    IndexSearcher searcher = new IndexSearcher(indexReader);
	    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

	    FacetsCollector fc = new FacetsCollector();

	    FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);

	    //定义了两个Facet
	    Facets tags = new TaxonomyFacetSumIntAssociations("$tags", taxoReader, this.config, fc);
	    Facets genre = new TaxonomyFacetSumFloatAssociations("$genre", taxoReader, this.config, fc);

	    List<FacetResult> results = new ArrayList<FacetResult>();
	    results.add(tags.getTopChildren(10, "tags", new String[0]));
	    results.add(genre.getTopChildren(10, "genre", new String[0]));

	    indexReader.close();
	    taxoReader.close();

	    return results;
	  }

	  private FacetResult drillDown() throws IOException
	  {
	    DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
	    IndexSearcher searcher = new IndexSearcher(indexReader);
	    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

	    DrillDownQuery q = new DrillDownQuery(this.config);

	    q.add("tags", new String[] { "solr" });
	    FacetsCollector fc = new FacetsCollector();
	    FacetsCollector.search(searcher, q, 10, fc);

	    Facets facets = new TaxonomyFacetSumFloatAssociations("$genre", taxoReader, this.config, fc);
	    FacetResult result = facets.getTopChildren(10, "$genre", new String[0]);

	    indexReader.close();
	    taxoReader.close();

	    return result;
	  }

	  public List<FacetResult> runSumAssociations() throws IOException
	  {
	    index();
	    return sumAssociations();
	  }

	  public FacetResult runDrillDown() throws IOException
	  {
	    index();
	    return drillDown();
	  }

	  public static void main(String[] args) throws Exception
	  {
	    System.out.println("Sum associations example:");
	    System.out.println("-------------------------");
	    List<FacetResult> results = new AssociationsFacetsExample().runSumAssociations();
	    System.out.println("tags: " + results.get(0));
	    System.out.println("genre: " + results.get(1));
	  }
}

    

      Facet就说这么多了,一些细节方面大家请看demo代码里面的注释,如果我哪里没说到的而你又感觉很迷惑无法理解的,请提出来,大家一起交流学习。如果我有哪里说的不够准确,也希望大家能够积极指正多多批评。 Demo源码请看底下的附件!!!

    如果你还有什么问题请加我Q-Q:7-3-6-0-3-1-3-0-5,

或者加裙
一起交流学习!

       

    

  • 大小: 166.1 KB
  • 大小: 726.3 KB
1
0
分享到:
评论

相关推荐

    lucene分组查询优化facet

    其中,Facet(分面)查询是Lucene提供的一种强大的分类和统计功能,它允许用户根据特定的维度(如作者、类别等)对搜索结果进行分组和计数,从而帮助用户更深入地探索数据。本篇文章将详细探讨Lucene的分组查询优化...

    lucene facet查询示例

    **Lucene Facet查询详解** Lucene是一款强大的全文搜索引擎库,广泛应用于各种信息检索系统。在处理大量数据时,为了帮助用户快速、有效地探索和理解数据,Lucene引入了Facets(方面)功能,它提供了分类浏览和统计...

    lucene facet bobo-browse实现

    《Lucene Facet 使用Bobo-Browse实现详解》 在信息检索领域,Lucene作为一款强大的全文搜索引擎,被广泛应用于各种复杂的数据检索场景。然而,随着数据量的日益增大,如何有效地对搜索结果进行分类和统计,即所谓的...

    lucene-facet-4.6.0.jar

    Lucene4.6版本,适用于Lucene的所有研究,以及中文分词功能

    rh-java-common-lucene5-facet-5.4.1-2.4.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    rh-java-common-lucene5-facet-5.4.1-2.3.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    rh-java-common-lucene5-facet-5.4.1-2.2.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    lucene-facet-3.6.2.jar

    java运行依赖jar包

    lucene 资料全集

    所提供的文档资源,如《Lucene学习总结之一》、《传智播客Lucene3.0课程》、《JAVA_Lucene_in_Action教程完整版》以及《Lucene_in_Action(中文版)》,都是深入了解 Lucene 的宝贵资料,建议结合这些材料进行系统...

    自己写的一个lucene知识点集合

    **Lucene 知识点详解** Lucene 是一个开源全文搜索引擎库,由 Apache 软件基金会开发。它提供了一个可扩展的、高性能...这些代码示例是学习 Lucene 知识的宝贵资源,可以帮助我们更好地理解和掌握 Lucene 的核心概念。

    lucene4.10.4源码包

    《深入剖析Lucene 4.10.4源码:全文检索技术的基石》...通过深入学习和分析Lucene 4.10.4的源码,开发者不仅可以掌握全文检索的基本原理,还能理解其在实际应用中的优化策略,为构建高效、精准的搜索系统奠定坚实基础。

    Lucene2.4入门总结

    **Lucene 2.4 入门指南** Lucene 是一个高性能、全文本搜索库,由 Apache 软件...随着对 Lucene 更深入的学习,你将能够探索更多的高级特性,如近实时搜索、分布式索引和更复杂的查询语法,以满足更复杂的应用场景。

    基于lucene组件的全文搜索系统

    **基于Lucene组件的全文搜索系统** 全文搜索引擎是现代互联网技术中的重要组成部分,它使得用户可以快速、准确地在海量数据中...通过学习和掌握Lucene,开发者可以构建出强大、灵活的搜索解决方案,满足各种业务需求。

    rh-java-common-lucene-facet-4.8.0-6.9.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    rh-java-common-lucene-facet-4.8.0-6.8.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    rh-java-common-lucene-facet-4.8.0-6.7.el7.noarch.rpm

    官方离线安装包,测试可用。使用rpm -ivh [rpm完整包名] 进行安装

    Lucene.zip

    《Lucene深度解析:从入门到精通》 Lucene,作为Apache软件基金会的顶级项目,是Java语言...通过“Lucene.zip”的学习,读者可以全面掌握Lucene的核心技术和实践技巧,为构建高效、精准的信息检索系统打下坚实基础。

    java进阶Solr从基础到实战

    Solr它是一种开放源码的、基于 Lucene 的搜索服务器,可以高效的完成全文检索的功能。在本套课程中,我们将全面的讲解Solr,从Solr基础到Solr高级,再到项目实战,基本上涵盖了Solr中所有的知识点。 主讲内容 章节一...

    solr基础知识介绍

    Apache Lucene是一个基于Java的全文搜索引擎库,它提供了一个强大的工具集用于索引和搜索文本。Lucene的核心功能包括为文件中的每个单词建立索引,这种索引方式极大地提高了搜索效率,因为它不同于传统逐字比较的...

Global site tag (gtag.js) - Google Analytics