Reading the source: when Solr uses the filterCache

suichangkele

Blog category:

solr

solr filterCache

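It is well known that Solr has four caches: queryResultCache, documentCache, filterCache, and fieldValueCache. Today I want to look closely at the filterCache. It is commonly described as the cache for the doc IDs of an fq: once all the doc IDs matching the query behind an fq have been found, the result is cached for later reuse, which saves I/O. To reach a more precise conclusion, I reread the code carefully, set up a Solr node on the 4.10.4 release my company uses, and ran a number of experiments. I think I now have a good grasp of when the filterCache is used.

When is the filterCache used? Start with getDocListC in SolrIndexSearcher and the case where a cache is hit (here "cache" means the queryResultCache). The code is: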

if (queryResultCache != null && cmd.getFilter() == null && (flags & (NO_CHECK_QCACHE | NO_SET_QCACHE)) != ((NO_CHECK_QCACHE | NO_SET_QCACHE))) { // the queryResultCache may be used
	key = new QueryResultKey(q, cmd.getFilterList(), cmd.getSort(), flags); // build the key for the queryResultCache lookup
	if ((flags & NO_CHECK_QCACHE) == 0) { // check again that the cache may be read
		superset = queryResultCache.get(key); // look up the queryResultCache
		if (superset != null) { // cache hit
			if ((flags & GET_SCORES) == 0 || superset.hasScores()) { // this request needs no scores, or the cached entry has them
				out.docList = superset.subset(cmd.getOffset(), cmd.getLen()); // take the slice selected by start and rows; there may be no result if the cached list does not cover that range
			}
		}
		if (out.docList != null) { // we have a result
			if (out.docSet == null && ((flags & GET_DOCSET) != 0)) { // the key is the relation between flags and GET_DOCSET: as far as I can tell from the code, (flags & GET_DOCSET) != 0 holds when faceting is used, i.e. faceting needs the docSet returned
				if (cmd.getFilterList() == null) { // this request has no fq
					out.docSet = getDocSet(cmd.getQuery()); // gets the docSet of the query parsed from q: look in the filterCache first, on a miss query Lucene and put the result into the filterCache. So the filterCache also holds the docSet of q.
				} else { // this request has fq
					List<Query> newList = new ArrayList<>(cmd.getFilterList().size() + 1);
					newList.add(cmd.getQuery());
					newList.addAll(cmd.getFilterList());
					out.docSet = getDocSet(newList); // also reads docSets from the filterCache (via getPositiveDocSet) and then intersects or subtracts them; afterwards the docSets of q and of every fq query are in the filterCache
				}
			}
			return;
		}
	}
	// (the rest of this block is omitted)
}

(A note on how I found that faceting sets the GET_DOCSET bit, i.e. makes (flags & GET_DOCSET) != 0 true. The bit is set in org.apache.solr.search.SolrIndexSearcher.QueryCommand.setNeedDocSet(boolean). That method is called from org.apache.solr.handler.component.ResponseBuilder.getQueryCommand(), with the value of org.apache.solr.handler.component.ResponseBuilder.isNeedDocSet() as its argument. The matching setter, org.apache.solr.handler.component.ResponseBuilder.setNeedDocSet(boolean), is called from org.apache.solr.handler.component.FacetComponent.prepare(ResponseBuilder) with true. So whenever faceting is enabled, (flags & GET_DOCSET) != 0 holds.)
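The flag handling described in this note is ordinary bit masking. Below is a minimal sketch of the idea, not Solr's code: the class FlagSketch, both helper methods, and the constant values are assumptions made for illustration; only the names GET_SCORES and GET_DOCSET come from the excerpt.

```java
class FlagSketch {
    // Hypothetical bit values chosen for illustration; only the names mirror the article.
    static final int GET_SCORES = 0x01;
    static final int GET_DOCSET = 0x40000000;

    // Models the idea behind setNeedDocSet(boolean): set or clear a single bit.
    static int setNeedDocSet(int flags, boolean need) {
        return need ? (flags | GET_DOCSET) : (flags & ~GET_DOCSET);
    }

    // The test used in getDocListC: is the GET_DOCSET bit set?
    static boolean needsDocSet(int flags) {
        return (flags & GET_DOCSET) != 0;
    }

    public static void main(String[] args) {
        int flags = GET_SCORES;                    // a request that only asks for scores
        System.out.println(needsDocSet(flags));    // false
        flags = setNeedDocSet(flags, true);        // what enabling faceting leads to
        System.out.println(needsDocSet(flags));    // true
        System.out.println((flags & GET_SCORES) != 0); // true: other bits are untouched
    }
}
```

Setting one bit leaves the others alone, which is why a single int can carry the score, docSet, and cache-control switches together.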

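The code above shows that when the queryResultCache is hit and faceting is enabled, getDocSet is called, with either a single Query or a List<Query> as its argument, to obtain all matching doc IDs for faceting. The single-argument getDocSet looks up the docSet in the filterCache; on a miss it calls getDocSetNC (NC stands for "no cache") to compute it from the Lucene index and then puts it into the filterCache. At that point the docSet of the q query is in the filterCache. The List<Query> variant also consults the filterCache, but looks up each query separately. This is implemented in getProcessedFilter, which calls getPositiveDocSet to fetch each docSet from the filterCache and then combines them by intersection or difference; the docSet of every query in its second argument ends up in the filterCache. So the docSets of all the fq queries and of q are now cached.

In short, when the cache is hit (again, this means the queryResultCache) and faceting is enabled, docSets are looked up in the filterCache, and the docSets of q and of every fq are put into it. This also suggests that the name filterCache is not quite apt, since the docSet of q goes in as well.

What if the queryResultCache is missed? The relevant code is another part of getDocListC in SolrIndexSearcher: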
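The lookup-then-fill behaviour described here can be modelled in a few lines. This is a toy sketch under stated assumptions, not Solr's implementation: ToyFilterCache, its fields, and the string query keys are hypothetical, a java.util.BitSet stands in for a DocSet, the Function passed to the constructor stands in for getDocSetNC, and only intersection is modelled (the difference case for negative queries is left out).

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ToyFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> index; // stand-in for the uncached index lookup
    int misses = 0;                               // how often the index had to be consulted

    ToyFilterCache(Function<String, BitSet> index) {
        this.index = index;
    }

    // Single query: check the cache first, on a miss compute the set and store it.
    BitSet getDocSet(String query) {
        BitSet cached = cache.get(query);
        if (cached == null) {
            misses++;
            cached = index.apply(query);
            cache.put(query, cached);
        }
        return (BitSet) cached.clone(); // copy so callers cannot modify the cached set
    }

    // Several queries: look each one up separately, then intersect the results.
    BitSet getDocSet(List<String> queries) {
        BitSet result = null;
        for (String q : queries) {
            BitSet ds = getDocSet(q);
            if (result == null) {
                result = ds;
            } else {
                result.and(ds);
            }
        }
        return result;
    }

    boolean isCached(String query) {
        return cache.containsKey(query);
    }

    static BitSet bits(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> postings = new HashMap<>();
        postings.put("title:solr", bits(0, 1, 2, 3)); // plays the role of q
        postings.put("type:book", bits(1, 3, 5));     // plays the role of an fq
        postings.put("lang:en", bits(3, 4, 5));       // plays the role of another fq
        ToyFilterCache fc = new ToyFilterCache(q -> postings.getOrDefault(q, new BitSet()));

        BitSet ds = fc.getDocSet(Arrays.asList("title:solr", "type:book", "lang:en"));
        System.out.println(ds);                        // {3}
        System.out.println(fc.misses);                 // 3
        fc.getDocSet("title:solr");                    // served from the cache this time
        System.out.println(fc.misses);                 // 3
        System.out.println(fc.isCached("title:solr")); // true: the q entry is cached too
    }
}
```

The point of the model is the last line: after one list lookup, the entry for q sits in the cache next to the fq entries, which matches the observation about the cache's name.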

if (useFilterCache) { // ignore this branch for now; it is covered separately below
	// now actually use the filter cache.
	// for large filters that match few documents, this may be
	// slower than simply re-executing the query.
	if (out.docSet == null) {
		out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
		DocSet bigFilt = getDocSet(cmd.getFilterList());
		if (bigFilt != null)
			out.docSet = out.docSet.intersection(bigFilt);
	}
	// todo: there could be a sortDocSet that could take a list of
	// the filters instead of anding them first...
	// perhaps there should be a multi-docset-iterator
	sortDocSet(qr, cmd);
} else {
	// do it the normal way... i.e. query Lucene
	if ((flags & GET_DOCSET) != 0) { // check GET_DOCSET again; as shown above it is set when faceting and clear otherwise
		// this currently conflates returning the docset for the base query vs the base query and all filters.
		DocSet qDocSet = getDocListAndSetNC(qr, cmd); // this also calls getProcessedFilter with all the fq queries as its second argument, so every fq docSet goes into the filterCache
		if (qDocSet != null && filterCache != null && !qr.isPartialResults()) // with no filter, the docSet of the query itself is also put into the filterCache, because the docSet obtained in that case matches the query exactly
			filterCache.put(cmd.getQuery(), qDocSet);
	} else {
		getDocListNC(qr, cmd); // without faceting, fq still goes through getProcessedFilter: look in the filterCache, on a miss query Lucene and put the result into the filterCache
	}
}

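Both methods above, getDocListAndSetNC and getDocListNC, call getProcessedFilter with the queries behind the fq parameters, and the result is the intersection of all the fq docSets. So even with faceting off, combining the fq docSets goes through the filterCache. In other words, when the queryResultCache is missed, the filterCache is used to intersect the fq docSets whether or not faceting is enabled. With faceting on, however, the docSet for the request is still obtained by querying Lucene first (since the cache was missed).

From the code above, covering both the hit and the miss case, we can conclude that the filterCache serves two purposes. The first is combining docSets, that is, computing the intersection of multiple fq queries. The second is supplying docSets for faceting. More abstractly, the filterCache stores the docSet of a query, and that query need not be an fq: the docSet of q is stored too.

The filterCache has one more use, namely the if (useFilterCache) branch in the code above. The logic is simple; here is the code: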

boolean useFilterCache = false;
if ((flags & (GET_SCORES | NO_CHECK_FILTERCACHE)) == 0 && useFilterForSortedQuery && cmd.getSort() != null && filterCache != null) { // the request needs no scores, NO_CHECK_FILTERCACHE is not set, useFilterForSortedQuery=true is configured in solrconfig, the request has a sort, and filterCache is not null
	useFilterCache = true;
	SortField[] sfields = cmd.getSort().getSort();
	for (SortField sf : sfields) {
		if (sf.getType() == SortField.Type.SCORE) { // if any sort field sorts by score, the filterCache cannot be used
			useFilterCache = false;
			break;
		}
	}
}
if (useFilterCache) { // the code below answers the request from the filterCache
	if (out.docSet == null) {
		out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter()); // docSet of the query intersected with cmd.getFilter(); this overload is also cache-aware and only queries Lucene on a miss (note that cmd.getFilter() is a DocSet, not the fq; the fq list is cmd.getFilterList())
		DocSet bigFilt = getDocSet(cmd.getFilterList()); // look in the filterCache; on a miss query Lucene and put the result in
		if (bigFilt != null)
			out.docSet = out.docSet.intersection(bigFilt); // intersect the two
	}
	sortDocSet(qr, cmd); // sort the result set
} else { /* same as above, omitted */ }

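Why single out scores above? The reason is simple: sorting by score may need term frequencies, positions, or payloads, and the filterCache holds none of these. It contains only doc IDs, so it cannot be used when sorting by score. When sorting by a field or a function instead, the sort values can be looked up by doc ID in the FieldCache, so the doc IDs provided by the filterCache are sufficient.

To sum up, besides the uses described earlier, the filterCache can also serve sorted requests that do not need scores, although this path is probably rarely taken.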
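To make the argument concrete, here is a minimal sketch of sorting a set of doc IDs by a field value without any scoring information. It is a toy model, not Solr's sortDocSet: the class SortDocSetSketch, the method sortByField, and the long[] of per-document values (standing in for what the FieldCache would supply) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

class SortDocSetSketch {
    // Returns the doc IDs in docSet ordered by ascending field value.
    // fieldValues[docId] plays the role of a per-document value looked up by doc ID.
    static int[] sortByField(BitSet docSet, long[] fieldValues) {
        Integer[] ids = docSet.stream().boxed().toArray(Integer[]::new);
        Arrays.sort(ids, (a, b) -> {
            int c = Long.compare(fieldValues[a], fieldValues[b]);
            return c != 0 ? c : Integer.compare(a, b); // break ties by doc ID
        });
        int[] out = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = ids[i];
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet docSet = new BitSet();           // doc IDs only, as a cached docSet would hold
        docSet.set(1);
        docSet.set(3);
        docSet.set(5);
        long[] price = {0, 30, 0, 10, 0, 20};   // hypothetical field values indexed by doc ID
        System.out.println(Arrays.toString(sortByField(docSet, price))); // [3, 5, 1]
    }
}
```

Nothing in the sort touches term frequencies, positions, or payloads: the doc IDs plus a per-document value are enough, which is exactly what is missing when the sort key is the score.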

2018-01-21 17:01