Solr keyword elevation (paid ranking) and exclusion: source code walkthrough and worked examples (Part 2)

suichangkele

Continuing with the source code: the previous post left a few methods unexamined. The first is getElevationMap, which is called when the request specifies neither elevateIds nor excludeIds.

  /** Get the elevation map from the data dir, i.e. read the config file from data. */
  Map<String,ElevationObj> getElevationMap(IndexReader reader, SolrCore core) throws Exception {
    
    synchronized (elevationCache) {
      
      // If the file was found in conf (non-SolrCloud), it was cached under the key null at startup, so it is never re-read here.
      Map<String,ElevationObj> map = elevationCache.get(null);
      if (map != null) return map;
      
      map = elevationCache.get(reader); // Look up by IndexReader: elevator.xml is reloaded only if the reader has changed.
      if (map == null) {
        String f = initArgs.get(CONFIG_FILE);
        if (f == null) {
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
              "QueryElevationComponent must specify argument: " + CONFIG_FILE);
        }
        log.info("Loading QueryElevation from data dir: " + f);
        
        Config cfg;
        
        // The config file can be read from ZooKeeper (SolrCloud) or from local disk (standalone Solr).
        ZkController zkController = core.getCoreDescriptor().getCoreContainer().getZkController();
        if (zkController != null) {
          cfg = new Config(core.getResourceLoader(), f, null, null);
        } else {
          InputStream is = VersionedFile.getLatestFile(core.getDataDir(), f); // read from the data dir
          cfg = new Config(core.getResourceLoader(), f, new InputSource(is), null);
        }
        
        map = loadElevationMap(cfg);
        elevationCache.put(reader, map);
      }
      return map;
    }
  }

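The takeaway from this method: if the set of documents to elevate changes over time and you are running standalone Solr (not SolrCloud), do not put the file in conf, because you would have to restart to reload it. If elevator.xml lives in SolrCloud or under data, it is reloaded whenever the IndexReader changes, that is, after a commit.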
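The lookup order described above can be sketched without any Solr dependency. This is only an illustration under stated assumptions: the class name, `putStatic`, `loadCount`, and the use of a plain `Object` as a stand-in for `IndexReader` are all hypothetical, and the sketch assumes the cache is a `WeakHashMap`, which permits a null key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/** Sketch of a reader-keyed cache; every name here is illustrative, not Solr's. */
class ElevationCacheSketch {
  // WeakHashMap permits a null key, which makes the "static config" slot possible.
  private final Map<Object, Map<String, String>> cache = new WeakHashMap<>();
  private int loads = 0;

  /** Simulates a file found in conf at startup: stored once under the null key. */
  void putStatic(Map<String, String> rules) {
    synchronized (cache) {
      cache.put(null, rules);
    }
  }

  /** Same lookup order as getElevationMap: null key first, then the current reader. */
  Map<String, String> get(Object reader, Supplier<Map<String, String>> loader) {
    synchronized (cache) {
      Map<String, String> map = cache.get(null);
      if (map != null) return map;      // static config wins and is never reloaded
      map = cache.get(reader);
      if (map == null) {                // unseen reader: load the file again
        map = loader.get();
        loads++;
        cache.put(reader, map);
      }
      return map;
    }
  }

  int loadCount() {
    return loads;
  }

  public static void main(String[] args) {
    ElevationCacheSketch c = new ElevationCacheSketch();
    Object readerBeforeCommit = new Object();
    Object readerAfterCommit = new Object();
    c.get(readerBeforeCommit, HashMap::new);
    c.get(readerBeforeCommit, HashMap::new); // same reader: served from the cache
    c.get(readerAfterCommit, HashMap::new);  // new reader: loaded again
    System.out.println(c.loadCount());       // prints 2
  }
}
```

Once `putStatic` has been called, `get` returns the static map for every reader and the loader is never invoked again, which is why a file in conf needs a restart to be picked up.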

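Next, let's look at the ElevationObj code. This class encapsulates one elevation rule: a query text together with the ids of the documents to elevate and to exclude.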

  class ElevationObj {
    
    /** The text passed in, i.e. the search term */
    final String text;
    /** Result of analyzing the text above; may or may not equal text */
    final String analyzed;
    /** TermQuery objects built from the excluded ids */
    final TermQuery[] exclude;
    /** Query wrapping the documents to elevate */
    final BooleanQuery include;
    /** Priority of each id, decreasing in the order the ids are listed */
    final Map<BytesRef,Integer> priority;
    /** The elevated ids */
    final Set<String> ids;
    /** The excluded ids */
    final Set<String> excludeIds;
    
    // First argument: the query text; second: the ids to elevate; third: the ids to exclude
    ElevationObj(String qstr, List<String> elevate, List<String> exclude) throws IOException {
      
      this.text = qstr;
      this.analyzed = getAnalyzedQuery(this.text); // analyze (tokenize) the text
      this.ids = new HashSet<>();
      this.excludeIds = new HashSet<>();
      
      this.include = new BooleanQuery();
      this.include.setBoost(0);
      this.priority = new HashMap<>();
      int max = elevate.size() + 5;
      
      // The documents to elevate are wrapped into a single BooleanQuery.
      for (String id : elevate) {
        id = idSchemaFT.readableToIndexed(id); // effectively a no-op
        ids.add(id);
        TermQuery tq = new TermQuery(new Term(idField, id));
        include.add(tq, BooleanClause.Occur.SHOULD);
        this.priority.put(new BytesRef(id), max--);
      }
      
      if (exclude == null || exclude.isEmpty()) {
        this.exclude = null;
      } else {
        this.exclude = new TermQuery[exclude.size()];
        for (int i = 0; i < exclude.size(); i++) {
          String id = idSchemaFT.readableToIndexed(exclude.get(i));
          excludeIds.add(id);
          this.exclude[i] = new TermQuery(new Term(idField, id)); // collect the docs to exclude into an array
        }
      } 
    }
  }

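To summarize the code above: on top of the query we set in the request, it builds several additional queries, some for the documents to elevate and some for the documents to exclude, all constructed from ids.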
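The priority assignment in the constructor is easy to miss, so here is a minimal sketch of just that step. The class and method names are hypothetical, and plain `String` keys stand in for `BytesRef`; the arithmetic (start at `size + 5`, decrement per id) follows the listing above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the priority assignment only; names are illustrative, not Solr's. */
class PrioritySketch {
  /** Assigns descending priorities in list order, starting at size + 5. */
  static Map<String, Integer> assign(List<String> elevate) {
    Map<String, Integer> priority = new LinkedHashMap<>();
    int max = elevate.size() + 5;
    for (String id : elevate) {
      priority.put(id, max--); // earlier ids receive larger values
    }
    return priority;
  }

  public static void main(String[] args) {
    System.out.println(assign(Arrays.asList("7", "3", "9"))); // prints {7=8, 3=7, 9=6}
  }
}
```

Every elevated id ends up with a value greater than 0, while documents that are not elevated are treated as 0 by the comparator, so the order of the ids in the request determines their order at the top of the results.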

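The last piece is the most important one: sorting the documents that were designated for elevation. The ElevationComparatorSource class produces a comparator; we only look at its newComparator method.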

    /** The returned comparator sorts by the assigned priority. */
    @Override
    public FieldComparator<Integer> newComparator(String fieldname, final int numHits, int sortPos, boolean reversed)
        throws IOException {
      
      return new SimpleFieldComparator<Integer>() {
        
        /** Ends up holding the priority value for each slot */
        private final int[] values = new int[numHits];
        private int bottomVal;
        private int topVal;
        private PostingsEnum postingsEnum;
        // the (elevated) ids collected so far
        private Set<String> seen = new HashSet<>(elevations.ids.size());
        // the actual sort comparison, based on the entries in values
        public int compare(int slot1, int slot2) {
          return values[slot1] - values[slot2]; // values will be small enough that there is no overflow concern
        }
        
        @Override
        public void setBottom(int slot) {
          bottomVal = values[slot];
        }
        
        @Override
        public void setTopValue(Integer value) {
          topVal = value.intValue();
        }
        
        /**
         * Read the doc value: map the Lucene doc id to the elevated id, then map that id to its priority.
         * The doc value that is returned is the priority.
         * @param doc   the Lucene doc id
         * @return      the doc value
         */
        private int docVal(int doc) {
          if (ordSet.size() > 0) {
            int slot = ordSet.find(doc);
            if (slot >= 0) { // non-negative means the doc is in ordSet, i.e. it was designated for elevation
              BytesRef id = termValues[slot]; // the elevated id
              Integer prio = elevations.priority.get(id); // look up the doc value, i.e. the priority, by id
              return prio == null ? 0 : prio.intValue();
            }
          }
          return 0; // not elevated: every such doc gets 0, so they tie here and the score comparator decides
        }
        }
        
        @Override 
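        public int compareBottom(int doc) { // while sorting, a candidate doc is first compared against bottomVal
          return bottomVal - docVal(doc);
        }
        // Fill values: translate the Lucene doc id into the document's own id and store its priority
        @Override
        public void copy(int slot, int doc) {
          values[slot] = docVal(doc); // docVal resolves the doc through its elevated Solr id (much like FieldCache)
        }
        /** Called when switching to the next segment reader: finds the ids that actually exist and adds them to seen, ordSet and termValues. */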
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
          // convert the ids to Lucene doc ids, the ordSet and termValues needs to be the same size as the number of
          // elevation docs we have
          ordSet.clear();
          Fields fields = context.reader().fields();
          if (fields == null) return;
          Terms terms = fields.terms(idField); // like FieldCache, this reads the term dictionary
          if (terms == null) return;
          TermsEnum termsEnum = terms.iterator();
          BytesRefBuilder term = new BytesRefBuilder();
          Bits liveDocs = context.reader().getLiveDocs(); // docs that have not been deleted
          for (String id : elevations.ids) {
            term.copyChars(id);
            if (seen.contains(id) == false && termsEnum.seekExact(term.get())) {
              postingsEnum = termsEnum.postings(liveDocs, postingsEnum, PostingsEnum.NONE);
              int docId = postingsEnum.nextDoc(); // the field is the unique id, so there are no duplicates
              if (docId == DocIdSetIterator.NO_MORE_DOCS) continue; // must have been deleted
              termValues[ordSet.put(docId)] = term.toBytesRef(); // ordSet.put stores the Lucene doc id and returns its slot; the elevated id goes into termValues at that slot, linking the two
              seen.add(id);
              assert postingsEnum.nextDoc() == DocIdSetIterator.NO_MORE_DOCS; // an id matches exactly one doc, so the next call must return NO_MORE_DOCS
            }
          }
        }
      };
    }
  }

This comparator sorts by priority, that is, in the order in which we listed the ids to elevate.
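The effect of that comparator can be sketched in plain Java: sort by priority first (non-elevated documents count as 0), then fall back to score. This is a simplified model, not the Lucene collector mechanics; the `Hit` type, the `rank` method, and the scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of "priority first, then score"; all names are illustrative. */
class ElevationSortSketch {
  static final class Hit {
    final String id;
    final float score;
    Hit(String id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  /** Returns the ids ordered by priority (descending), ties broken by score (descending). */
  static List<String> rank(List<Hit> hits, Map<String, Integer> priority) {
    // Documents without an entry get 0, just as docVal returns 0 for non-elevated docs.
    Comparator<Hit> byPriority =
        Comparator.comparingInt((Hit h) -> priority.getOrDefault(h.id, 0)).reversed();
    Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score).reversed();
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(byPriority.thenComparing(byScore));
    List<String> ids = new ArrayList<>();
    for (Hit h : sorted) ids.add(h.id);
    return ids;
  }

  public static void main(String[] args) {
    List<Hit> hits = Arrays.asList(new Hit("1", 0.5f), new Hit("2", 0.9f));
    System.out.println(rank(hits, Collections.<String, Integer>emptyMap())); // prints [2, 1]
    System.out.println(rank(hits, Collections.singletonMap("1", 6)));        // prints [1, 2]
  }
}
```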

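With the above understood, the elevation feature is ready to use. Whether you run SolrCloud or standalone Solr, relying on the config file is cumbersome if that file needs to change, so I recommend passing the ids to elevate and exclude as request parameters, which is what I did in my own experiment. The experiment used a standalone Solr with two documents, each with two fields, id and title. The title of id 1 contains "hello" twice and the title of id 2 contains it four times. The df of my select requestHandler is title.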

Enter http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true in the browser. As expected, id 2 ranks first because its title contains hello four times.

Now use elevation by adding enableElevation=on (turn elevation on) and elevateIds=1 (elevate id 1): http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&enableElevation=on&elevateIds=1 With this request id=1 ranks first, and id=2 is still returned.

To exclude a document, add excludeIds=2, which blacklists id 2: http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&enableElevation=on&elevateIds=1&excludeIds=2 Now only id=1 is returned.

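Adding a sort: http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&sort=id asc&enableElevation=on&forceElevation=on&elevateIds=2 Although sort is set to ascending id, which on its own would put id=1 first, forceElevation=on forces the elevation order to take precedence and id 2 is elevated, so id=2 still ranks first.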
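When these requests are issued from code rather than typed into a browser, the parameter values need URL encoding, because the sort value contains a space. The following is a small sketch using only the JDK; the class and method names are hypothetical, and the base URL and parameters are the ones from the experiment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of building an elevation request URL; names are illustrative. */
class ElevationUrlSketch {
  private static String enc(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8"); // form encoding: a space becomes '+'
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);   // UTF-8 is always available on the JVM
    }
  }

  /** Appends the parameters to base in iteration order, encoding keys and values. */
  static String build(String base, Map<String, String> params) {
    StringBuilder sb = new StringBuilder(base);
    char sep = '?';
    for (Map.Entry<String, String> e : params.entrySet()) {
      sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
      sep = '&';
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", "hello");
    params.put("sort", "id asc");
    params.put("enableElevation", "on");
    params.put("forceElevation", "on");
    params.put("elevateIds", "2");
    System.out.println(build("http://localhost:8080/solr/collection1/select", params));
  }
}
```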

That's it. Solr's elevation and exclusion should now be clear.

2017-03-10 13:42