lucene2.4源码学习8 得分计算方法 Weight的变量部分

huangyunbin

浏览: 2630169 次
性别:
来自: 广州

最近访客更多访客>>

cht的大摩托

xiaoxiaoHer

zzqfsy

为了ta

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

lucene 2.4源码学习

public interface Weight extends java.io.Serializable {
  /** The query that this concerns. */
  Query getQuery();

  /** The weight for this query. */
  float getValue();

  /** The sum of squared weights of contained query clauses. */
  float sumOfSquaredWeights() throws IOException;

  /** Assigns the query normalization factor to this. */
  void normalize(float norm);

  /** Constructs a scorer for this. */
  Scorer scorer(IndexReader reader) throws IOException;

  /** An explanation of the score computation for the named document. */
  Explanation explain(IndexReader reader, int doc) throws IOException;
}

这是weight接口的所有方法。
我们主要使用的是scorer方法，计算最后的得分。

  public Scorer scorer(IndexReader reader) throws IOException {
      BooleanScorer2 result = new BooleanScorer2(similarity,
                                                 minNrShouldMatch,
                                                 allowDocsOutOfOrder);

      for (int i = 0 ; i < weights.size(); i++) {
        BooleanClause c = (BooleanClause)clauses.get(i);
        Weight w = (Weight)weights.get(i);
        Scorer subScorer = w.scorer(reader);
        if (subScorer != null)
          result.add(subScorer, c.isRequired(), c.isProhibited());
        else if (c.isRequired())
          return null;
      }

      return result;
    }

这里需要weights，看看weights是怎么来的：

 protected ArrayList weights = new ArrayList();

    public BooleanWeight(Searcher searcher)
      throws IOException {
      this.similarity = getSimilarity(searcher);
      for (int i = 0 ; i < clauses.size(); i++) {
        BooleanClause c = (BooleanClause)clauses.get(i);
        weights.add(c.getQuery().createWeight(searcher));
      }
    }

其实这个就是依赖query的createWeight方法。

看TermQuery的
public TermWeight(Searcher searcher)
      throws IOException {
      this.similarity = getSimilarity(searcher);
      idf = similarity.idf(term, searcher); // compute idf
    }

可以看到idf 是在这一步算出来的。

  TermScorer(Weight weight, TermDocs td, Similarity similarity,
             byte[] norms) {
    super(similarity);
    this.weight = weight;
    this.termDocs = td;
    this.norms = norms;
    this.weightValue = weight.getValue();

    for (int i = 0; i < SCORE_CACHE_SIZE; i++)
      scoreCache[i] = getSimilarity().tf(i) * weightValue;
  }

得分的计算我们需要weight的value。weight.getValue()
value值是怎么来的呢:

   public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      value = queryWeight * idf;                  // idf for document
    }

可以看到value是由queryWeight和queryNorm确定的。

  public float sumOfSquaredWeights() {
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }

queryWeight 就是idf和boosst的乘积。

float norm = getSimilarity(searcher).queryNorm(sum);
    weight.normalize(norm);

queryNorm的值由queryNorm方法确定。

 public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  public float sumOfSquaredWeights() {
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }

说白了，queryNorm是有queryWeight 确定，但是这里多了query的boost，queryWeight 只是某一个term的boost。
BooleanQuery：

 public float sumOfSquaredWeights() throws IOException {
      float sum = 0.0f;
      for (int i = 0 ; i < weights.size(); i++) {
        BooleanClause c = (BooleanClause)clauses.get(i);
        Weight w = (Weight)weights.get(i);
        // call sumOfSquaredWeights for all clauses in case of side effects
        float s = w.sumOfSquaredWeights();         // sum sub weights
        if (!c.isProhibited())
          // only add to sum for non-prohibited clauses
          sum += s;
      }

      sum *= getBoost() * getBoost();             // boost each sub-weight

      return sum ;
    }

注意 BooleanQuery的boost是整个query的boost。

到这里，计算得分计算的变量都确定了，其实只是三个变量，idf，term的boost，query的boost。所以其实我们一般扩展也就是改这三个变量的值。

查看图片附件

分享到：

lucene2.4源码学习9 搜索 norm | lucene2.4源码学习7 构建查询树 rewrite

2013-05-12 11:42
浏览 1460
评论(0)
论坛回复 / 浏览 (0 / 1975)
分类:开源软件
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论