Lucene Study Notes, Part 7: Analysis of the Lucene Search Process (4) (Repost)

xangqun


2.4 Searching the Query Object

2.4.1.2 Creating the Weight Object Tree

BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher). The BooleanWeight constructor is implemented as follows:

    public BooleanWeight(Searcher searcher) {
        this.similarity = getSimilarity(searcher);
        weights = new ArrayList<Weight>(clauses.size());
        // This is again a recursive process: it follows the new Query object tree all the way down to the leaf nodes.
        for (int i = 0; i < clauses.size(); i++) {
            weights.add(clauses.get(i).getQuery().createWeight(searcher));
        }
    }

For a TermQuery leaf node, TermQuery.createWeight(Searcher) returns a new TermWeight(searcher) object. The TermWeight constructor is as follows:

    public TermWeight(Searcher searcher) {
        this.similarity = getSimilarity(searcher);
        // idf is computed here
        idfExp = similarity.idfExplain(term, searcher);
        idf = idfExp.getIdf();
    }

The idf computation matches the formula in the Lucene documentation exactly:

$$\text{idf}(t) = 1 + \log\frac{\text{numDocs}}{\text{docFreq} + 1}$$

    public IDFExplanation idfExplain(final Term term, final Searcher searcher) {
        final int df = searcher.docFreq(term);
        final int max = searcher.maxDoc();
        final float idf = idf(df, max);
        return new IDFExplanation() {
            public float getIdf() {
                return idf;
            }
        };
    }

    public float idf(int docFreq, int numDocs) {
        return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
    }
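As a quick check of the formula, the following minimal sketch reproduces the idf arithmetic outside Lucene. The class name IdfSketch and the sample docFreq and numDocs values are assumptions made for illustration only; just the body of idf mirrors the method quoted above.

```java
// Illustrative sketch only: IdfSketch is a hypothetical class, not part of Lucene.
class IdfSketch {

    // Same arithmetic as the idf(int, int) method quoted above.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 6; // assumed index size, chosen only for the example
        // A rarer term (smaller docFreq) receives a larger idf.
        for (int docFreq = 1; docFreq <= 5; docFreq += 2) {
            System.out.println("docFreq=" + docFreq + " idf=" + idf(docFreq, numDocs));
        }
    }
}
```

With numDocs = 6 and docFreq = 1 the ratio numDocs / (docFreq + 1) is 3, so idf is ln(3) + 1, roughly 2.0986. Any index in which that ratio equals 3 yields the same value.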

ConstantScoreQuery.createWeight(Searcher), by contrast, only creates a ConstantScoreQuery.ConstantWeight(searcher) object and does not compute idf.

The resulting Weight object tree is as follows:

              |                          //contents:cat^0.33333325
              |                          this$0    TermQuery (id=151)
              |                          value    0.0
              |                  modCount    2
              |                  size    2
              |------[1]    TermQuery$TermWeight (id=218)
                            idf    2.0986123
                            idfExp    Similarity$1 (id=233)
                            queryNorm    0.0
                            queryWeight    0.0
                            similarity    DefaultSimilarity (id=177)

                            //contents:foods
                            this$0    TermQuery (id=154)
                            value    0.0
                    modCount    2
                    size    2
        modCount    3
        size    3
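The recursive construction described in this section can be sketched with a few toy classes. Everything below (the Mini* class names, passing maxDoc and docFreq in directly instead of asking a Searcher, the term "dog") is a simplifying assumption for illustration and is not Lucene's actual API; the point is only that each createWeight call mirrors one node of the Query tree, and that leaf weights compute idf at construction time.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these Mini* classes are hypothetical stand-ins,
// not Lucene's real Query/Weight classes.
abstract class MiniQuery {
    abstract MiniWeight createWeight(int maxDoc);
}

abstract class MiniWeight {
}

class MiniTermQuery extends MiniQuery {
    final String term;
    final int docFreq; // assumed to be known here; Lucene asks the Searcher instead

    MiniTermQuery(String term, int docFreq) {
        this.term = term;
        this.docFreq = docFreq;
    }

    @Override
    MiniWeight createWeight(int maxDoc) {
        return new MiniTermWeight(this, maxDoc);
    }
}

class MiniTermWeight extends MiniWeight {
    final float idf;

    MiniTermWeight(MiniTermQuery query, int maxDoc) {
        // Leaf node: idf is computed once, when the weight is created.
        this.idf = (float) (Math.log(maxDoc / (double) (query.docFreq + 1)) + 1.0);
    }
}

class MiniBooleanQuery extends MiniQuery {
    final List<MiniQuery> clauses = new ArrayList<MiniQuery>();

    MiniBooleanQuery add(MiniQuery clause) {
        clauses.add(clause);
        return this;
    }

    @Override
    MiniWeight createWeight(int maxDoc) {
        return new MiniBooleanWeight(this, maxDoc);
    }
}

class MiniBooleanWeight extends MiniWeight {
    final List<MiniWeight> weights = new ArrayList<MiniWeight>();

    MiniBooleanWeight(MiniBooleanQuery query, int maxDoc) {
        // Inner node: recurse into every clause, mirroring the Query tree.
        for (MiniQuery clause : query.clauses) {
            weights.add(clause.createWeight(maxDoc));
        }
    }
}

class WeightTreeSketch {
    public static void main(String[] args) {
        MiniBooleanQuery inner = new MiniBooleanQuery()
                .add(new MiniTermQuery("cat", 1))
                .add(new MiniTermQuery("foods", 1));
        MiniBooleanQuery root = new MiniBooleanQuery()
                .add(inner)
                .add(new MiniTermQuery("dog", 2));
        MiniBooleanWeight weight = (MiniBooleanWeight) root.createWeight(6);
        System.out.println("top-level weights: " + weight.weights.size());
    }
}
```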

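2.4.1.3 Computing the Term Weight Score

(1) First, compute sumOfSquaredWeights

According to the formula:

$$\text{sumOfSquaredWeights} = q.\text{getBoost}()^2 \cdot \sum_{t \in q} \bigl(\text{idf}(t) \cdot t.\text{getBoost}()\bigr)^2$$

The code is as follows: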

    float sum = weight.sumOfSquaredWeights();

    // As can be seen, this is also a recursive process.
    public float sumOfSquaredWeights() throws IOException {
        float sum = 0.0f;
        for (int i = 0; i < weights.size(); i++) {
            float s = weights.get(i).sumOfSquaredWeights();
            if (!clauses.get(i).isProhibited())
                sum += s;
        }
        sum *= getBoost() * getBoost(); // multiply by the square of the query boost
        return sum;
    }

For a TermWeight leaf node, TermQuery$TermWeight.sumOfSquaredWeights() is implemented as follows:

    public float sumOfSquaredWeights() {
        // Compute one part of the score, idf*t.getBoost(); it is used again later.
        queryWeight = idf * getBoost();
        // Compute (idf*t.getBoost())^2
        return queryWeight * queryWeight;
    }

For a ConstantWeight leaf node, ConstantScoreQuery$ConstantWeight.sumOfSquaredWeights() is as follows:

    public float sumOfSquaredWeights() {
        // Apart from the user-specified boost, nothing else contributes to the score.
        queryWeight = getBoost();
        return queryWeight * queryWeight;
    }
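The three sumOfSquaredWeights implementations above fit together as in the following sketch. The Sum* classes, the boolean prohibited list, and the numeric idf and boost values are hypothetical simplifications chosen so the arithmetic is easy to follow; they are not Lucene's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical stand-ins for Lucene's Weight classes.
interface SumWeight {
    float sumOfSquaredWeights();
}

class SumTermWeight implements SumWeight {
    final float idf;
    final float boost;
    float queryWeight;

    SumTermWeight(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    public float sumOfSquaredWeights() {
        queryWeight = idf * boost; // cached for the later normalize step
        return queryWeight * queryWeight;
    }
}

class SumConstantWeight implements SumWeight {
    final float boost;
    float queryWeight;

    SumConstantWeight(float boost) {
        this.boost = boost;
    }

    public float sumOfSquaredWeights() {
        queryWeight = boost; // idf plays no part for a constant-score clause
        return queryWeight * queryWeight;
    }
}

class SumBooleanWeight implements SumWeight {
    final float boost;
    final List<SumWeight> weights = new ArrayList<SumWeight>();
    final List<Boolean> prohibited = new ArrayList<Boolean>();

    SumBooleanWeight(float boost) {
        this.boost = boost;
    }

    SumBooleanWeight add(SumWeight weight, boolean isProhibited) {
        weights.add(weight);
        prohibited.add(isProhibited);
        return this;
    }

    public float sumOfSquaredWeights() {
        float sum = 0.0f;
        for (int i = 0; i < weights.size(); i++) {
            // Always recurse so every child caches its queryWeight,
            // but only non-prohibited clauses add to the sum.
            float s = weights.get(i).sumOfSquaredWeights();
            if (!prohibited.get(i)) {
                sum += s;
            }
        }
        return sum * boost * boost;
    }
}

class SumOfSquaredWeightsSketch {
    static float demo(float rootBoost) {
        SumBooleanWeight root = new SumBooleanWeight(rootBoost)
                .add(new SumTermWeight(2.0f, 1.0f), false) // (2 * 1)^2 = 4
                .add(new SumConstantWeight(3.0f), false)   // 3^2 = 9
                .add(new SumTermWeight(5.0f, 1.0f), true); // prohibited, not added
        return root.sumOfSquaredWeights();
    }

    public static void main(String[] args) {
        System.out.println("sumOfSquaredWeights = " + demo(1.0f));
    }
}
```

In this example the prohibited clause is still visited but contributes nothing, so with a root boost of 1 the sum is 4 + 9 = 13; a root boost of 2 multiplies that by 4.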

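(2) Compute queryNorm

Its formula is as follows:

$$\text{queryNorm}(q) = \frac{1}{\sqrt{\text{sumOfSquaredWeights}}}$$

The code is as follows:

    public float queryNorm(float sumOfSquaredWeights) {
        return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
    }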
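A small numeric example may make the effect of queryNorm more concrete. The class QueryNormSketch and the two per-term weights are assumptions for illustration; only the queryNorm body is taken from the code above. When no enclosing query carries a boost other than 1, multiplying each term's queryWeight by queryNorm rescales the weights so that their squares sum to 1.

```java
// Illustrative sketch only: QueryNormSketch is a hypothetical class, not part of Lucene.
class QueryNormSketch {

    // Same arithmetic as the queryNorm(float) method quoted above.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Assumed per-term query weights (idf * boost) of a two-term query.
        float w1 = 3.0f;
        float w2 = 4.0f;

        float norm = queryNorm(w1 * w1 + w2 * w2);
        float n1 = w1 * norm;
        float n2 = w2 * norm;

        System.out.println("queryNorm = " + norm);
        System.out.println("sum of squared normalized weights = " + (n1 * n1 + n2 * n2));
    }
}
```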

(3) Fold queryNorm into the score

The code is:

    weight.normalize(norm);

    // Again, this is a recursive process.
    public void normalize(float norm) {
        norm *= getBoost();
        for (Weight w : weights) {
            w.normalize(norm);
        }
    }

For a TermWeight leaf node, the code of TermQuery$TermWeight.normalize(float) is as follows:

    public void normalize(float queryNorm) {
        this.queryNorm = queryNorm;
        // queryWeight was previously idf*t.getBoost(); it is now queryNorm*idf*t.getBoost().
        queryWeight *= queryNorm;
        // The score computed so far is queryNorm*idf*t.getBoost()*idf = queryNorm*idf^2*t.getBoost().
        value = queryWeight * idf;
    }
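Putting the three steps together, the sketch below runs sumOfSquaredWeights, queryNorm, and normalize for two term weights and leaves queryNorm*idf^2*boost in value. NormTermWeight and NormalizeSketch are hypothetical names, the idf and boost values are made up, and the enclosing BooleanWeight is assumed to have boost 1, so norm is passed down unchanged.

```java
// Illustrative sketch only: hypothetical classes, not Lucene's TermWeight.
class NormTermWeight {
    final float idf;
    final float boost;
    float queryNorm;
    float queryWeight;
    float value;

    NormTermWeight(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    float sumOfSquaredWeights() {
        queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    void normalize(float queryNorm) {
        this.queryNorm = queryNorm;
        queryWeight *= queryNorm;  // queryNorm * idf * boost
        value = queryWeight * idf; // queryNorm * idf^2 * boost
    }
}

class NormalizeSketch {
    static NormTermWeight[] run() {
        NormTermWeight a = new NormTermWeight(2.0f, 1.0f); // assumed idf and boost
        NormTermWeight b = new NormTermWeight(1.5f, 2.0f); // assumed idf and boost

        // Step (1): sum of squared weights over both terms (top-level boost assumed 1).
        float sum = a.sumOfSquaredWeights() + b.sumOfSquaredWeights();

        // Step (2): queryNorm = 1 / sqrt(sumOfSquaredWeights).
        float norm = (float) (1.0 / Math.sqrt(sum));

        // Step (3): push queryNorm down to every leaf.
        a.normalize(norm);
        b.normalize(norm);
        return new NormTermWeight[] { a, b };
    }

    public static void main(String[] args) {
        for (NormTermWeight w : run()) {
            System.out.println("queryNorm=" + w.queryNorm
                    + " queryWeight=" + w.queryWeight
                    + " value=" + w.value);
        }
    }
}
```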

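Lucene's overall scoring formula is shown below. The steps so far have computed the factors queryNorm(q), idf(t)^2, and t.getBoost() (the part highlighted in red in the original figure):

$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \bigl(\text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d)\bigr)$$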

Reposted from: http://forfuture1978.iteye.com/blog/632829

2010-06-08 11:18