Lucene 2.4 source code study 11: query, tf

huangyunbin

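TermScorer uses tf when it computes a document's score. The input to tf is the term's frequency: the number of times the term occurs in that document.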

Here is where it is used, in TermScorer.score():

  public float score() {
    int f = freqs[pointer];                       // frequency of the term in the current doc
    float raw =                                   // compute tf(f)*weight
      f < SCORE_CACHE_SIZE                        // check cache
      ? scoreCache[f]                             // cache hit
      : getSimilarity().tf(f)*weightValue;        // cache miss

    return raw * Similarity.decodeNorm(norms[doc]); // normalize for field
  }

  // DefaultSimilarity.tf: square root of the raw term frequency
  public float tf(float freq) {
    return (float)Math.sqrt(freq);
  }

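Computing tf is simple: the default implementation (DefaultSimilarity) just takes the square root of the number of times the term occurs in the document.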
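To make the arithmetic concrete, here is a minimal sketch of the score() logic above, assuming DefaultSimilarity's square-root tf. The class TfScoreSketch is hypothetical, weightValue is simply passed in, and the norm argument stands for the already-decoded field norm (what Similarity.decodeNorm(norms[doc]) returns in Lucene); the norm byte encoding itself is left out.

```java
// Hypothetical sketch of how a raw frequency becomes a score in score() above.
public class TfScoreSketch {
    static final int SCORE_CACHE_SIZE = 32;

    private final float weightValue;                       // query weight, assumed given
    private final float[] scoreCache = new float[SCORE_CACHE_SIZE];

    TfScoreSketch(float weightValue) {
        this.weightValue = weightValue;
        // Precompute tf(f) * weight for small frequencies (the cache-hit path)
        for (int f = 0; f < SCORE_CACHE_SIZE; f++) {
            scoreCache[f] = tf(f) * weightValue;
        }
    }

    // DefaultSimilarity-style tf: square root of the raw frequency
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // norm is assumed to be the already-decoded field norm
    float score(int f, float norm) {
        float raw = f < SCORE_CACHE_SIZE ? scoreCache[f] : tf(f) * weightValue;
        return raw * norm;
    }

    public static void main(String[] args) {
        TfScoreSketch s = new TfScoreSketch(0.5f);
        System.out.println(s.score(4, 1.0f));    // cache hit: sqrt(4) * 0.5 * 1.0, prints 1.0
        System.out.println(s.score(100, 0.25f)); // cache miss: sqrt(100) * 0.5 * 0.25, prints 1.25
    }
}
```

The cache only saves a square root and a multiplication per document, but score() runs once per matching document, and small frequencies are by far the most common, so the lookup pays off.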

So where does the term's occurrence count come from?

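Here pointer is the current position in TermScorer's buffered arrays, not a document id: docs[pointer] holds the document id and freqs[pointer] holds that document's frequency for the term. In Lucene 2.4 the buffers hold 32 entries.
Once the term has been looked up, freqStream is positioned at the start of the term's postings in the frequency (.frq) file. Whenever the scorer runs out of buffered entries, it calls the read method below, which decodes the next batch of postings from freqStream into the two parallel arrays int[] docs and int[] freqs. Both arrays share the same index, so the document id for freqs[i] is docs[i].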

  // SegmentTermDocs.read: decode up to docs.length postings in one call
  public int read(final int[] docs, final int[] freqs)
          throws IOException {
    final int length = docs.length;
    if (currentFieldOmitTf) {             // field indexed without term frequencies
      return readNoTf(docs, freqs, length);
    } else {
      int i = 0;
      while (i < length && count < df) {  // df = number of docs containing the term
        // manually inlined call to next() for speed
        final int docCode = freqStream.readVInt();
        doc += docCode >>> 1;             // shift off low bit; doc ids are delta-encoded
        if ((docCode & 1) != 0)           // if low bit is set
          freq = 1;                       // freq is one
        else
          freq = freqStream.readVInt();   // else read freq
        count++;

        if (deletedDocs == null || !deletedDocs.get(doc)) {  // skip deleted docs
          docs[i] = doc;
          freqs[i] = freq;
          ++i;
        }
      }
      return i;  // entries filled; 0 means the postings are exhausted
    }
  }

That is how the number of times the term occurs in a given document is obtained.
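The bit tricks in read() are easier to follow with the encoding side next to them. The following is a self-contained sketch, assuming the pre-4.0 .frq layout: each posting is written as a VInt holding the document-id delta shifted left by one, with the low bit set when the frequency is 1; otherwise the frequency follows as a separate VInt. FrqDecodeSketch, VIntReader, Postings and the sample data are all hypothetical; the omitTf branch and skip lists are left out. The main method also walks the filled buffer with a pointer, the way TermScorer does.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

// Hypothetical model of one term's postings in a pre-4.0 .frq file.
public class FrqDecodeSketch {

    // VInt: 7 bits per byte, low-order group first, high bit = "more bytes follow"
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Minimal stand-in for freqStream
    static final class VIntReader {
        private final byte[] bytes;
        private int pos;
        VIntReader(byte[] bytes) { this.bytes = bytes; }
        int readVInt() {
            byte b = bytes[pos++];
            int i = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                i |= (b & 0x7F) << shift;
            }
            return i;
        }
    }

    // DocDelta << 1 | 1 when freq == 1, otherwise DocDelta << 1 followed by freq
    static byte[] encode(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int k = 0; k < docs.length; k++) {
            int delta = docs[k] - last;
            last = docs[k];
            if (freqs[k] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[k]);
            }
        }
        return out.toByteArray();
    }

    // Same loop as read(); state lives in fields so the next call resumes
    static final class Postings {
        private final VIntReader freqStream;
        private final int df;
        private final BitSet deletedDocs;
        private int doc = 0, count = 0;
        Postings(byte[] data, int df, BitSet deletedDocs) {
            this.freqStream = new VIntReader(data);
            this.df = df;
            this.deletedDocs = deletedDocs;
        }
        int read(int[] docs, int[] freqs) {
            int i = 0;
            while (i < docs.length && count < df) {
                int docCode = freqStream.readVInt();
                doc += docCode >>> 1;
                int freq = ((docCode & 1) != 0) ? 1 : freqStream.readVInt();
                count++;
                if (deletedDocs == null || !deletedDocs.get(doc)) {
                    docs[i] = doc;
                    freqs[i] = freq;
                    i++;
                }
            }
            return i;
        }
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 200};
        int[] tfs = {1, 4, 2};
        BitSet deleted = new BitSet();
        deleted.set(7);                                  // doc 7 is deleted
        Postings p = new Postings(encode(docIds, tfs), docIds.length, deleted);
        int[] docs = new int[2], freqs = new int[2];     // tiny buffer to force refills
        int n;
        while ((n = p.read(docs, freqs)) > 0) {
            for (int pointer = 0; pointer < n; pointer++) {
                System.out.println("doc=" + docs[pointer] + " freq=" + freqs[pointer]);
            }
        }
        // prints "doc=3 freq=1" then "doc=200 freq=2"
    }
}
```

Packing the "freq is 1" case into the low bit saves a whole VInt for the most common frequency, and delta-encoding the ids keeps most VInts to a single byte.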

2013-05-13 09:15