lucene的排序和缓存的应用

chengqianl

浏览: 53753 次
性别:
来自: 杭州

最近访客更多访客>>

ForLove_ForYOU

阿祥哥

dj78337323

donchiang709

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

lucene

lucene 算法 Apache

Lucene的排序是通过FieldComparator及其子类实现的，以StringOrdValComparator作为例子详细说明lucene的排序的基于缓存FieldCache实现。

思路：用一个数组保存某个filed字段对应的所有的document的最大的一个term。这个数组的index就是docId，值对应所有这个filed所有term的数组的index

StringOrdValComparator 类里面的
    private String[] lookup; 值为某个filed的所有的term的值

private int[] order; index为docId，值为lookup的index，表示某个document的某个field的最大的term的在lookup中的index。

排序的时候根据docId查询出order[docId]值的大小比较。

初始化
IndexSearcher# public void search(Weight weight, Filter filter, Collector collector) 方法中调用的TopFieldCollector#OneComparatorNonScoringCollector.setNextReader
调用FieldComparator#StringOrdValComparator. setNextReadercollector.setNextReader(subReaders[i], docStarts[i]);
方法中调用
FieldCache.DEFAULT.getStringIndex(reader, field)
初始化filed对应的所有的term信息和frg信息。String就从””开始读取直到读完，

算法是在FieldCacheImpl#StringIndexCache.createValue实现的
代码如下结果就是某个document的对应的某个term的最大值
@Override
    protected Object createValue(IndexReader reader, Entry entryKey)
        throws IOException {
      String field = StringHelper.intern(entryKey.field);
      final int[] retArray = new int[reader.maxDoc()];
      String[] mterms = new String[reader.maxDoc()+1];
      TermDocs termDocs = reader.termDocs();
      TermEnum termEnum = reader.terms (new Term (field));
      int t = 0; // current term number

      // an entry for documents that have no terms in this field
      // should a document with no terms be at top or bottom?
      // this puts them at the top - if it is changed, FieldDocSortedHitQueue
      // needs to change as well.
      mterms[t++] = null;

      try {
        do {
          Term term = termEnum.term();
          if (term==null || term.field() != field) break;

          // store term text
          // we expect that there is at most one term per document
          if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
                  "documents in field \"" + field + "\", but it's impossible to sort on " +
                  "tokenized fields");
          mterms[t] = term.text();

          termDocs.seek (termEnum);
          while (termDocs.next()) {
            retArray[termDocs.doc()] = t;
          }

          t++;
        } while (termEnum.next());
      } finally {
        termDocs.close();
        termEnum.close();
      }

      if (t == 0) {
        // if there are no terms, make the term array
        // have a single null entry
        mterms = new String[1];
      } else if (t < mterms.length) {
        // if there are less terms than documents,
        // trim off the dead array space
        String[] terms = new String[t];
        System.arraycopy (mterms, 0, terms, 0, t);
        mterms = terms;
      }

      StringIndex value = new StringIndex (retArray, mterms);
      return value;
    }
};

排序的实现
Lucene的查询结果是放在优先队列里面的，优先对象是通过org.apache.lucene.search.FieldComparator.StringOrdValComparator进行比较的。比较的StringOrdValComparator. Compare方法

@Override
    public int compare(int slot1, int slot2) {
      if (readerGen[slot1] == readerGen[slot2]) {
        int cmp = ords[slot1] - ords[slot2];
        if (cmp != 0) {
          return cmp;
        }
      }

      final String val1 = values[slot1];
      final String val2 = values[slot2];
      if (val1 == null) {
        if (val2 == null) {
          return 0;
        }
        return -1;
      } else if (val2 == null) {
        return 1;
      }
      return val1.compareTo(val2);
}

如果优先队列满了，则先和最低层的比较，如果大于最底层的则先替代最底层的，然后对优先队列重新排序

TopFieldCollector# OneComparatorNonScoringCollector. Collect的红色标识
@Override
    public void collect(int doc) throws IOException {
    ++totalHits;
     if (queueFull) {
        if ((reverseMul * comparator.compareBottom(doc)) <= 0) {
          // since docs are visited in doc Id order, if compare is 0, it means
          // this document is largest than anything else in the queue, and
          // therefore not competitive.
          return;
        }

        // This hit is competitive - replace bottom element in queue & adjustTop
        comparator.copy(bottom.slot, doc);
        updateBottom(doc);
        comparator.setBottom(bottom.slot);      } else {
        // Startup transient: queue hasn't gathered numHits yet
        final int slot = totalHits - 1;
        // Copy hit into queue
        comparator.copy(slot, doc);
        add(slot, doc, Float.NaN);
        if (queueFull) {
          comparator.setBottom(bottom.slot);
        }
      }
    }

分享到：

多个term查询的步骤 | DefaultSkipListReader查找docId

2010-09-13 15:00
浏览 2579
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论