Updating an index in Lucene 2.4 (Update Index)

snowdymy


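Lucene has no way to modify an indexed document in place. According to the documentation, the only option is to delete the old document and then add the new one.

The concrete steps: use a unique key, such as the URL, to find the existing document, then delete it.

Below is my method for deleting by page URL. It relies on an int docId stored in the Item.
This number is the sequence number Lucene assigns internally to each document, similar to a rowid.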

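public synchronized void deleteByUrl(String url) {
    synchronized (indexPath) {
      IndexReader indexReader = null;
      try {
        indexReader = IndexReader.open(indexPath);
        // searchUrl returns null when the search fails, so guard against it
        List<Item> items = searchUrl(url);
        if (items != null) {
          for (Item item : items) {
            indexReader.deleteDocument(((LuceneItem) item).getDocId());
          }
        }
      } catch (IOException e) {
        System.out.println(e);
      } finally {
        // close in finally: closing commits the deletions and releases the index lock
        if (indexReader != null) {
          try {
            indexReader.close();
          } catch (IOException e) {
            System.out.println(e);
          }
        }
      }
    }
}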
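
To illustrate why a docId behaves like a row position rather than a stable key, here is a minimal sketch. It is a toy model in plain Java, not Lucene code: the class `ToySegment` and its methods are hypothetical names made up for this example. It assumes only what is generally true of Lucene doc ids: a deleted document is first just marked, and ids are renumbered once the deleted slots are dropped (in Lucene this happens when segments are merged).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model, not Lucene code: documents live in an append-only list and a
// document's id is simply its position in that list.
final class ToySegment {
    private final List<String> urls = new ArrayList<String>();
    private final BitSet deleted = new BitSet();

    // Appends a document and returns its position-based id.
    int add(String url) {
        urls.add(url);
        return urls.size() - 1;
    }

    // Marks a document as deleted; the slot stays in place until compaction.
    void delete(int docId) {
        deleted.set(docId);
    }

    // Returns the id of the first live document with this URL, or -1.
    int idOf(String url) {
        for (int i = 0; i < urls.size(); i++) {
            if (!deleted.get(i) && urls.get(i).equals(url)) {
                return i;
            }
        }
        return -1;
    }

    // Drops deleted slots, which renumbers every document after them.
    void compact() {
        List<String> live = new ArrayList<String>();
        for (int i = 0; i < urls.size(); i++) {
            if (!deleted.get(i)) {
                live.add(urls.get(i));
            }
        }
        urls.clear();
        urls.addAll(live);
        deleted.clear();
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.add("http://example.com/a");
        int idB = seg.add("http://example.com/b");
        seg.delete(seg.idOf("http://example.com/a"));
        seg.compact();
        // The same document now has a different id.
        System.out.println(idB + " -> " + seg.idOf("http://example.com/b")); // prints "1 -> 0"
    }
}
```

The practical consequence is that a docId should be used right after the search that produced it, as deleteByUrl does, and not stored for later.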

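/**
   * Searches for a keyword in Lucene 2.4 (a workaround for the deprecated Hits class).
   *
   * @param url the page URL to search for
   * @return the matching items (at most 10), or null if the search fails
   */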
public List<Item> searchUrl(String url) {
    try {
      // Replace special characters such as the colon
      url = StrTools.encodeURLForLucene(url);

      IndexSearcher isearcher = new IndexSearcher(indexPath);
      QueryParser parser = new QueryParser(FIELD_URL, getAnalyzer());

      Query query = parser.parse(url);
      // The following call is deprecated:
      // Hits hits = isearcher.search(query);
      // Use a collector instead (this one keeps only the top 10 hits)
      TopDocCollector collector = new TopDocCollector(10);
      isearcher.search(query, collector);
      ScoreDoc[] hits = collector.topDocs().scoreDocs;

      List<Item> rtn = new LinkedList<Item>();
      LuceneItem o;
      for (int i = 0; i < hits.length; i++) {
        Document doc = isearcher.doc(hits[i].doc);
        o = new LuceneItem();
        o.setDocId(hits[i].doc);
        o.setUrl(doc.get(FIELD_URL));

        o.setAuthor(doc.get(FIELD_AUTHOR));
        o.setTitle(doc.get(FIELD_TITLE));
        o.setDatetimeCreate(doc.get(FIELD_DATETIMECREATE));
        o.setBody(doc.get(FIELD_BODY));
        rtn.add(o);
      }
      isearcher.close();
      return rtn;
    } catch (Exception e) {
      e.printStackTrace();
      return null;
    }
}
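
The helper StrTools.encodeURLForLucene is not shown in this post, so its exact behavior is unknown. One plausible way to handle the special characters is to backslash-escape everything the query syntax treats as an operator. The sketch below is a hypothetical stand-in (the class name `QueryEscaper` and the character set are assumptions for illustration, not the author's code). Lucene's QueryParser also has a static escape(String) method intended for the same purpose.

```java
// Hypothetical stand-in for the post's StrTools.encodeURLForLucene helper.
final class QueryEscaper {
    // Characters assumed to have special meaning in the query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "http\://example.com/a\?b=1"
        System.out.println(escape("http://example.com/a?b=1"));
    }
}
```

Escaping only keeps the parser from misreading the URL. Because the URL field is indexed as ANALYZED, the analyzer still splits it into tokens, so the parsed query is not necessarily an exact match on the whole URL.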

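Then, wherever documents are added to the index, call deleteByUrl first to remove any data that may already exist, and only then add the new data.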

public synchronized void IndexSingle(Item item) {
    synchronized (indexPath) {
      try {
        // Delete the old data first
        deleteByUrl(item.getUrl());

        // Add the new data
        IndexWriter writer = getIndexWriter();
        writer.setMaxFieldLength(10000000);
        Date start = new Date();
        Document doc = new Document();// a Document is roughly one row in a table
        doc.add(new Field(FIELD_URL, item.getUrl(), Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field(FIELD_AUTHOR, item.getAuthor(), Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field(FIELD_TITLE, item.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field(FIELD_DATETIMECREATE, item.getDatetimeCreate(), Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field(FIELD_BODY, item.getBody(), Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        // writer.optimize();// optimize
        writer.close();// must be closed, otherwise the in-memory data is not written to disk
        Date end = new Date();
        System.out.println("Index built successfully! Took " + (end.getTime() - start.getTime()) + " ms");
      } catch (IOException e) {
        System.out.println(e);
      }
    }
}

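Comment:

Doing it this way means going through the whole index on every delete, so you can imagine how slow it gets with a large amount of data.
To delete, use deleteDocuments(Term term)
rather than deleteDocument(int docId).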
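
The commenter's suggestion can be sketched with a toy model. This is plain Java, not Lucene code, and the class `ToyUrlIndex` is a hypothetical name invented for the example: it shows how deleting by an exact key removes the need to search first and to handle doc ids at all. In Lucene 2.4 the corresponding calls would be IndexWriter.deleteDocuments(Term) and IndexWriter.updateDocument(Term, Document). For an exact Term match, the URL field would presumably have to be indexed with Field.Index.NOT_ANALYZED instead of ANALYZED as in the post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: delete and update keyed by an exact URL.
final class ToyUrlIndex {
    // Exact key (URL) -> bodies of all live documents stored under it.
    private final Map<String, List<String>> byUrl = new HashMap<String, List<String>>();

    // Plain add: never replaces, so re-adding the same URL creates a duplicate.
    void addDocument(String url, String body) {
        List<String> bodies = byUrl.get(url);
        if (bodies == null) {
            bodies = new ArrayList<String>();
            byUrl.put(url, bodies);
        }
        bodies.add(body);
    }

    // Key-based delete: one lookup, no search and no doc ids.
    // Returns how many documents were removed.
    int deleteDocuments(String url) {
        List<String> removed = byUrl.remove(url);
        return removed == null ? 0 : removed.size();
    }

    // Update = delete everything under the key, then add the new version.
    void updateDocument(String url, String body) {
        deleteDocuments(url);
        addDocument(url, body);
    }

    // Returns a copy of the bodies stored under the URL (empty if none).
    List<String> get(String url) {
        List<String> bodies = byUrl.get(url);
        return bodies == null ? new ArrayList<String>() : new ArrayList<String>(bodies);
    }

    public static void main(String[] args) {
        ToyUrlIndex idx = new ToyUrlIndex();
        String url = "http://example.com/a";
        idx.addDocument(url, "v1");
        idx.addDocument(url, "v2");            // duplicate under the same URL
        System.out.println(idx.get(url));      // prints "[v1, v2]"
        idx.updateDocument(url, "v3");         // replaces both old versions
        System.out.println(idx.get(url));      // prints "[v3]"
    }
}
```

Unlike the search-then-delete approach in the post, which only removes the top 10 hits returned by the collector, a key-based delete removes every document stored under the key.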

Source: http://hi.baidu.com/axhack/blog/item/6a101b1822940eb34bedbc64.html/cmtid/51f461036bcc7986d43f7cc6


2011-06-16 23:54