Building and Maintaining an Index: Simple Examples

xuganggogo

Category: Lucene/Nutch

5.1 The simplest code fragment that builds an index

IndexWriter writer = new IndexWriter("/data/index/", new StandardAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene works well", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

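Let us walk through this code.
First we create a writer, specifying "/data/index" as the directory that holds the index and StandardAnalyzer as the analyzer. The third argument says that any index files already present in that directory will be overwritten.
Then we create a new document.
We add a field to the document named "title" with the content "lucene introduction"; it is both stored and indexed.
We add another field named "content" with the content "lucene works well", also stored and indexed.
Then we add the document to the index. If there are several documents, repeat the steps above: create each document and add it.
Once all documents have been added, we optimize the index. Optimizing mainly merges multiple segments into one, which helps search speed.
Finally we close the writer. This is important: closing flushes the buffered documents to the index and releases the write lock.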
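
To make the segment merging performed by optimize() more concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's implementation: a "segment" is assumed to be just a sorted map from term to document numbers, and the names SegmentMergeSketch, segmentOf and merge are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: a "segment" is a sorted map from term to document numbers. */
final class SegmentMergeSketch {

    /** Builds a one-document segment from whitespace-separated text. */
    static Map<String, TreeSet<Integer>> segmentOf(int docNum, String text) {
        Map<String, TreeSet<Integer>> segment = new TreeMap<>();
        for (String term : text.toLowerCase().split("\\s+")) {
            segment.computeIfAbsent(term, k -> new TreeSet<>()).add(docNum);
        }
        return segment;
    }

    /** Merges several segments into one, combining the posting lists of equal terms. */
    static TreeMap<String, TreeSet<Integer>> merge(List<Map<String, TreeSet<Integer>>> segments) {
        TreeMap<String, TreeSet<Integer>> merged = new TreeMap<>();
        for (Map<String, TreeSet<Integer>> segment : segments) {
            for (Map.Entry<String, TreeSet<Integer>> e : segment.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, TreeSet<Integer>>> segments = new ArrayList<>();
        segments.add(segmentOf(0, "lucene introduction"));
        segments.add(segmentOf(1, "lucene works well"));
        // Before merging, a lookup for "lucene" has to visit both segments.
        // After merging, a single lookup returns both document numbers.
        System.out.println(merge(segments).get("lucene")); // prints [0, 1]
    }
}
```

In this model a search over unmerged segments repeats the term lookup once per segment, which is the cost that merging removes.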

Yes, creating an index really is that simple!
Of course, you can modify the code above to fit your own needs.

5.2 Writing the index directly to memory
First create a RAMDirectory and pass it to the writer, as follows:

Directory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene works well", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

5.3 Indexing text files
If you want to index plain text files without reading them into strings yourself to create the fields, you can create the field like this:

Field field = new Field("content", new FileReader(file));

Here file is the text file. This constructor reads the file's contents and indexes them, but does not store them.
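
One practical caveat: FileReader decodes the file with the JVM's default charset, which may not match the file's actual encoding (Chinese text files are often GBK, for example). The sketch below uses only the JDK to open a Reader with an explicit charset. The class name CharsetReaderSketch and its two methods are hypothetical helpers introduced for this illustration, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical helper: opens text files with an explicit charset. */
final class CharsetReaderSketch {

    /** Opens a text file with the named charset instead of the JVM default. */
    static Reader open(File file, String charsetName) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), Charset.forName(charsetName)));
    }

    /** Reads everything from the reader and closes it; used here only to inspect the decoded text. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            for (int n; (n = reader.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }
}
```

The Reader returned by open(file, "GBK") could then be passed to the Field(name, reader) constructor shown above in place of new FileReader(file).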

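6 How to maintain an index
In the Lucene version these examples were written for, index maintenance operations are provided by the IndexReader class.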

6.1 How to delete from an index
Lucene provides two methods for deleting documents from an index. One is

void deleteDocument(int docNum)

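This method deletes by the document's number in the index. Every document receives a unique number when it is added, so deleting by number is exact. However, the number is part of the index's internal structure and we generally do not know which number a given document has, so this method is of limited use. The other is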

void deleteDocuments(Term term)

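This method in effect first runs a search for the term passed as its argument and then deletes all matching documents in one batch. By supplying a sufficiently strict term we can delete exactly the document we want.
Here is an example: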

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
Term term = new Term(field, key);
reader.deleteDocuments(term);
reader.close();
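
A Term is compared exactly against the terms stored in the index, so what counts as a "strict" term depends on how the field was indexed. The following plain-Java sketch is a toy model, not Lucene code: postings are kept in a map keyed by "field:term", and the class DeleteByTermSketch and its methods are invented for this illustration. It shows that a multi-word value from a tokenized field never matches as a single term.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: postings keyed by "field:term", plus a set of live document numbers. */
final class DeleteByTermSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final TreeSet<Integer> live = new TreeSet<>();
    private int nextDocNum = 0;

    /** Adds a one-field document; when tokenized, each whitespace-separated word is its own term. */
    int add(String field, String value, boolean tokenized) {
        int docNum = nextDocNum++;
        String[] terms = tokenized ? value.toLowerCase().split("\\s+") : new String[] { value };
        for (String term : terms) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docNum);
        }
        live.add(docNum);
        return docNum;
    }

    /** Deletes every live document whose field contains exactly this term; returns the count. */
    int deleteDocuments(String field, String term) {
        int deleted = 0;
        for (int docNum : postings.getOrDefault(field + ":" + term, new TreeSet<>())) {
            if (live.remove(docNum)) {
                deleted++;
            }
        }
        return deleted;
    }

    int liveCount() {
        return live.size();
    }

    public static void main(String[] args) {
        DeleteByTermSketch index = new DeleteByTermSketch();
        index.add("title", "lucene introduction", true);
        index.add("title", "lucene in practice", true);
        // The whole phrase was never indexed as a single term, so nothing matches.
        System.out.println(index.deleteDocuments("title", "lucene introduction")); // prints 0
        // The single term "lucene" occurs in both documents, so both are deleted.
        System.out.println(index.deleteDocuments("title", "lucene"));              // prints 2
    }
}
```

For precise deletes it is therefore common to give each document a unique key field that is indexed without tokenization and to delete by that key.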

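6.2 How to update an index
The Lucene version these examples target does not provide a dedicated method for updating the index. We first delete the corresponding document and then add the new document to the index. For example: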

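Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
// A Term is matched exactly against indexed terms. Because "title" is TOKENIZED,
// the phrase below is not a single indexed term; in practice, delete by a key
// field indexed with Field.Index.UN_TOKENIZED.
Term term = new Term("title", "lucene introduction");
reader.deleteDocuments(term);
reader.close();

// Pass false as the third argument: true would recreate the index and discard every existing document.
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene is funny", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();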
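
The delete-then-add pattern has one consequence worth seeing in isolation: the replacement document receives a new internal number. The plain-Java sketch below is a toy model rather than Lucene code, and UpdateSketch with its methods is a hypothetical name chosen for this illustration. Documents are looked up through a unique key, and update() is nothing more than delete() followed by add().

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: documents live under internal numbers and are found through a unique key. */
final class UpdateSketch {
    private final Map<Integer, String> contentByDocNum = new HashMap<>();
    private final Map<String, Integer> docNumByKey = new HashMap<>();
    private int nextDocNum = 0;

    /** Adds a document and returns the internal number it was assigned. */
    int add(String key, String content) {
        int docNum = nextDocNum++;
        contentByDocNum.put(docNum, content);
        docNumByKey.put(key, docNum);
        return docNum;
    }

    /** Deletes the document with this key, if any; returns whether something was deleted. */
    boolean delete(String key) {
        Integer docNum = docNumByKey.remove(key);
        if (docNum == null) {
            return false;
        }
        contentByDocNum.remove(docNum);
        return true;
    }

    /** "Update" is delete followed by add, so the document gets a fresh internal number. */
    int update(String key, String content) {
        delete(key);
        return add(key, content);
    }

    String content(String key) {
        Integer docNum = docNumByKey.get(key);
        return docNum == null ? null : contentByDocNum.get(docNum);
    }

    int size() {
        return contentByDocNum.size();
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        int before = index.add("lucene introduction", "lucene works well");
        int after = index.update("lucene introduction", "lucene is funny");
        // prints 0 -> 1: lucene is funny
        System.out.println(before + " -> " + after + ": " + index.content("lucene introduction"));
    }
}
```

This is another reason to identify documents by a key field rather than by their internal number.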

Source: http://www.searcher.org.cn/html/lucene/20071213/367_3.html


2009-02-01 15:48

Comment #1, xuganggogo, 2009-02-01

package com.test.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import jeasy.analysis.MMAnalyzer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;

public class TestLucene {
    public static void main(String[] args) throws Exception {
        // Directory containing the files to index (here f:\s)
        File fileDir = new File("f:\\s");
        // Directory where the index files are written
        File indexDir = new File("f:\\index");
        // Chinese word segmentation using forward maximum matching
        Analyzer analyzer = new MMAnalyzer();
        IndexWriter indexWriter = new IndexWriter(indexDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        File[] textFiles = fileDir.listFiles();
        if (textFiles == null) {
            throw new IOException("Cannot list directory: " + fileDir);
        }
        System.out.println("Number of files: " + textFiles.length);
        long startTime = new Date().getTime();
        // Add one document per .txt file to the index
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
                System.out.println("File " + textFiles[i].getCanonicalPath() + " is being indexed.");
                String temp = readFile(textFiles[i].getCanonicalPath(), "GBK");
                System.out.println(temp);
                Document document = new Document();
                Field fieldPath = new Field("path", textFiles[i].getPath(), Field.Store.YES, Field.Index.ANALYZED);
                Field fieldBody = new Field("body", temp, Field.Store.YES, Field.Index.ANALYZED);
                document.add(fieldPath);
                document.add(fieldBody);
                indexWriter.addDocument(document);
            }
        }
        // optimize() merges the index segments
        indexWriter.optimize();
        indexWriter.close();
        // Report how long indexing took
        long endTime = new Date().getTime();
        System.out.println("Indexing took " + (endTime - startTime) + " ms for the documents in " + fileDir.getPath());
        // Try a search
        search("d");
    }

    // Searches the index created above
    public static void search(String searchString) throws Exception {
        // Open a searcher on the index in f:\index
        IndexSearcher indexSearcher = new IndexSearcher("f:\\index");
        // Only one index directory is searched here
        IndexSearcher[] indexSearchers = { indexSearcher };
        // Default field is "body"; the query below names its fields explicitly.
        // The analyzer is MMAnalyzer from je-analysis.
        QueryParser parser = new QueryParser("body", new MMAnalyzer());
        // Query syntax: prefix match on the path field or the body field
        String searchQuery = "+(path:" + searchString + "* body:" + searchString + "*)";
        System.out.println("searchQuery: " + searchQuery);
        Query query = parser.parse(searchQuery);
        // MultiSearcher searches several indexes; here there is only one
        MultiSearcher searcher = new MultiSearcher(indexSearchers);
        // Run the search
        Hits h = searcher.search(query);
        System.out.println("Search results:");
        for (int i = 0; i < h.length(); i++) {
            // Print the stored path and body fields of each hit
            System.out.println("path: " + h.doc(i).get("path"));
            System.out.println("body: " + h.doc(i).get("body"));
        }
        searcher.close();
    }

    // Reads a whole text file into a string using the given charset
    public static String readFile(String fileName, String charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), charset));
        StringBuilder text = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep a separator so the last word of one line is not glued to the first word of the next
                text.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
