[Repost] Lucene Hello World

zwwko

Tags: lucene, Java, search engine, Apache, Eclipse

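First, decide which files to index. All files to be indexed are placed in E:\lucene\test.
They are a.txt, b.txt, c.txt and d.txt, and their contents are shown in the figure: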

Choosing the development tool and libraries
Development tool:
Eclipse 3.2
Libraries:
    lucene-demos-1.9-final.jar
    lucene-core-1.9-final.jar
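4.6 Developing a Lucene example
Open Eclipse and create a new Java project with three classes:
Constants.java holds constants, namely the directory of the files to be indexed and the directory where the index is stored;
LuceneIndex.java builds the index from those files;
LuceneSearch.java searches the index.
The project also adds lucene-demos-1.9-final.jar and lucene-core-1.9-final.jar to its build path (every class imported below comes from the core jar).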
4.6.1 Building the index: LuceneIndex.java
Creating Constants.java
package test;

public class Constants {
    // Directory that holds the files to be indexed
    public static final String INDEX_FILE_PATH = "e:\\lucene\\test";
    // Directory where the index is stored
    public static final String INDEX_STORE_PATH = "e:\\lucene\\index";
}

Creating LuceneIndex.java
package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class LuceneIndex {
    public static void main(String[] args) throws Exception {
        // Create the indexer
        LuceneIndex indexer = new LuceneIndex();
        // Build the index and measure how long it takes
        Date start = new Date();
        indexer.writeToIndex();
        Date end = new Date();

        // Prints "Indexing took N ms"
        System.out.println("建立索引用时" + (end.getTime() - start.getTime()) + "毫秒");

        indexer.close();
    }

    // The index writer
    private IndexWriter writer = null;

    public LuceneIndex() throws Exception {
        // Let failures propagate instead of leaving writer null.
        // true = create a new index, overwriting any index already in INDEX_STORE_PATH
        writer = new IndexWriter(Constants.INDEX_STORE_PATH,
                new StandardAnalyzer(), true);
    }

    // Wrap a file in a Document with two fields: "contents" (tokenized and
    // indexed, not stored) and "path" (the absolute path, stored untokenized)
    private Document getDocument(File f) throws Exception {
        Document doc = new Document();

        // Uses the platform default encoding (GBK on a Chinese Windows system)
        Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
        doc.add(Field.Text("contents", reader));

        doc.add(Field.Keyword("path", f.getAbsolutePath()));
        return doc;
    }

    public void writeToIndex() throws Exception {
        File folder = new File(Constants.INDEX_FILE_PATH);
        if (folder.isDirectory()) {
            String[] files = folder.list();
            for (int i = 0; i < files.length; i++) {
                File file = new File(folder, files[i]);
                // Skip subdirectories; only regular files can be read
                if (!file.isFile()) {
                    continue;
                }
                Document doc = getDocument(file);
                // Prints "Indexing : <file>"
                System.out.println("正在建立索引 : " + file);
                writer.addDocument(doc);
            }
        }
    }

    public void close() throws Exception {
        // Flush pending documents and release the index directory
        writer.close();
    }

}
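LuceneIndex.writeToIndex() hands each file to IndexWriter, which builds what is usually called an inverted index: a mapping from each term to the documents that contain it. The following is a minimal in-memory sketch of that idea, assuming a simple lowercase-and-split tokenizer. The class ToyInvertedIndex and its methods are hypothetical and say nothing about how Lucene actually lays out its index files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index: term -> set of document paths.
// Illustrative only; this is NOT how Lucene stores its index on disk.
public class ToyInvertedIndex {
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Hypothetical tokenizer: lowercases and splits on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Loosely analogous to writer.addDocument(doc): record each term of the contents under the path.
    public void addDocument(String path, String contents) {
        for (String term : tokenize(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
        }
    }

    // Returns the documents containing the term (empty if none).
    public Set<String> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("a.txt", "Lucene is a search library");
        index.addDocument("b.txt", "A search engine builds an index");
        System.out.println(index.lookup("search")); // [a.txt, b.txt]
        System.out.println(index.lookup("index"));  // [b.txt]
    }
}
```

Looking up a term is then a single map access rather than a scan of every file, which is the reason an index is built ahead of time at all.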
4.6.2 Searching the index: LuceneSearch.java

Creating LuceneSearch.java
package test;

import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class LuceneSearch {

    public static void main(String[] args) throws Exception {
        LuceneSearch test = new LuceneSearch();
        Hits h = null;
        h = test.search("中国");
        test.printResult(h);
        h = test.search("人民");
        test.printResult(h);
        h = test.search("共和国");
        test.printResult(h);
    }

    // The searcher that reads the index
    private IndexSearcher searcher = null;

    public LuceneSearch() throws Exception {
        // Let failures propagate instead of leaving searcher null
        searcher = new IndexSearcher(IndexReader
                .open(Constants.INDEX_STORE_PATH));
    }

    public final Hits search(String keyword) {
        // Prints "Searching for keyword : <keyword>"
        System.out.println("正在检索关键字 : " + keyword);
        try {
            // Parse the keyword into a Query on the "contents" field, using
            // the same analyzer that was used when the index was built
            Query query = QueryParser.parse(keyword, "contents",
                    new StandardAnalyzer());

            Date start = new Date();
            Hits hits = searcher.search(query);
            Date end = new Date();
            // Prints "Search finished in N ms"
            System.out.println("检索完成，用时" + (end.getTime() - start.getTime())
                    + "毫秒");
            return hits;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    public void printResult(Hits h) {
        // search() returns null if parsing or searching failed
        if (h == null || h.length() == 0) {
            // Prints "Sorry, no results were found."
            System.out.println("对不起，没有找到您要的结果。");
        } else {
            for (int i = 0; i < h.length(); i++) {
                try {
                    Document doc = h.doc(i);
                    // Prints "Result #i, file name: <path>"
                    System.out.print("这是第" + i + "个检索到的结果，文件名为：");
                    System.out.println(doc.get("path"));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println("--------------------------");
    }

}
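LuceneSearch.search() turns the keyword into a Query and lets IndexSearcher find the matching documents. As a rough, self-contained illustration of looking terms up in an inverted index, the sketch below intersects the document sets of several terms (AND semantics). That is a simplifying assumption: Lucene's QueryParser combines terms with OR by default, and Hits returns documents ranked by relevance score, whereas this toy returns matches in alphabetical order. ToyBooleanSearch, searchAll and its printResult are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy boolean search over a term -> documents map (hypothetical, not Lucene's API).
public class ToyBooleanSearch {
    // AND semantics: a document matches only if it contains every query term.
    public static Set<String> searchAll(Map<String, Set<String>> postings, List<String> terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index itself is never modified
            } else {
                result.retainAll(docs);       // set intersection
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Mirrors the shape of LuceneSearch.printResult(): "no results" or one line per match.
    public static void printResult(Set<String> hits) {
        if (hits.isEmpty()) {
            System.out.println("No results found.");
        } else {
            int i = 0;
            for (String path : hits) {
                System.out.println("Result " + i++ + ": " + path);
            }
        }
        System.out.println("--------------------------");
    }

    public static void main(String[] args) {
        Map<String, Set<String>> postings = new HashMap<>();
        postings.put("search", new TreeSet<>(Arrays.asList("a.txt", "b.txt")));
        postings.put("index", new TreeSet<>(Arrays.asList("b.txt")));
        printResult(searchAll(postings, Arrays.asList("search", "index"))); // Result 0: b.txt
        printResult(searchAll(postings, Arrays.asList("missing")));         // No results found.
    }
}
```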
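4.6.3 Analyzing the results
Run LuceneIndex.java.
The console prints one "正在建立索引" (indexing) line per file, followed by the total indexing time ("建立索引用时 N 毫秒", i.e. indexing took N ms):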
正在建立索引 : e:\lucene\test\a.txt
正在建立索引 : e:\lucene\test\b.txt
正在建立索引 : e:\lucene\test\c.txt
正在建立索引 : e:\lucene\test\d.txt
建立索引用时94毫秒
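Both programs measure elapsed time by subtracting two java.util.Date values, whose precision is that of System.currentTimeMillis(). On older Windows systems that clock commonly advanced in steps of roughly 10 to 16 ms, which may explain why the search times in the log appear only as 0, 15 or 16 ms. A sketch of a finer-grained alternative based on System.nanoTime(), assuming Java 8 or later for the lambda; the ElapsedTimer class and its timed method are hypothetical:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: time a task with System.nanoTime(), which is meant for measuring elapsed time,
// instead of subtracting two java.util.Date values.
public class ElapsedTimer {
    // Runs the task, prints its elapsed time in milliseconds (with sub-ms precision), returns its result.
    public static <T> T timed(String label, Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        long elapsedMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
        System.out.println(label + " took " + (elapsedMicros / 1000.0) + " ms");
        return result;
    }

    public static void main(String[] args) {
        Integer sum = timed("summing", () -> {
            int s = 0;
            for (int i = 1; i <= 1000; i++) {
                s += i;
            }
            return s;
        });
        System.out.println(sum); // 500500
    }
}
```

In LuceneSearch, the call searcher.search(query) could be wrapped the same way; a single timing of such a short operation is still noisy, so it says little about real search performance.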

Open the E:\lucene\index directory to see the index that was just built, as shown in the figure:

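Running the search
With the index built, search it for the keywords "中国", "人民" and "共和国" in turn.
Run LuceneSearch.java in Eclipse.
The console prints the results below. For each keyword it shows the keyword being searched (正在检索关键字), the search time (检索完成，用时 N 毫秒) and each matching file, numbered from 0 (这是第 i 个检索到的结果，文件名为):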
正在检索关键字 : 中国
检索完成，用时16毫秒
这是第0个检索到的结果，文件名为：e:\lucene\test\b.txt
--------------------------
正在检索关键字 : 人民
检索完成，用时0毫秒
这是第0个检索到的结果，文件名为：e:\lucene\test\a.txt
这是第1个检索到的结果，文件名为：e:\lucene\test\c.txt
这是第2个检索到的结果，文件名为：e:\lucene\test\b.txt
--------------------------
正在检索关键字 : 人
检索完成，用时15毫秒
这是第0个检索到的结果，文件名为：e:\lucene\test\a.txt
这是第1个检索到的结果，文件名为：e:\lucene\test\c.txt
这是第2个检索到的结果，文件名为：e:\lucene\test\b.txt
--------------------------
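The searches for "人民" and "人" return exactly the same three files. One plausible explanation, assuming the StandardAnalyzer in this Lucene version splits Chinese text into single-character tokens (as early releases are commonly described as doing), is that "人民" is analyzed into "人" and "民", and every file containing the word 人民 also contains the token 人. The toy tokenizer below imitates that single-character (unigram) splitting; CjkUnigramTokenizer is a hypothetical re-implementation for illustration, not Lucene's StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer (hypothetical): each CJK ideograph becomes its own token, and runs of
// other letters/digits stay together as lowercased words. Not Lucene's StandardTokenizer.
public class CjkUnigramTokenizer {
    static boolean isCjk(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isCjk(c)) {
                flush(word, tokens);
                tokens.add(String.valueOf(c)); // one token per Chinese character
            } else if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
            } else {
                flush(word, tokens); // whitespace or punctuation ends the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word, if any, and clears the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("中华人民共和国 Lucene"));
        // [中, 华, 人, 民, 共, 和, 国, lucene]
    }
}
```

Splitting into single characters finds every occurrence of a word without a dictionary, at the cost of precision: a query for 人 also matches any text containing a word with that character. Dedicated Chinese analyzers based on word segmentation are the usual remedy.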
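First, search is a service. This example only uses a short piece of code to demonstrate the API, which is still far from a real search service. A search engine is judged on factors such as how user-friendly its interface is, how results are presented, how quickly it responds, and how well it analyzes keywords.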
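Second, a simple search engine can simply keep its index on a local disk; in this example, a single directory serves as the index location. A large clustered search engine, however, may produce hundreds of gigabytes of logs every day, to say nothing of the index files themselves. Such an engine clearly cannot keep its index in one directory as this example does; it should use distributed storage, connected through storage networking.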
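One common way to spread a large index across machines is hash partitioning (sharding): each document is assigned to one of N index shards, a query is sent to every shard, and the partial results are merged. The sketch below shows only the routing and the gather step under that assumption; ShardRouter and its methods are hypothetical and are not part of Lucene 1.9.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of hash-partitioning documents across N index shards (hypothetical design).
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // The same path always maps to the same shard, so re-indexing a file reaches the shard
    // that holds its old copy. floorMod keeps the result non-negative for negative hash codes.
    public int shardFor(String docPath) {
        return Math.floorMod(docPath.hashCode(), shardCount);
    }

    // Scatter-gather: every shard answers the query; here the partial hit lists are simply
    // concatenated, whereas a real system would merge them by relevance score.
    public static List<String> scatterGather(List<List<String>> perShardHits) {
        List<String> merged = new ArrayList<>();
        for (List<String> hits : perShardHits) {
            merged.addAll(hits);
        }
        return merged;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String p : new String[] {"a.txt", "b.txt", "c.txt", "d.txt"}) {
            System.out.println(p + " -> shard " + router.shardFor(p));
        }
    }
}
```

Plain modulo hashing moves most documents when N changes; schemes such as consistent hashing reduce that cost, which matters once shards are added as the index grows.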
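Of course, for an ordinary e-commerce website, search is just one of the features it offers, and there is no need to build a large clustered search engine.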