Building a file index with Lucene and searching against it (Lucene 2.2)

liuwei1981	Posted: 2008-12-25

Because of a project requirement I recently started learning how to use Lucene. I have a copy of "Lucene in Action" at hand, but as soon as I tried it I found that, with the Lucene 2.0 jar I am using, the very first example in the book does not compile. After looking through some material I finally got it to work, which I count as a good start.

1. Building the index:

```java
package demo.example.searcher;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class Indexer {
    private static Log log = LogFactory.getLog(Indexer.class);

    public static void main(String[] args) throws Exception {
        File indexDir = new File("C:\\index");        // where the index is written
        File dataDir = new File("C:\\lucene\\src");   // directory tree to index
        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();
        System.out.println("indexed:" + numIndexed);
        System.out.println("use:" + (end - start));
    }

    // Creates a new index in indexDir and returns the number of documents added.
    public static int index(File indexDir, File dataDir) {
        int ret = 0;
        try {
            // true = create a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            writer.setUseCompoundFile(false);
            indexDirectory(writer, dataDir);
            ret = writer.docCount();
            writer.optimize();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return ret;
    }

    // Recursively indexes every file below dir.
    public static void indexDirectory(IndexWriter writer, File dir) {
        try {
            File[] files = dir.listFiles();
            if (files == null) {
                return; // not a directory, or it could not be read
            }
            for (File f : files) {
                if (f.isDirectory()) {
                    indexDirectory(writer, f);
                } else {
                    indexFile(writer, f);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Adds one file as a document: tokenized "contents" plus a stored "filename".
    public static void indexFile(IndexWriter writer, File f) {
        try {
            System.out.println("Indexing:" + f.getCanonicalPath());
            Document doc = new Document();
            // Note: FileReader decodes the file with the platform default charset.
            Reader txtReader = new FileReader(f);
            doc.add(new Field("contents", txtReader));
            doc.add(new Field("filename", f.getCanonicalPath(),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

2. Searching the index built by the class above:

```java
package demo.example.searcher;

import java.util.*;

import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.document.*;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class Searcher {
    private static Log log = LogFactory.getLog(Searcher.class);

    public static void main(String[] args) {
        String indexDir = "C:\\index";
        String q = "查询关键字"; // the search keyword
        search(indexDir, q);
    }

    // Parses q against the "contents" field and prints the matching file names.
    public static void search(String indexDir, String q) {
        try {
            IndexSearcher is = new IndexSearcher(indexDir);
            // Use the same analyzer as at index time so query terms match indexed terms.
            QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer());
            Query query = queryParser.parse(q);
            long start = new Date().getTime();
            Hits hits = is.search(query);
            long end = new Date().getTime();
            System.out.println("use:" + (end - start));
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println("The right file:" + doc.get("filename"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

In the end everything runs normally. While testing, however, I ran into something I do not understand. The indexed files are all Java classes. Keyword queries in both Chinese and English run fine, but the content inside the Java source files seems to have been filtered out and cannot be retrieved. What is going on? Does Lucene automatically filter out the comments in class files?

Notice: ITeye articles are the copyright of their authors and are protected by law. They may not be reproduced without the author's written permission.
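One possible cause, which the thread itself does not confirm: `FileReader` decodes bytes with the platform default charset, so a file saved in a different encoding (for example UTF-8 source files on a machine whose default is GBK) would reach the analyzer as garbled characters, and the Chinese terms being searched for would never appear in the index. The sketch below uses only the JDK to show the effect. The class name `CharsetMismatchDemo`, the helper `readAll`, and the sample text are made up for illustration, and ISO-8859-1 merely stands in for whatever mismatched default a machine might have.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetMismatchDemo {

    // Reads the whole file, decoding its bytes with the given charset.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Five Chinese characters (written as escapes) followed by an ASCII word.
        String original = "\u67e5\u8be2\u5173\u952e\u5b57 keyword";
        Path file = Files.createTempFile("charset-demo", ".java");
        try {
            // Hypothetical setup: the file on disk is UTF-8.
            Files.write(file, original.getBytes(StandardCharsets.UTF_8));

            String right = readAll(file, StandardCharsets.UTF_8);
            String wrong = readAll(file, StandardCharsets.ISO_8859_1);

            System.out.println("UTF-8 round trip intact: " + right.equals(original));
            System.out.println("ISO-8859-1 decode intact: " + wrong.equals(original));
            System.out.println("ASCII part still present: " + wrong.contains("keyword"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If this turns out to be the cause, the change in `indexFile` would be to replace `new FileReader(f)` with `new InputStreamReader(new FileInputStream(f), "UTF-8")`, assuming the files really are UTF-8. That would also fit the observation that English terms are found while Chinese ones are not, since ASCII bytes decode identically in most common encodings.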

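liuwei1981	Posted: 2008-12-26

It does not seem to be related to comments after all. In the Java source files and in the Spring and Struts2 XML configuration files, the Chinese text (which sits inside comments) does not appear to be indexed. In a few XML files that I defined myself, however, the Chinese inside comments was indexed and the search results are correct. The most important difference between the two kinds of XML files is probably that the Spring and Struts2 configuration files import namespaces, for example:

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

Why does this happen? Could it be that, because of these imports and because of the class definitions in the Java source files, the comments are excluded from indexing?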
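The namespace declarations are unlikely to matter by themselves, because the indexer above hands the raw file text to the analyzer through a `Reader` and never parses the XML. What may differ between the two groups of files is their encoding, which XML files usually state in the declaration on their first line. The following is a minimal sketch, assuming the files are in an ASCII-compatible encoding and carry such a declaration; `XmlEncodingSniffer` and `declaredCharset` are hypothetical names, and the 200-byte window is an arbitrary choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration,
    // e.g. <?xml version="1.0" encoding="UTF-8"?>
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the charset named in the XML declaration found in the first
    // bytes of a file, or the fallback when none is declared or supported.
    // Limitation: UTF-16 files are not handled by this sketch.
    static Charset declaredCharset(byte[] head, Charset fallback) {
        // The declaration itself is plain ASCII, so ISO-8859-1 decodes it safely.
        int len = Math.min(head.length, 200);
        String prolog = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] withDecl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<beans/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] withoutDecl = "<beans/>".getBytes(StandardCharsets.US_ASCII);

        System.out.println(declaredCharset(withDecl, StandardCharsets.ISO_8859_1));
        System.out.println(declaredCharset(withoutDecl, StandardCharsets.ISO_8859_1));
    }
}
```

The returned charset could then be passed to an `InputStreamReader` when opening the file for indexing. Comparing the declared encodings of a Spring configuration file and one of the hand-written XML files would be a quick way to check whether encoding is the real difference here.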

liuwei1981	Posted: 2008-12-26

In the Java source files and the Spring configuration files, even Chinese text that is not inside a comment is not indexed.

liuwei1981	Posted: 2008-12-26

In ordinary text files, on the other hand, both indexing and searching Chinese work fine. I do not understand how Lucene works internally here.
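On the working principle: as far as I know, the `StandardAnalyzer` of the Lucene 2.x line emits each Chinese character as its own token and lowercases runs of Latin letters, and the query string goes through the same analysis. A match therefore requires that the characters in the file were decoded exactly as typed in the query. The sketch below is a simplified stand-in written with the JDK only; it is not Lucene's tokenizer, and `UnigramSketch` and `tokens` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class UnigramSketch {

    // Splits text into lowercased alphanumeric words and single Han characters.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                // Still inside a non-Han word: keep collecting.
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // A Han character or a separator ends the current word.
            if (word.length() > 0) {
                out.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                // Each Han character becomes a token by itself.
                out.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            out.add(word.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Lucene", two Chinese characters (as escapes), then "index".
        System.out.println(tokens("Lucene \u4e2d\u6587 index"));
    }
}
```

Under this model, text that was decoded with the wrong charset produces tokens that contain none of the Chinese characters in the query, which would look from the outside as if the content had been filtered out.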
