Lucene 3.0 Study Summary

randychao2008

Tags: lucene, Apache, search engine, junit

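I have been studying Lucene lately. Using Lucene 3.0 I wrote a short program that implements the two most basic functions of a search engine: indexing and searching. The search, however, gave a result I did not expect. After adding a field to the document (doc1.add(new Field("content","matter bird scan cancer scan matter",Store.YES,Index.NOT_ANALYZED));), the text clearly contains "scan" twice, yet the search reports only one hit. Why is that?
I am a beginner, so if anyone can help me sort this out I would be very grateful. Thanks in advance.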

package document;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class fileDoc {
    private Directory index_dir = null;
    private String index_path = "c:\\index";
    private IndexWriter write = null;
    private IndexSearcher indexSearch = null;

    public fileDoc() throws CorruptIndexException, LockObtainFailedException, IOException {
        File file = new File(index_path);
        // SimpleFSDirectory is the usual choice for creating the index directory
        index_dir = SimpleFSDirectory.open(file);
        // true = create a new index, replacing any existing one in this directory
        write = new IndexWriter(index_dir, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.LIMITED);
        indexSearch = new IndexSearcher(index_dir);
    }

    @Test
    public void createIndex() throws CorruptIndexException, IOException {
        // Build the document to be indexed
        Document doc1 = new Document();
        // NOT_ANALYZED: the whole value is indexed as one term, without tokenizing
        doc1.add(new Field("content", "matter bird scan cancer scan matter", Store.YES, Index.NOT_ANALYZED));
        // ANALYZED: the value is tokenized by the analyzer before indexing
        doc1.add(new Field("title", "doca", Store.YES, Index.ANALYZED));

        write.addDocument(doc1);

        // Only after close() is called does the writer flush everything it holds
        // in memory to disk and close its output streams. Otherwise the index
        // directory will contain only the segments file.
        write.close();
    }

    @Test
    public void search() throws CorruptIndexException, IOException, ParseException {
        // Create the Query object
        Query query = new BooleanQuery();
        String key1 = "scan";

        // Search the "content" field
        QueryParser queryParser = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));
        query = queryParser.parse(key1);
        // Collect up to 100 results; true = documents are scored in order
        TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
        indexSearch.search(query, collector);
        System.out.println("Hits for this keyword: " + collector.getTotalHits());
    }
}
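
One detail in the listing may be worth checking: Index.NOT_ANALYZED indexes the whole field value as a single term, while Index.ANALYZED passes it through the analyzer first. The sketch below uses plain Java rather than Lucene to illustrate the difference. The class AnalyzedSketch and its methods are hypothetical names, and splitting on whitespace is only a rough stand-in for what StandardAnalyzer does.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class AnalyzedSketch {
    // Rough analogue of Index.NOT_ANALYZED: the entire value becomes one term.
    static Set<String> termsNotAnalyzed(String value) {
        return Collections.singleton(value);
    }

    // Rough analogue of Index.ANALYZED: lowercase and split on whitespace.
    // (Assumption: a simplification of StandardAnalyzer, for illustration only.)
    static Set<String> termsAnalyzed(String value) {
        Set<String> terms = new TreeSet<String>();
        for (String token : value.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String content = "matter bird scan cancer scan matter";
        // The single un-analyzed term is the whole string, so "scan" is not a term.
        System.out.println(termsNotAnalyzed(content).contains("scan")); // false
        // After tokenizing, "scan" is one of the distinct terms.
        System.out.println(termsAnalyzed(content).contains("scan"));    // true
    }
}
```

If the same holds for the real index, a term query for "scan" would only be expected to match the content field when it is indexed with Index.ANALYZED.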

2011-02-28 16:05
Category: Programming Languages

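Reply #2, randychao2008, 2011-03-01

Thanks for the first reply. I understand this problem now.

If I want to get the number of times a given word occurs in each indexed document, how should I do that? Does the Lucene 3.0 API provide a class for this?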

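Reply #1, hu437, 2011-02-28

The hits it returns are counted per document.

Your doc1.add(new Field("content","matter bird scan cancer scan matter",Store.YES,Index.NOT_ANALYZED)); adds to a single document, so of course only one hit comes back.

The text does contain "scan" twice, but after tokenization those occurrences are called terms, and the hit count does not count terms.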
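
The distinction drawn in reply #1, and the per-document count asked about in reply #2, can be illustrated with a small plain-Java sketch. It does not use Lucene. TermCountSketch, termFrequency and matchingDocuments are hypothetical names, and whitespace splitting is an assumed simplification of real analysis.

```java
import java.util.Arrays;
import java.util.List;

public class TermCountSketch {
    // Number of times `term` occurs in `text` (lowercased, split on whitespace).
    // `term` is expected in lowercase.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Number of documents containing `term` at least once.
    // This is the quantity a hit count corresponds to.
    static int matchingDocuments(List<String> docs, String term) {
        int hits = 0;
        for (String doc : docs) {
            if (termFrequency(doc, term) > 0) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("matter bird scan cancer scan matter");
        System.out.println("hits: " + matchingDocuments(docs, "scan"));                    // hits: 1
        System.out.println("occurrences in doc 0: " + termFrequency(docs.get(0), "scan")); // occurrences in doc 0: 2
    }
}
```

In Lucene 3.0 itself, per-document term frequencies are, as far as I know, available through IndexReader.termDocs(Term), which returns a TermDocs whose doc() and freq() methods report each matching document and how often the term occurs in it.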
