Processing Chinese with Lucene and IKAnalyzer: Indexing and Search Example

Author: cesul

Versions: Lucene 3.0.2, IKAnalyzer 3.20

The indexer (Indexer.java) walks a given folder recursively (depth-first) and indexes every .txt file it finds.
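The traversal step can be sketched on its own with plain java.io.File, with no Lucene involved. This is a minimal sketch, not the author's code, and the class name TxtFileWalker is hypothetical. It guards against listFiles() returning null, which happens for unreadable directories, and matches the extension case-insensitively.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: depth-first collection of .txt files under a root folder.
class TxtFileWalker {
    static List<File> collectTxtFiles(File root) {
        List<File> result = new ArrayList<File>();
        walk(root, result);
        return result;
    }

    private static void walk(File dir, List<File> out) {
        File[] children = dir.listFiles(); // null if dir is not a directory or cannot be read
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                walk(child, out); // recurse into subfolders first (depth-first)
            } else if (child.getName().toLowerCase(Locale.ROOT).endsWith(".txt")) {
                out.add(child);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectTxtFiles(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

A nonexistent root simply yields an empty list, which mirrors the early return in the original indexDirectory().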
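It creates an IndexWriter and passes new IKAnalyzer(false) as the second constructor argument. The boolean selects IKAnalyzer's segmentation mode: false means finest-grained segmentation rather than maximum-word-length segmentation.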
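To show roughly what that boolean changes, here is a toy contrast between "emit every dictionary word" (fine-grained) and forward maximum matching (longest word first). This is an illustration under stated assumptions only. The five-word dictionary and the ToySegmenter class are hypothetical, and IKAnalyzer's real algorithm and dictionary are far more elaborate. The sketch only shows why a fine-grained mode produces more, overlapping terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy segmenter; NOT IKAnalyzer's implementation.
class ToySegmenter {
    // Hypothetical toy dictionary
    static final Set<String> DICT = new HashSet<String>(Arrays.asList(
            "中华", "华人", "人民", "共和国", "中华人民共和国"));
    static final int MAX_LEN = 7; // longest word in DICT

    // Fine-grained: emit every dictionary word starting at every position (overlaps allowed).
    static List<String> allWords(String text) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + MAX_LEN); j++) {
                String w = text.substring(i, j);
                if (DICT.contains(w)) out.add(w);
            }
        }
        return out;
    }

    // Forward maximum matching: take the longest dictionary word at each position,
    // falling back to a single character, then continue after it (no overlaps).
    static List<String> maxMatch(String text) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1;
            for (int j = Math.min(text.length(), i + MAX_LEN); j > i + 1; j--) {
                if (DICT.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }
}
```

For 中华人民共和国, allWords returns 中华, 中华人民共和国, 华人, 人民, 共和国, while maxMatch returns the single term 中华人民共和国. More overlapping terms generally help queries on partial words match, at the cost of a larger index.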
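In indexFile(), each field and its content are added to the Document as new Field(...) objects. Pay attention to this part of the Lucene 3.x API, because it differs from older releases.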
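Conceptually, an indexed field is split into terms that point back to the document, while a stored field keeps its raw value so it can be shown in results. In Lucene 3.0, the Field(String, Reader) constructor creates a field that is tokenized and indexed but not stored. The file text is therefore searchable, but it cannot be read back from a hit. The filename field, by contrast, uses Field.Store.YES. The toy model below shows that distinction. It is a hypothetical ToyIndex class with a whitespace tokenizer, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of Document/Field/addDocument; not how Lucene is implemented.
class ToyIndex {
    // "field:term" -> ids of documents containing that term (the inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    // docId -> raw values of the fields marked as stored
    private final List<Map<String, String>> storedFields = new ArrayList<Map<String, String>>();

    // Index every field; keep the raw value only for fields named in storedNames.
    int addDocument(Map<String, String> fields, Set<String> storedNames) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<String, String>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            // toy tokenizer: lowercase, split on whitespace
            for (String term : f.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) continue;
                String key = f.getKey() + ":" + term;
                Set<Integer> docs = postings.get(key);
                if (docs == null) {
                    docs = new TreeSet<Integer>();
                    postings.put(key, docs);
                }
                docs.add(docId);
            }
            if (storedNames.contains(f.getKey())) stored.put(f.getKey(), f.getValue());
        }
        storedFields.add(stored);
        return docId;
    }

    // Ids of documents whose field contains the term
    Set<Integer> search(String field, String term) {
        Set<Integer> docs = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.<Integer>emptySet() : docs;
    }

    // Only stored fields can be read back
    Map<String, String> doc(int docId) {
        return storedFields.get(docId);
    }
}
```

Adding a document with text "Lucene indexes text" and filename "a.txt", with only filename stored, makes search("text", "lucene") find it, while doc(0) exposes the filename but not the text.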
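In particular, because the files contain Chinese text, the Reader passed as the second argument of the "text" Field in indexFile() needs care. The .txt files here are GBK-encoded. If the IDE (and therefore the JVM's default encoding) is set to UTF-8, a reader that relies on the default encoding turns the Chinese into garbled text. The code therefore wraps a FileInputStream in an InputStreamReader with the charset set explicitly to GBK.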
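The encoding problem can be reproduced with the standard library alone. Encode Chinese text as GBK bytes, standing in for a GBK-encoded .txt file. Then decode those bytes once as UTF-8 and once as GBK. The class GbkDecodingDemo is hypothetical. The point is that InputStreamReader accepts an explicit charset, whereas FileReader before Java 11 always used the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo: the same bytes decoded with the wrong and the right charset.
class GbkDecodingDemo {
    // Read all characters from a byte stream using an explicitly named charset.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, charsetName);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gbkBytes = "中文索引".getBytes("GBK"); // simulate the contents of a GBK-encoded file
        String wrong = readAll(new ByteArrayInputStream(gbkBytes), "UTF-8");
        String right = readAll(new ByteArrayInputStream(gbkBytes), "GBK");
        System.out.println("decoded as UTF-8: " + wrong); // garbled: malformed bytes become U+FFFD
        System.out.println("decoded as GBK:   " + right); // original text restored
    }
}
```

Decoding as UTF-8 replaces the malformed byte sequences with U+FFFD replacement characters, while decoding as GBK returns the original string.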

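import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Locale;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

// Indexes every .txt file under baseDir (recursively) into indexDir, using the Lucene 3.0.x API.
public class Indexer {
	private final File baseDir = new File("E:\\");          // root folder to scan
	private final File indexDir = new File("F:\\indexDir"); // where the index is written

	public void createIndex() {
		if (!baseDir.isDirectory()) {
			System.err.println("Base directory not found: " + baseDir);
			return;
		}
		IndexWriter writer = null;
		try {
			writer = new IndexWriter(
					FSDirectory.open(indexDir),
					new IKAnalyzer(false),               // false: finest-grained segmentation
					true,                                // create a new index, replacing any existing one
					IndexWriter.MaxFieldLength.LIMITED); // only the first 10,000 terms of each field are indexed
			indexDirectory(writer, baseDir);
			writer.optimize(); // merge segments
			System.out.println("Indexing complete");
		} catch (IOException e) {
			// also covers CorruptIndexException and LockObtainFailedException (both are IOExceptions)
			e.printStackTrace();
		} finally {
			if (writer != null) {
				try {
					writer.close(); // commits changes and releases the write lock
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}

	private void indexDirectory(IndexWriter writer, File dir) {
		File[] files = dir.listFiles(); // null if dir is not a readable directory
		if (files == null) {
			return;
		}
		for (File file : files) {
			if (file.isDirectory()) indexDirectory(writer, file);
			else indexFile(writer, file);
		}
	}

	private void indexFile(IndexWriter writer, File file) {
		if (file.isHidden() || !file.canRead()) {
			return;
		}
		Reader reader = null;
		try {
			String path = file.getCanonicalPath();
			if (!path.toLowerCase(Locale.ROOT).endsWith(".txt")) {
				return;
			}
			System.out.println("Indexing: " + path);
			Document doc = new Document();
			// Decode explicitly as GBK instead of relying on the platform default encoding
			reader = new InputStreamReader(new FileInputStream(file), "GBK");
			doc.add(new Field("text", reader)); // file contents: tokenized and indexed, not stored
			doc.add(new Field("filename", path,
					Field.Store.YES, Field.Index.ANALYZED)); // file path: stored and analyzed
			writer.addDocument(doc); // the Reader is consumed here, so close it only afterwards
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			if (reader != null) {
				try {
					reader.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}

	public static void main(String[] args) {
		new Indexer().createIndex();
	}
}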
