Lucene in Practice: A Demo

sysout091

Lucene text search demo

I have been learning Lucene recently. The official release is now at version 5.0; the website is http://lucene.apache.org/

From the Lucene website:

The Apache Lucene™ project develops open-source search software, including:

1. Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
2. Solr is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
3. Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
4. PyLucene is a Python port of the Core project.

Lucene Core is the central piece:

provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

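Lucene is a Java-based indexing and search technology that includes spellchecking, hit highlighting, and advanced tokenization. It is not a complete application. It provides a capability: a library you build on to add search to a Java program.

I won't go into Lucene's internals here; Baidu Baike covers them.

Below I walk through a simple text-file search demo that I designed and built with Lucene. It covers the following parts: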

Environment setup

Jars used: see the attachments.

Main approach

1. Get the data to be indexed

File[] files = files2Index.listFiles(new FilenameFilter()
		{

			@Override
			public boolean accept(File dir, String name)
			{
				// Match the ".txt" extension; plain "txt" would also match names like "notatxt"
				return name.endsWith(".txt");
			}
		});
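
The filter above can also be written defensively. The following is a minimal sketch using only the JDK, not the demo's actual code: the class name TxtFileLister and the method listTxtFiles are hypothetical names introduced for illustration. It handles two cases the snippet above does not: File.listFiles returns null when the path is not a readable directory, and a subdirectory whose name happens to end in ".txt" should not be treated as a text file. Matching the extension case-insensitively is an assumption about what is wanted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TxtFileLister
{
	/**
	 * Returns the regular files directly inside dir whose names end with
	 * ".txt" (case-insensitive), sorted by path. Returns an empty list when
	 * dir is not a directory or cannot be read.
	 */
	public static List<File> listTxtFiles(File dir)
	{
		// The two-argument lambda selects the FilenameFilter overload
		File[] found = dir.listFiles(
				(d, name) -> name.toLowerCase(Locale.ROOT).endsWith(".txt"));

		List<File> result = new ArrayList<>();
		if (found == null)
		{
			// listFiles signals "not a directory" or an I/O error with null
			return result;
		}
		for (File f : found)
		{
			// Skip directories that merely have a .txt-like name
			if (f.isFile())
			{
				result.add(f);
			}
		}
		Collections.sort(result);
		return result;
	}

	public static void main(String[] args)
	{
		File dir = new File(args.length > 0 ? args[0] : ".");
		for (File f : listTxtFiles(dir))
		{
			System.out.println(f.getName());
		}
	}
}
```

Returning an empty list instead of null lets the indexing loop in the next step run unchanged when the source directory is missing.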

2. Create the index with Lucene

IndexWriter indexWriter = getIndexWriter();

			for (File file : files)
			{
				// Index each txt file by its name and its content
				Document doc = new Document();
				doc.add(new TextField(NAME, file.getName(), Store.YES));

				// try-with-resources closes the reader even if reading fails.
				// Note: FileReader decodes with the platform default charset.
				try (BufferedReader br = new BufferedReader(new FileReader(file)))
				{
					StringBuilder sb = new StringBuilder();
					String line;
					while ((line = br.readLine()) != null)
					{
						// Keep the line break so that words on adjacent lines are not glued together
						sb.append(line).append('\n');
					}

					doc.add(new TextField(CONTENT, sb.toString(), Store.YES));
					indexWriter.addDocument(doc);
				} catch (IOException e)
				{
					// Also covers FileNotFoundException, a subclass of IOException
					// log
				}
			}

			try
			{
				// Commit once after all documents are added, then release the index lock
				indexWriter.commit();
				indexWriter.close();
			} catch (IOException e)
			{
				// log
			}

/**
	 * Creates the index writer.
	 * 
	 * @return the index writer, or null if the index directory could not be opened
	 */
	private IndexWriter getIndexWriter()
	{
		IndexWriter indexWriter = null;
		try
		{
			indexWriter = new IndexWriter(FSDirectory.open(indexDir.toPath()),
					new IndexWriterConfig(new SmartChineseAnalyzer()));
		} catch (IOException e)
		{
			// TODO Auto-generated catch block
			// log
		}
		return indexWriter;
	}
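
The indexing loop reads each file with FileReader, which decodes bytes using the platform default charset. For Chinese text that can produce garbled content when the file's encoding differs from the machine's default. The sketch below shows one way to read a whole file with an explicit charset using only the JDK. It assumes the text files are saved as UTF-8, which the original post does not state; if the files are in another encoding (GBK, for example), a different charset would be needed. The class name TextFileReader and the method readUtf8 are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileReader
{
	/**
	 * Reads the whole file into a String, decoding it as UTF-8.
	 * Line breaks are preserved exactly as stored in the file.
	 * Assumption: the file is UTF-8 encoded.
	 */
	public static String readUtf8(Path path) throws IOException
	{
		byte[] bytes = Files.readAllBytes(path);
		return new String(bytes, StandardCharsets.UTF_8);
	}

	public static void main(String[] args) throws IOException
	{
		if (args.length == 0)
		{
			System.err.println("usage: TextFileReader <file>");
			return;
		}
		System.out.print(readUtf8(Paths.get(args[0])));
	}
}
```

The returned string could be passed to the CONTENT field in place of the StringBuilder result. Reading the whole file at once is reasonable for small text files like the ones in this demo; very large files would call for a streaming approach.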

3. Query content using the index

public List<String> getFoundFileNames(String queryContent)
	{
		ScoreDoc[] scoreDocs = queryIndex(queryContent);

		List<String> results = new ArrayList<String>();
		Set<String> fields = new HashSet<String>();
		fields.add(NAME);
		fields.add(CONTENT);

		for (ScoreDoc scDoc : scoreDocs)
		{
			try
			{
				Document resDoc = indexSearcher.doc(scDoc.doc, fields);
				results.add(resDoc.getValues(NAME)[0]);
			} catch (IOException e)
			{
				// TODO Auto-generated catch block
				// log
			}
		}
		return results;
	}
	private ScoreDoc[] queryIndex(String queryContent)
	{
		try
		{
			// Open a searcher over the index directory
			indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory
					.open(indexDir.toPath())));

			// Parses the query string; the default field is empty, so queries must name a field
			QueryParser parser = new QueryParser("", new SmartChineseAnalyzer());

			return indexSearcher.search(parser.parse(queryContent), MAX_COUNT).scoreDocs;
		} catch (IOException e)
		{
			// log
		} catch (ParseException e)
		{
			// log
		}
		// Return an empty array rather than null so callers can iterate safely
		return new ScoreDoc[0];
	}

PS: First create some TXT files in the source directory. The index directory is created automatically if it does not exist.
Test code:

 	@Test
	public void test01()
	{
		IndexFile file = new IndexFile(new File("E:\\APP\\luceneTest\\文本文件"),
				new File("E:\\APP\\luceneTest\\indexs\\01"));
		file.createIndex();
		System.out.println("01:" + file.getFoundFileNames("NAME:\"文本\""));
		System.out.println("01:" + file.getFoundFileNames("NAME:\"txt\""));
	}

	@Test
	public void test02()
	{
		IndexFile file = new IndexFile(new File("E:\\APP\\luceneTest\\文本文件"),
				new File("E:\\APP\\luceneTest\\indexs\\02"));
		file.createIndex();
		System.out.println("02:" + file.getFoundFileNames("CONTENT:\"我\""));
		System.out.println("02:" + file.getFoundFileNames("CONTENT:\"开发\""));
	}
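
The tests above write each query string by hand in the form FIELD:"phrase", which only works while the phrase contains no double quote or backslash. The sketch below is a small JDK-only helper that builds such a string and backslash-escapes those two characters, following the escaping convention of Lucene's classic query syntax. The class name QueryStrings and the method fieldPhrase are hypothetical and not part of the demo or of Lucene. The helper assumes the field name is a plain identifier that needs no escaping, and that the NAME and CONTENT constants hold the literal field names used in the test queries.

```java
public class QueryStrings
{
	/**
	 * Builds a field-qualified phrase query string such as NAME:"txt".
	 * Backslashes and double quotes inside the phrase are escaped with a
	 * backslash so they cannot end the phrase early.
	 * Assumption: field is a simple name with no special characters.
	 */
	public static String fieldPhrase(String field, String phrase)
	{
		// Escape backslashes first, otherwise the backslashes added for
		// quotes would themselves be escaped a second time
		String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
		return field + ":\"" + escaped + "\"";
	}

	public static void main(String[] args)
	{
		System.out.println(fieldPhrase("NAME", "txt"));
		System.out.println(fieldPhrase("CONTENT", "say \"hi\""));
	}
}
```

With such a helper, a call in the tests could read file.getFoundFileNames(QueryStrings.fieldPhrase("NAME", "txt")) instead of embedding escaped quotes in the Java string literal.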

lib.zip (7 MB)

src.zip (1.9 KB)

2015-03-13 00:24