Lucene 3.5 Learning Summary

Lucene has two main parts: indexing and search. The related jars can be downloaded from the official site:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/index.html

An index can live on disk (file-based) or in memory; this post covers the file-based kind, including creating, deleting, updating, and reading an index. For Chinese word segmentation during indexing, IKAnalyzer is worth a look. A minimal in-memory sketch is shown first, right before the file-based code.
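The file-based code below can be turned into an in-memory index simply by swapping the `Directory` implementation. A minimal sketch of that variant, assuming Lucene 3.5 core on the classpath; the `title` field and the sample text are placeholders:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/** Minimal in-memory index: RAMDirectory instead of FSDirectory, nothing is written to disk. */
public class MemoryIndexDemo {

    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory(); // lives only as long as the JVM
        // For Chinese text, an analyzer such as IKAnalyzer could be swapped in here.
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
                new StandardAnalyzer(Version.LUCENE_35));
        IndexWriter writer = new IndexWriter(dir, config);

        Document doc = new Document();
        doc.add(new Field("title", "hello lucene", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        TopDocs hits = searcher.search(new TermQuery(new Term("title", "hello")), 10);
        System.out.println("hits: " + hits.totalHits);
        searcher.close();
    }
}
```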
```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Indexing: create, delete, update and read a file-based index.
 *
 * @author
 * @version v 0.1 2012-3-6
 */
public class Index {

    /** Index directory path */
    private static final String INDEX_PATH = "/workspace2/indexing";

    /** Item number */
    public static final String AUCTION_NO = "auctionNo";

    /** Item name */
    public static final String AUCTION_NAME = "auctionName";

    /** Price */
    public static final String MAX_PRICE = "maxPrice";

    /** End date */
    public static final String END_DATE = "endDate";

    /**
     * Entry point
     * @param args
     */
    public static void main(String[] args) {
        createIndex();
        // indexWriterDeleteIndex();
        // readIndex();
        updateIndex();
    }

    /** Create the index */
    public static void createIndex() {
        try {
            Directory directory = FSDirectory.open(new File(INDEX_PATH));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            IndexWriter writer = new IndexWriter(directory, iwConfig);

            // Build a Document and add its fields
            Document doc = new Document();
            Field auctionNoField = new Field(AUCTION_NO, "10003", Field.Store.YES, Field.Index.NOT_ANALYZED);
            Field auctionNameField = new Field(AUCTION_NAME, "汇园果汁", Field.Store.YES, Field.Index.ANALYZED);
            Field endDateField = new Field(END_DATE, "2012-03-06 18:00:00", Field.Store.YES, Field.Index.NOT_ANALYZED);
            doc.add(auctionNoField);
            doc.add(auctionNameField);
            doc.add(new NumericField(Index.MAX_PRICE, Field.Store.YES, true).setDoubleValue(300));
            doc.add(endDateField);

            writer.addDocument(doc);
            // writer.addDocuments(docs); // add several documents at once
            writer.close();
            System.out.println("========= index created");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Delete documents with an IndexWriter
     * <ul>
     * <li>delete everything</li>
     * <li>delete one or more documents</li>
     * </ul>
     */
    public static void indexWriterDeleteIndex() {
        try {
            Directory directory = FSDirectory.open(new File(INDEX_PATH));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            IndexWriter writer = new IndexWriter(directory, iwConfig);

            // writer.deleteAll(); // remove every document
            Term term = new Term(Index.AUCTION_NO, "10001"); // delete a single document
            writer.deleteDocuments(term);
            writer.close();
            System.out.println("========= documents deleted");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Delete documents with an IndexReader */
    public static void indexReaderDeleteIndex() {
        try {
            Directory directory = FSDirectory.open(new File(INDEX_PATH));
            IndexReader reader = IndexReader.open(directory, false); // readOnly = false so deletes are allowed
            Term term = new Term(Index.AUCTION_NO, "10000"); // delete a single document
            reader.deleteDocuments(term);
            reader.flush();
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Update the index (delete first, then re-add)
     * <ul>
     * <li>update a single document</li>
     * <li>update several documents</li>
     * </ul>
     */
    public static void updateIndex() {
        try {
            Directory directory = FSDirectory.open(new File(INDEX_PATH));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            IndexWriter writer = new IndexWriter(directory, iwConfig);

            // Build the replacement Document
            Document doc = new Document();
            Field auctionNoField = new Field(AUCTION_NO, "10003", Field.Store.YES, Field.Index.NOT_ANALYZED);
            Field auctionNameField = new Field(AUCTION_NAME, "商品名称", Field.Store.YES, Field.Index.ANALYZED);
            Field endDateField = new Field(END_DATE, "2011-03-06 18:00:00", Field.Store.YES, Field.Index.NOT_ANALYZED);
            doc.add(auctionNoField);
            doc.add(auctionNameField);
            doc.add(new NumericField(Index.MAX_PRICE, Field.Store.YES, true).setDoubleValue(200));
            doc.add(endDateField);

            // Update by the unique item number
            Term term = new Term(Index.AUCTION_NO, "10003");
            writer.updateDocument(term, doc); // update one document
            // writer.updateDocuments(delTerm, docs); // update several documents
            writer.close();
            System.out.println("========= index updated");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Read the index */
    public static void readIndex() {
        try {
            Directory directory = FSDirectory.open(new File(INDEX_PATH));
            IndexReader reader = IndexReader.open(directory, true); // true = read-only
            int num = reader.numDocs();
            for (int i = 0; i < num; i++) {
                Document doc = reader.document(i);
                System.out.println(doc);
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Lucene supports many query types; the commonly used ones are shown below. For the query syntax, see:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html

The search code:

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import com.index.Index;

/**
 * Search: the commonly used Lucene query types.
 *
 * @author
 * @version v 0.1 2012-3-6
 */
public class Search {

    public IndexSearcher searcher = null;

    String keyword  = "北 AND 要";
    String keyword2 = "100";

    /**
     * Entry point
     * @param args
     */
    public static void main(String[] args) {
        Search search = new Search();
        search.getSearcher();
        // search.termQuery();                       // term query
        search.booleanQuery_1();
        // search.booleanQuery();                    // boolean (combined) query
        // search.wildcardQuery();                   // wildcard query
        // search.phraseQuery();                     // phrase query
        // search.prefixQuery();                     // prefix query
        // search.multiPhraseQuery();                // multi-phrase query
        // search.fuzzyQuery();                      // fuzzy query
        // search.termRangeQuery();                  // text range query, e.g. 2011-03-06 18:00:00 TO 2012-03-06 18:00:00
        // search.numericRangeQuery(100.00, 200.00); // numeric range query
        // search.sortQuery();                       // sorted query
        // search.heightQuery();                     // highlighted query
        // search.pageQuery(2, 5);                   // paged query
    }

    /** Open the searcher */
    public void getSearcher() {
        IndexReader reader = null;
        try {
            reader = IndexReader.open(FSDirectory.open(new File("/workspace2/indexing")), true);
            searcher = new IndexSearcher(reader);
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Run a query and print the hits
     * @param q
     */
    public void query(Query q) {
        TopScoreDocCollector collector = TopScoreDocCollector.create(5 * 10, false);
        try {
            searcher.search(q, collector);
            int count = collector.getTotalHits();
            System.out.println("------------ " + count + " hits");
            TopDocs top = collector.topDocs();
            ScoreDoc[] docs = top.scoreDocs;
            for (ScoreDoc sd : docs) {
                Document doc = searcher.doc(sd.doc);
                System.out.println(doc.get(Index.AUCTION_NO) + " , " + doc.get(Index.AUCTION_NAME)
                        + " , " + doc.get(Index.MAX_PRICE) + " , " + doc.get(Index.END_DATE));
            }
            searcher.close(); // close the searcher
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Run a sorted query and print the hits
     * @param q
     * @param sort
     */
    public void querySort(Query q, Sort sort) {
        System.out.println("============== sorted search");
        try {
            // search(Query, int, Sort) returns the hits directly; no collector is needed
            TopDocs top = searcher.search(q, 1000, sort);
            System.out.println("------------ " + top.totalHits + " hits");
            ScoreDoc[] docs = top.scoreDocs;
            for (ScoreDoc sd : docs) {
                Document doc = searcher.doc(sd.doc);
                System.out.println(doc.get(Index.AUCTION_NO) + " , " + doc.get(Index.AUCTION_NAME)
                        + " , " + doc.get(Index.MAX_PRICE) + " , " + doc.get(Index.END_DATE));
            }
            searcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Term search: TermQuery */
    public void termQuery() {
        Term t = new Term(Index.AUCTION_NAME, keyword);
        TermQuery q = new TermQuery(t);
        System.out.println("===== term search =====");
        query(q);
    }

    /** MultiTermQuery (not covered here) */
    public void multiTermQuery() {
    }

    /** Boolean search built from a parsed query string */
    public void booleanQuery_1() {
        BooleanQuery q = new BooleanQuery();
        QueryParser parser = new QueryParser(Version.LUCENE_35, Index.AUCTION_NAME,
                new StandardAnalyzer(Version.LUCENE_35));
        try {
            Query query = parser.parse(keyword);
            q.add(query, BooleanClause.Occur.SHOULD);
            System.out.println("q : " + q.toString());
            System.out.println("========= boolean search");
            query(q);
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    /**
     * Combined search: BooleanQuery
     * MUST_NOT : the term must not occur
     * SHOULD   : OR
     * MUST     : AND
     */
    public void booleanQuery() {
        BooleanQuery q = new BooleanQuery();
        String[] s = keyword.split(" ");
        if (s.length > 0) {
            for (int i = 0; i < s.length; i++) {
                if (s[i].indexOf("-") != -1) {
                    // terms prefixed with "-" are excluded
                    String query = s[i].replaceAll("-", " NOT ");
                    TermQuery termQuery = new TermQuery(new Term(Index.AUCTION_NAME, query));
                    q.add(termQuery, BooleanClause.Occur.MUST_NOT);
                } else {
                    TermQuery termQuery = new TermQuery(new Term(Index.AUCTION_NAME, s[i]));
                    q.add(termQuery, BooleanClause.Occur.SHOULD);
                }
            }
        } else {
            TermQuery termQuery = new TermQuery(new Term(Index.AUCTION_NAME, keyword));
            q.add(termQuery, BooleanClause.Occur.SHOULD);
        }
        System.out.println("q : " + q.toString());
        System.out.println("========= boolean search");
        query(q);
    }

    /**
     * Wildcard search: WildcardQuery
     * ? matches a single character, * matches any sequence of characters
     */
    public void wildcardQuery() {
        Term t = new Term(Index.AUCTION_NAME, keyword);
        WildcardQuery q = new WildcardQuery(t);
        System.out.println(q.toString());
        System.out.println("======= wildcard search");
        query(q);
    }

    /** Phrase search: PhraseQuery */
    public void phraseQuery() {
        PhraseQuery q = new PhraseQuery();
        q.add(new Term(Index.AUCTION_NAME, keyword));
        q.add(new Term(Index.AUCTION_NAME, keyword2));
        q.setSlop(10); // slop (default 0): allowed distance, in token positions, between the terms
        System.out.println("======= phrase search");
        query(q);
    }

    /** Prefix search: PrefixQuery */
    public void prefixQuery() {
        Term term = new Term(Index.AUCTION_NAME, keyword);
        PrefixQuery q = new PrefixQuery(term);
        System.out.println("========== prefix search");
        query(q);
    }

    /** Multi-phrase search: MultiPhraseQuery */
    public void multiPhraseQuery() {
        Term[] terms = new Term[] { new Term(Index.AUCTION_NAME, keyword),
                new Term(Index.AUCTION_NAME, keyword2) };
        MultiPhraseQuery q = new MultiPhraseQuery();
        q.add(terms);
        q.setSlop(0); // slop (default 0): allowed distance, in token positions, between the terms
        System.out.println("========== multi-phrase search");
        query(q);
    }

    /** Fuzzy search: FuzzyQuery */
    public void fuzzyQuery() {
        Term term = new Term(Index.AUCTION_NAME, keyword);
        FuzzyQuery q = new FuzzyQuery(term);
        // The default minimum similarity is 0.5; a lower value allows fuzzier (looser) matches
        // FuzzyQuery q = new FuzzyQuery(term, 0.1f);
        System.out.println("q:" + q.toString());
        System.out.println("======= fuzzy search");
        query(q);
    }

    /**
     * Text range search: TermRangeQuery
     * The last two arguments say whether the lower and upper bounds are inclusive
     */
    public void termRangeQuery() {
        TermRangeQuery q = new TermRangeQuery(Index.END_DATE, keyword, keyword2, true, false);
        System.out.println("=========== range search");
        query(q);
    }

    /**
     * Numeric range search: NumericRangeQuery
     * The last two arguments say whether the lower and upper bounds are inclusive
     */
    public void numericRangeQuery(double start, double end) {
        Query q = NumericRangeQuery.newDoubleRange(Index.MAX_PRICE, start, end, true, true);
        System.out.println("=========== numeric range search");
        query(q);
    }

    /** Span search: SpanQuery (not covered here) */
    public void spanQuery() {
    }

    /** Sorted search (match on the item name, sort by price) */
    public void sortQuery() {
        try {
            QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_35,
                    new String[] { Index.AUCTION_NAME }, new StandardAnalyzer(Version.LUCENE_35));
            Query q = parser.parse(keyword);
            Sort sort = new Sort();
            sort.setSort(new SortField(Index.MAX_PRICE, SortField.DOUBLE, false)); // true = descending, false = ascending
            // cap the hit count; a huge value here would allocate an equally huge priority queue
            ScoreDoc[] hits = searcher.search(q, null, 1000, sort).scoreDocs;
            System.out.println(hits.length);
            for (int i = 0; i < hits.length; i++) {
                Document doc = searcher.doc(hits[i].doc);
                System.out.println(doc.get(Index.AUCTION_NAME) + " , " + doc.get(Index.MAX_PRICE));
            }
            searcher.close();
        } catch (ParseException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Highlighted search.
     * Lucene highlighting differs from Solr: Lucene runs the query first and then
     * highlights each hit, while Solr configures highlighting up front and returns
     * the highlighted content directly with the results.
     */
    public void heightQuery() {
        Term t = new Term(Index.AUCTION_NAME, keyword);
        TermQuery q = new TermQuery(t);
        TopScoreDocCollector collector = TopScoreDocCollector.create(5 * 10, false);
        try {
            searcher.search(q, collector);
            int count = collector.getTotalHits();
            System.out.println("------------ " + count + " hits");
            TopDocs top = collector.topDocs();
            ScoreDoc[] docs = top.scoreDocs;
            for (ScoreDoc sd : docs) {
                Document doc = searcher.doc(sd.doc);
                String auctionName = doc.get(Index.AUCTION_NAME);
                SimpleHTMLFormatter shf = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
                Highlighter highlighter = new Highlighter(shf, new QueryScorer(q));
                highlighter.setTextFragmenter(new SimpleFragmenter(Integer.MAX_VALUE));
                String content = highlighter.getBestFragment(new StandardAnalyzer(Version.LUCENE_35),
                        Index.AUCTION_NAME, auctionName);
                System.out.println(doc.get(Index.AUCTION_NO) + " , " + content + " , "
                        + doc.get(Index.MAX_PRICE) + " , " + doc.get(Index.END_DATE));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InvalidTokenOffsetsException e) {
            e.printStackTrace();
        }
    }

    /**
     * Paged search
     *
     * @param start   index of the first hit to return
     * @param howMany page size
     */
    public void pageQuery(int start, int howMany) {
        Term t = new Term(Index.AUCTION_NAME, keyword);
        TermQuery q = new TermQuery(t);
        System.out.println("============= paged search");
        this.doPageSearch(q, start, howMany);
    }

    /** Collect start + howMany hits, then return only the requested slice */
    public void doPageSearch(Query q, int start, int howMany) {
        TopScoreDocCollector collector = TopScoreDocCollector.create(start + howMany, false);
        try {
            searcher.search(q, collector);
            int count = collector.getTotalHits();
            System.out.println("------------ " + count + " hits");
            TopDocs top = collector.topDocs(start, howMany);
            ScoreDoc[] docs = top.scoreDocs;
            for (ScoreDoc sd : docs) {
                Document doc = searcher.doc(sd.doc);
                System.out.println(doc.get(Index.AUCTION_NO) + "," + doc.get(Index.AUCTION_NAME));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
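The query syntax linked above can also be exercised directly through `QueryParser` instead of building the query objects by hand as in the `Search` class. A small sketch, assuming the field names from the `Index` class; the query strings themselves are only illustrations:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

/** A few query-syntax examples parsed into Query objects (the strings are illustrative only). */
public class QuerySyntaxDemo {

    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_35, "auctionName",
                new StandardAnalyzer(Version.LUCENE_35));

        // AND / OR / NOT operators on the default field
        Query q1 = parser.parse("果汁 AND 苹果");
        // required (+) and prohibited (-) terms
        Query q2 = parser.parse("+果汁 -苹果");
        // query another field explicitly, with a trailing wildcard
        Query q3 = parser.parse("auctionNo:1000*");
        // inclusive text range on the stored date strings
        Query q4 = parser.parse("endDate:[\"2011-03-06 00:00:00\" TO \"2012-03-06 23:59:59\"]");

        System.out.println(q1 + "\n" + q2 + "\n" + q3 + "\n" + q4);
    }
}
```

Note that ranges on the `maxPrice` NumericField should still go through `NumericRangeQuery` (as in `numericRangeQuery()` above): the classic parser treats range bounds as plain text, which does not match trie-encoded numeric values.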