Advanced search techniques

Sorting

By default, results are sorted by relevance.

    public class Sort implements Serializable {

      /**
       * Represents sorting by computed relevance. Using this sort criteria returns
       * the same results as calling
       * {@link Searcher#search(Query,int) Searcher#search()}without a sort criteria,
       * only with slightly more overhead.
       */
      public static final Sort RELEVANCE = new Sort();

Sort.RELEVANCE sorts by relevance.

      /** Represents sorting by index order. */
      public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);

Sort.INDEXORDER sorts by index order, which is not the same as relevance order.

      /** Sorts by the criteria in the given SortField. */
      public Sort(SortField field) {
        setSort(field);
      }

When a sort field is specified, documents with equal values in that field are then ordered by index order.

The sort field: SortField.java

      /** Represents sorting by document score (relevance). */
      public static final SortField FIELD_SCORE = new SortField(null, SCORE);

      /** Represents sorting by document number (index order). */
      public static final SortField FIELD_DOC = new SortField(null, DOC);

      private String field;
      private int type;  // defaults to determining type dynamically
      private Locale locale;    // defaults to "natural order" (no Locale)
      boolean reverse = false;  // defaults to natural order
      private FieldCache.Parser parser;

      // Used for CUSTOM sort
      private FieldComparatorSource comparatorSource;

      private Object missingValue;

      /** Creates a sort, possibly in reverse, by terms in the given field with the
       * type of term values explicitly given.
       * @param field  Name of field to sort by.  Can be <code>null</code> if
       *               <code>type</code> is SCORE or DOC.
       * @param type   Type of values in the terms.
       * @param reverse True if natural order should be reversed.
       */
      public SortField(String field, int type, boolean reverse) {
        initFieldType(field, type);
        this.reverse = reverse;
      }

Phrase queries

PhraseQuery.java allows several terms to occupy the same position in a query, which makes synonym queries possible. MultiPhraseQuery.java is a further extension of PhraseQuery.

      /**
       * Adds a term to the end of the query phrase.
       * The relative position of the term within the phrase is specified explicitly.
       * This allows e.g. phrases with more than one term at the same position
       * or phrases with gaps (e.g. in connection with stopwords).
       *
       * @param term
       * @param position
       */
      public void add(Term term, int position) {
        if (terms.size() == 0)
          field = term.field();
        else if (term.field() != field)
          throw new IllegalArgumentException("All phrase terms must be in the same field: " + term);

        terms.add(term);
        positions.add(Integer.valueOf(position));
        if (position > maxPosition) maxPosition = position;
      }

It also supports slop:

      /** Sets the number of other words permitted between words in query phrase.
        If zero, then this is an exact phrase search.  For larger values this works
        like a <code>WITHIN</code> or <code>NEAR</code> operator.

        <p>The slop is in fact an edit-distance, where the units correspond to
        moves of terms in the query phrase out of position.  For example, to switch
        the order of two words requires two moves (the first move places the words
        atop one another), so to permit re-orderings of phrases, the slop must be
        at least two.

        <p>More exact matches are scored higher than sloppier matches, thus search
        results are sorted by exactness.

        <p>The slop is zero by default, requiring exact matches.*/
      public void setSlop(int s) { slop = s; }

Querying across multiple fields: MultiFieldQueryParser.java

Span queries

SpanQuery.java: a span query also has to return the position of each occurrence of the same term.

Finding spans at the start of a field: SpanFirstQuery.java. It matches a value within the first end tokens of the field.

    /** Matches spans near the beginning of a field.
     * <p/>
     * This class is a simple extension of {@link SpanPositionRangeQuery} in that it assumes the
     * start to be zero and only checks the end boundary.
     */
    public class SpanFirstQuery extends SpanPositionRangeQuery {

      /** Construct a SpanFirstQuery matching spans in <code>match</code> whose end
       * position is less than or equal to <code>end</code>. */
      public SpanFirstQuery(SpanQuery match, int end) {
        super(match, 0, end);
      }

Spans near one another: SpanNearQuery.java

    /** Matches spans which are near one another.  One can specify <i>slop</i>, the
     * maximum number of intervening unmatched positions, as well as whether
     * matches are required to be in-order. */
    public class SpanNearQuery extends SpanQuery implements Cloneable {
      protected List<SpanQuery> clauses;
      protected int slop;
      protected boolean inOrder;

      protected String field;
      private boolean collectPayloads;

      /** Construct a SpanNearQuery.  Matches spans matching a span from each
       * clause, with up to <code>slop</code> total unmatched positions between
       * them.  * When <code>inOrder</code> is true, the spans from each clause
       * must be * ordered as in <code>clauses</code>.
       * @param clauses the clauses to find near each other
       * @param slop The slop value
       * @param inOrder true if order is important
       * */
      public SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder) {
        this(clauses, slop, inOrder, true);
      }

Excluding overlapping spans: SpanNotQuery.java

    /** Removes matches which overlap with another SpanQuery. */
    public class SpanNotQuery extends SpanQuery implements Cloneable {
      private SpanQuery include;
      private SpanQuery exclude;

Union of span queries: SpanOrQuery.java

    /** Matches the union of its clauses.*/
    public class SpanOrQuery extends SpanQuery implements Cloneable {
      private List<SpanQuery> clauses;
      private String field;

      /** Construct a SpanOrQuery merging the provided clauses. */
      public SpanOrQuery(SpanQuery... clauses) {

        // copy clauses array into an ArrayList
        this.clauses = new ArrayList<SpanQuery>(clauses.length);
        for (int i = 0; i < clauses.length; i++) {
          addClause(clauses[i]);
        }
      }

Filters

CachingWrapperFilter.java caches the result of the first query so that later searches can reuse it.

QueryWrapperFilter.java turns the results of a query into the set of documents available to the next search.

TermRangeFilter.java filters the search results further:

    /**
     * A Filter that restricts search results to a range of term
     * values in a given field.
     *
     * <p>This filter matches the documents looking for terms that fall into the
     * supplied range according to {@link
     * String#compareTo(String)}, unless a <code>Collator</code> is provided. It is not intended
     * for numerical ranges; use {@link NumericRangeFilter} instead.
     *
     * <p>If you construct a large number of range filters with different ranges but on the
     * same field, {@link FieldCacheRangeFilter} may have significantly better performance.
     * @since 2.9
     */
    public class TermRangeFilter extends MultiTermQueryWrapperFilter<TermRangeQuery> {

Custom security filter: the set of documents searched must lie within a given user's data space.

ChainedFilter.java: a chain of filters.

FilteredQuery.java

Searching multiple indexes

Lucene 3.6.0 recommends MultiReader.java over MultiSearcher.java.

Multi-threaded search

Remote search across multiple indexes

    /** An IndexReader which reads multiple, parallel indexes.  Each index added
     * must have the same number of documents, but typically each contains
     * different fields.  Each document contains the union of the fields of all
     * documents with the same document number.  When searching, matches for a
     * query term are from the first index added that has the field.
     *
     * <p>This is useful, e.g., with collections that have large fields which
     * change rarely and small fields that change more frequently.  The smaller
     * fields may be re-indexed in a new index and both indexes may be searched
     * together.
     *
     * <p><strong>Warning:</strong> It is up to you to make sure all indexes
     * are created and modified the same way. For example, if you add
     * documents to one index, you need to add the same documents in the
     * same order to the other indexes. <em>Failure to do so will result in
     * undefined behavior</em>.
     */
    public class ParallelReader extends IndexReader {

Term vectors

IndexReader.java

      /**
       * Return an array of term frequency vectors for the specified document.
       * The array contains a vector for each vectorized field in the document.
       * Each vector contains terms and frequencies for all terms in a given vectorized field.
       * If no such fields existed, the method returns null. The term vectors that are
       * returned may either be of type {@link TermFreqVector}
       * or of type {@link TermPositionVector} if
       * positions or offsets have been stored.
       *
       * @param docNumber document for which term frequency vectors are returned
       * @return array of term frequency vectors. May be null if no term vectors have been
       *  stored for the specified document.
       * @throws IOException if index cannot be accessed
       * @see org.apache.lucene.document.Field.TermVector
       */
      abstract public TermFreqVector[] getTermFreqVectors(int docNumber)
              throws IOException;

Once you have the term vector of a field for each document, you can compute the similarity between documents, which in turn supports "similar documents" queries or recommendations.
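The remark about computing document similarity from term vectors can be sketched without Lucene on the classpath. The helper below is hypothetical: TermVectorSimilarity, toMap, and cosine are names invented for this illustration and are not Lucene classes. It assumes the two input arrays are the parallel term and frequency arrays that TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies() return for one field, and it uses cosine similarity over raw term frequencies, which is one common choice and not necessarily what the author had in mind.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): cosine similarity of two term-frequency vectors. */
final class TermVectorSimilarity {

    private TermVectorSimilarity() {}

    /** Builds a term -> frequency map from two parallel arrays of equal length. */
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have the same length");
        }
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i < terms.length; i++) {
            map.put(terms[i], freqs[i]);
        }
        return map;
    }

    /** Returns a value in [0, 1]; 0 when the vectors share no terms or either one is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                // Only terms present in both documents contribute to the dot product.
                dot += (double) e.getValue().intValue() * other.intValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the term vectors of two documents.
        Map<String, Integer> doc1 = toMap(new String[] {"lucene", "search", "sort"}, new int[] {2, 1, 1});
        Map<String, Integer> doc2 = toMap(new String[] {"lucene", "search", "filter"}, new int[] {1, 1, 3});
        System.out.println("similarity = " + cosine(doc1, doc2));
    }
}
```

In a real index the field would have to be indexed with term vectors enabled, and the arrays would come from the TermFreqVector of each document. Ranking the other documents by this score against a seed document is one simple way to get the "similar documents" behaviour described above.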