Notes
The Lucene version covered here is 3.0.3.
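Structure and class diagram
Directory is Lucene's file-level storage abstraction: it hides the details of how index files are actually stored.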
- Directory (abstract)
  - FSDirectory (abstract)
    - SimpleFSDirectory
    - NIOFSDirectory
    - MMapDirectory
  - RAMDirectory
  - FileSwitchDirectory
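To make the "hide the storage details" idea concrete, here is a minimal sketch of a flat, write-once file store behind an abstract type, with one RAM-backed implementation. All names (`MiniDirectory`, `RamMiniDirectory`, `DirectoryStats`, `createFile`, `readFile`) are hypothetical and are not Lucene's API; real Lucene hands out `IndexInput`/`IndexOutput` streams, whereas this sketch moves whole byte arrays to stay short.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for org.apache.lucene.store.Directory:
// a flat list of named files that are written once, then read or deleted.
abstract class MiniDirectory {
  abstract String[] listAll();
  abstract boolean fileExists(String name);
  abstract long fileLength(String name) throws IOException;
  abstract void deleteFile(String name) throws IOException;
  abstract void createFile(String name, byte[] contents) throws IOException;
  abstract byte[] readFile(String name) throws IOException;
}

// RAM-backed implementation in the spirit of RAMDirectory (not its actual code).
class RamMiniDirectory extends MiniDirectory {
  private final Map<String, byte[]> files = new TreeMap<>(); // names kept sorted

  @Override String[] listAll() { return files.keySet().toArray(new String[0]); }

  @Override boolean fileExists(String name) { return files.containsKey(name); }

  @Override long fileLength(String name) throws IOException { return get(name).length; }

  @Override void deleteFile(String name) throws IOException {
    if (files.remove(name) == null) throw new FileNotFoundException(name);
  }

  // Write-once: this toy refuses to recreate an existing name to make the rule
  // explicit. Real Lucene implementations do not necessarily enforce this.
  @Override void createFile(String name, byte[] contents) throws IOException {
    if (files.containsKey(name)) throw new IOException("file already exists: " + name);
    files.put(name, contents.clone()); // copy so later edits by the caller do not leak in
  }

  @Override byte[] readFile(String name) throws IOException { return get(name).clone(); }

  private byte[] get(String name) throws IOException {
    byte[] data = files.get(name);
    if (data == null) throw new FileNotFoundException(name);
    return data;
  }
}

// Caller code depends only on the abstract type, never on how bytes are stored.
class DirectoryStats {
  static long totalBytes(MiniDirectory dir) throws IOException {
    long sum = 0;
    for (String name : dir.listAll()) sum += dir.fileLength(name);
    return sum;
  }
}
```

Swapping `RamMiniDirectory` for a file-backed subclass would leave `DirectoryStats` unchanged, which is the point of the hierarchy above.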
Directory
package org.apache.lucene.store;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Closeable;

import org.apache.lucene.index.IndexFileNameFilter;

/** A Directory is a flat list of files. Files may be written once, when they
 * are created. Once a file is created it may only be opened for read, or
 * deleted. Random access is permitted both when reading and writing.
 *
 * <p> Java's i/o APIs not used directly, but rather all i/o is
 * through this API. This permits things such as: <ul>
 * <li> implementation of RAM-based indices;
 * <li> implementation indices stored in a database, via JDBC;
 * <li> implementation of an index as a single file;
 * </ul>
 *
 * Directory locking is implemented by an instance of {@link
 * LockFactory}, and can be changed for each Directory
 * instance using {@link #setLockFactory}.
 *
 */
public abstract class Directory implements Closeable {

  volatile protected boolean isOpen = true;

  /** Holds the LockFactory instance (implements locking for
   * this Directory instance). */
  protected LockFactory lockFactory;

  /** Returns an array of strings, one for each file in the
   * directory.
   * @throws IOException
   */
  public abstract String[] listAll() throws IOException;

  /** Returns true iff a file with the given name exists. */
  public abstract boolean fileExists(String name) throws IOException;

  /** Returns the time the named file was last modified. */
  public abstract long fileModified(String name) throws IOException;

  /** Set the modified time of an existing file to now. */
  public abstract void touchFile(String name) throws IOException;

  /** Removes an existing file in the directory. */
  public abstract void deleteFile(String name) throws IOException;

  /** Returns the length of a file in the directory. */
  public abstract long fileLength(String name) throws IOException;

  /** Creates a new, empty file in the directory with the given name.
      Returns a stream writing this file. */
  public abstract IndexOutput createOutput(String name) throws IOException;

  /** Ensure that any writes to this file are moved to
   * stable storage. Lucene uses this to properly commit
   * changes to the index, to prevent a machine/OS crash
   * from corrupting the index. */
  public void sync(String name) throws IOException {}

  /** Returns a stream reading an existing file. */
  public abstract IndexInput openInput(String name) throws IOException;

  /** Returns a stream reading an existing file, with the
   * specified read buffer size. The particular Directory
   * implementation may ignore the buffer size. Currently
   * the only Directory implementations that respect this
   * parameter are {@link FSDirectory} and {@link
   * org.apache.lucene.index.CompoundFileReader}.
   */
  public IndexInput openInput(String name, int bufferSize) throws IOException {
    return openInput(name);
  }

  /** Construct a {@link Lock}.
   * @param name the name of the lock file
   */
  public Lock makeLock(String name) {
    return lockFactory.makeLock(name);
  }

  /**
   * Attempt to clear (forcefully unlock and remove) the
   * specified lock. Only call this at a time when you are
   * certain this lock is no longer in use.
   * @param name name of the lock to be cleared.
   */
  public void clearLock(String name) throws IOException {
    if (lockFactory != null) {
      lockFactory.clearLock(name);
    }
  }

  /** Closes the store. */
  public abstract void close() throws IOException;

  /**
   * Set the LockFactory that this Directory instance should
   * use for its locking implementation. Each instance of
   * LockFactory should only be used for one directory (ie,
   * do not share a single instance across multiple
   * Directories).
   *
   * @param lockFactory instance of {@link LockFactory}.
   */
  public void setLockFactory(LockFactory lockFactory) {
    assert lockFactory != null;
    this.lockFactory = lockFactory;
    lockFactory.setLockPrefix(this.getLockID());
  }

  /**
   * Get the LockFactory that this Directory instance is
   * using for its locking implementation. Note that this
   * may be null for Directory implementations that provide
   * their own locking implementation.
   */
  public LockFactory getLockFactory() {
    return this.lockFactory;
  }

  /**
   * Return a string identifier that uniquely differentiates
   * this Directory instance from other Directory instances.
   * This ID should be the same if two Directory instances
   * (even in different JVMs and/or on different machines)
   * are considered "the same index". This is how locking
   * "scopes" to the right index.
   */
  public String getLockID() {
    return this.toString();
  }

  @Override
  public String toString() {
    return super.toString() + " lockFactory=" + getLockFactory();
  }

  /**
   * Copy contents of a directory src to a directory dest.
   * If a file in src already exists in dest then the
   * one in dest will be blindly overwritten.
   *
   * <p><b>NOTE:</b> the source directory cannot change
   * while this method is running. Otherwise the results
   * are undefined and you could easily hit a
   * FileNotFoundException.
   *
   * <p><b>NOTE:</b> this method only copies files that look
   * like index files (ie, have extensions matching the
   * known extensions of index files).
   *
   * @param src source directory
   * @param dest destination directory
   * @param closeDirSrc if <code>true</code>, call {@link #close()} method on source directory
   * @throws IOException
   */
  public static void copy(Directory src, Directory dest, boolean closeDirSrc) throws IOException {
    final String[] files = src.listAll();

    IndexFileNameFilter filter = IndexFileNameFilter.getFilter();

    byte[] buf = new byte[BufferedIndexOutput.BUFFER_SIZE];
    for (int i = 0; i < files.length; i++) {

      if (!filter.accept(null, files[i]))
        continue;

      IndexOutput os = null;
      IndexInput is = null;
      try {
        // create file in dest directory
        os = dest.createOutput(files[i]);
        // read current file
        is = src.openInput(files[i]);
        // and copy to dest directory
        long len = is.length();
        long readCount = 0;
        while (readCount < len) {
          int toRead = readCount + BufferedIndexOutput.BUFFER_SIZE > len ? (int)(len - readCount) : BufferedIndexOutput.BUFFER_SIZE;
          is.readBytes(buf, 0, toRead);
          os.writeBytes(buf, toRead);
          readCount += toRead;
        }
      } finally {
        // graceful cleanup
        try {
          if (os != null)
            os.close();
        } finally {
          if (is != null)
            is.close();
        }
      }
    }
    if(closeDirSrc)
      src.close();
  }

  /**
   * @throws AlreadyClosedException if this Directory is closed
   */
  protected final void ensureOpen() throws AlreadyClosedException {
    if (!isOpen)
      throw new AlreadyClosedException("this Directory is closed");
  }
}
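The core of `copy()` is a fixed-buffer loop: each pass moves a full buffer while enough bytes remain, and the final pass moves only the remainder. Below is a sketch of that loop over plain `java.io` streams, assuming a caller-supplied buffer size (the real method uses `BufferedIndexOutput.BUFFER_SIZE` and Lucene's `IndexInput`/`IndexOutput`). `ChunkedCopy` is a hypothetical name, and the `IndexFileNameFilter` step is deliberately left out.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ChunkedCopy {
  // Copies exactly len bytes from in to out using one reusable buffer,
  // following the same loop shape as Directory.copy().
  static long copy(InputStream in, long len, OutputStream out, int bufSize) throws IOException {
    // IndexInput.readBytes fills the requested length; a raw InputStream.read
    // may return fewer bytes, so readFully is used to get the same guarantee.
    DataInputStream din = new DataInputStream(in);
    byte[] buf = new byte[bufSize];
    long readCount = 0;
    while (readCount < len) {
      // Full buffer while enough bytes remain; the last chunk takes only the remainder.
      int toRead = readCount + bufSize > len ? (int) (len - readCount) : bufSize;
      din.readFully(buf, 0, toRead);
      out.write(buf, 0, toRead);
      readCount += toRead;
    }
    return readCount;
  }
}
```

For a 10-byte input and `bufSize = 4`, the loop runs three times with chunk sizes 4, 4 and 2. Reusing a single buffer keeps memory use constant no matter how large the index files are.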
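1. Manages the index files in the directory (listAll, fileExists, fileLength, deleteFile, etc.).
2. Holds the lock factory and manages locks (makeLock, clearLock, setLockFactory/getLockFactory).
3. Opens streams on the directory's index files: IndexInput (openInput) and IndexOutput (createOutput).
4. sync(): ensures that writes to a file reach stable storage. Lucene uses it when committing changes to the index, so that a machine/OS crash cannot corrupt the index. The base implementation is a no-op; subclasses override it.
5. Copies index files from one directory to another (the static copy() method).
6. lockID: identifies the index a Directory instance refers to; two instances considered "the same index" should return the same ID, which is how locks get scoped to the right index. See getLockID().
7. Other classes involved:
   - lucene.index.IndexFileNameFilter
   - lucene.store.LockFactory
   - lucene.store.IndexInput and lucene.store.IndexOutput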
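Items 2 and 6 work together: `setLockFactory()` calls `lockFactory.setLockPrefix(getLockID())`, so every lock name is scoped by the directory's ID. The base class's default `getLockID()` returns `toString()`, which differs per object, so an implementation whose separate instances can point at the same index needs an override (a file-system implementation can, for example, derive the ID from the directory's path). The following sketch models this with an in-JVM lock table. `ToyLock`, `ToyLockFactory` and `ToyDirectory` are hypothetical, and the `prefix + "-" + name` scheme is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-JVM lock: a name counts as held while it is in a shared set.
final class ToyLock {
  private static final Set<String> HELD = new HashSet<>();
  private final String fullName;

  ToyLock(String fullName) { this.fullName = fullName; }

  boolean obtain() { synchronized (HELD) { return HELD.add(fullName); } }

  void release() { synchronized (HELD) { HELD.remove(fullName); } }
}

// Hypothetical factory: each lock name is scoped by the owning directory's prefix.
final class ToyLockFactory {
  private String lockPrefix;

  void setLockPrefix(String lockPrefix) { this.lockPrefix = lockPrefix; }

  ToyLock makeLock(String lockName) {
    return new ToyLock(lockPrefix == null ? lockName : lockPrefix + "-" + lockName);
  }
}

final class ToyDirectory {
  private final String path;
  private final ToyLockFactory lockFactory = new ToyLockFactory();

  ToyDirectory(String path) {
    this.path = path;
    // Same wiring as Directory.setLockFactory(): prefix := getLockID().
    lockFactory.setLockPrefix(getLockID());
  }

  // Instances over the same path count as "the same index", so they share an ID.
  String getLockID() { return "toy-" + path; }

  ToyLock makeLock(String name) { return lockFactory.makeLock(name); }
}
```

With this wiring, two `ToyDirectory` objects on the same path contend for the same `write.lock`, while a directory on a different path gets an independent lock.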
Created: 2011-12-11