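In outline, opening an index works like this. Lucene reads the segments.gen file (and lists the directory) to work out which segments_x file is current, where x is the commit generation written as a base-36 number. It then reads segments_x, which records the metadata of every segment in the index. With the segment metadata loaded, it creates one SegmentReader per segment. SegmentReader uses its inner class CoreReaders to open and read the segment's files:
1 Build FieldInfos. The contents of _x.fnm are read into memory through a SimpleFSIndexInput and kept in both a list and a map; the map allows a FieldInfo to be looked up by field name.
2 Build the TermInfosReader. It loads the .tii file into memory and opens the .tis file. The .tii file plays the role of a level-0 skip list over .tis and is loaded into a list in full. Because terms are stored in sorted order, a lookup first runs a binary search over this list to find the largest indexed term that is less than or equal to the target term, then seeks to the corresponding position in .tis and compares terms from there.
3 Build the FieldsReader, which opens the .fdx and .fdt files.
4 If the segment has deletions, open _x_n.del. Index files are never modified in place, so deletions against a segment are recorded in a separate _x_n.del file for that segment.
5 Open _x.nrm.
The call order is
IndexSearcher -> IndexReader -> DirectoryReader -> SegmentReader -> CoreReaders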
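The lookup described in step 2 can be sketched on its own. This is a minimal illustration and not Lucene's TermInfosReader: the class TermIndexSketch and the method floorIndex are hypothetical names, and terms are plain strings instead of (field, text) pairs. It shows how a binary search over the sorted in-memory index terms finds the last entry that is less than or equal to the target, which is the point from which the .tis file would then be scanned.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a sparse term-index lookup; not Lucene code. */
final class TermIndexSketch {

    /**
     * Returns the position of the greatest index term that is <= target,
     * or -1 if every index term is greater than target.
     */
    static int floorIndex(List<String> sortedIndexTerms, String target) {
        int lo = 0;
        int hi = sortedIndexTerms.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = sortedIndexTerms.get(mid).compareTo(target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid; // exact hit on an index term
            }
        }
        return hi; // last entry smaller than target, or -1 if there is none
    }

    public static void main(String[] args) {
        // Pretend these are the sampled terms held in memory from .tii.
        List<String> index = Arrays.asList("apple", "grape", "mango", "peach");
        System.out.println(floorIndex(index, "kiwi"));
        System.out.println(floorIndex(index, "mango"));
        System.out.println(floorIndex(index, "aardvark"));
    }
}
```

A result of -1 means the target sorts before every sampled term, so a scan would have to start from the beginning of the term file.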
Code walkthrough
IndexSearcher indexSearcher = new IndexSearcher(FSDirectory.open(file));
The IndexSearcher constructor calls IndexReader.open(path, true) to build the IndexReader; what it ultimately gets back is a ReadOnlyDirectoryReader.
public IndexSearcher(Directory path) throws CorruptIndexException, IOException {
  // initialize the IndexReader
  this(IndexReader.open(path, true), true);
}
The code of IndexReader.open(path, true):
public static IndexReader open(final Directory directory, boolean readOnly) throws CorruptIndexException, IOException {
  return open(directory, null, null, readOnly, DEFAULT_TERMS_INDEX_DIVISOR);
}
That overload delegates to the open method of DirectoryReader:
private static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly, int termInfosIndexDivisor) throws CorruptIndexException, IOException {
  return DirectoryReader.open(directory, deletionPolicy, commit, readOnly, termInfosIndexDivisor);
}
DirectoryReader.open() mainly builds an anonymous SegmentInfos.FindSegmentsFile object and calls its run method.
The code is as follows:
static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
                        final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
  return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {
    @Override
    protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
      SegmentInfos infos = new SegmentInfos();
      infos.read(directory, segmentFileName);
      if (readOnly)
        return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor);
      else
        return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor);
    }
  }.run(commit);
}
The run method first works out the name of the current segments_x file, then calls doBody, which creates the ReadOnlyDirectoryReader.
The code of run:
public Object run(IndexCommit commit) throws CorruptIndexException, IOException {
  if (commit != null) {
    if (directory != commit.getDirectory())
      throw new IOException("the specified commit does not match the specified Directory");
    return doBody(commit.getSegmentsFileName());
  }
  String segmentFileName = null;
  long lastGen = -1;
  long gen = 0;
  int genLookaheadCount = 0;
  IOException exc = null;
  boolean retry = false;
  int method = 0;
  while(true) {
    if (0 == method) {
      // Method 1: list the directory and use the highest
      // segments_N file. This method works well as long
      // as there is no stale caching on the directory
      // contents (NOTE: NFS clients often have such stale
      // caching):
      String[] files = null;
      long genA = -1;
      files = directory.listAll();
      if (files != null)
        genA = getCurrentSegmentGeneration(files);
      message("directory listing genA=" + genA);
      long genB = -1;
      for(int i=0;i<defaultGenFileRetryCount;i++) {
        IndexInput genInput = null;
        try {
          genInput = directory.openInput(IndexFileNames.SEGMENTS_GEN);
        } catch (FileNotFoundException e) {
          message("segments.gen open: FileNotFoundException " + e);
          break;
        } catch (IOException e) {
          message("segments.gen open: IOException " + e);
        }
        if (genInput != null) {
          try {
            int version = genInput.readInt();
            if (version == FORMAT_LOCKLESS) {
              long gen0 = genInput.readLong();
              long gen1 = genInput.readLong();
              message("fallback check: " + gen0 + "; " + gen1);
              if (gen0 == gen1) {
                // The file is consistent.
                genB = gen0;
                break;
              }
            }
          } catch (IOException err2) {
            // will retry
          } finally {
            genInput.close();
          }
        }
        try {
          Thread.sleep(defaultGenFileRetryPauseMsec);
        } catch (InterruptedException ie) {
          // In 3.0 we will change this to throw
          // InterruptedException instead
          Thread.currentThread().interrupt();
          throw new RuntimeException(ie);
        }
      }
      message(IndexFileNames.SEGMENTS_GEN + " check: genB=" + genB);
      // Pick the larger of the two gen's:
      if (genA > genB)
        gen = genA;
      else
        gen = genB;
      if (gen == -1) {
        // Neither approach found a generation
        String s;
        if (files != null) {
          s = "";
          for(int i=0;i<files.length;i++)
            s += " " + files[i];
        } else
          s = " null";
        throw new FileNotFoundException("no segments* file found in " + directory + ": files:" + s);
      }
    }
    // Third method (fallback if first & second methods
    // are not reliable): since both directory cache and
    // file contents cache seem to be stale, just
    // advance the generation.
    if (1 == method || (0 == method && lastGen == gen && retry)) {
      method = 1;
      if (genLookaheadCount < defaultGenLookaheadCount) {
        gen++;
        genLookaheadCount++;
        message("look ahead increment gen to " + gen);
      }
    }
    if (lastGen == gen) {
      if (retry) {
        throw exc;
      } else {
        retry = true;
      }
    } else if (0 == method) {
      // Segment file has advanced since our last loop, so
      // reset retry:
      retry = false;
    }
    lastGen = gen;
    // Build the name of the segments_x file
    segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", gen);
    // Call the doBody method overridden in the anonymous
    // SegmentInfos.FindSegmentsFile subclass; it returns the
    // ReadOnlyDirectoryReader.
    try {
      Object v = doBody(segmentFileName);
      if (exc != null) {
        message("success on " + segmentFileName);
      }
      return v;
    } catch (IOException err) {
      // Save the original root cause:
      if (exc == null) {
        exc = err;
      }
      message("primary Exception on '" + segmentFileName + "': " + err + "'; will retry: retry=" + retry + "; gen = " + gen);
      if (!retry && gen > 1) {
        String prevSegmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", gen-1);
        final boolean prevExists;
        prevExists = directory.fileExists(prevSegmentFileName);
        if (prevExists) {
          message("fallback to prior segment file '" + prevSegmentFileName + "'");
          try {
            Object v = doBody(prevSegmentFileName);
            if (exc != null) {
              message("success on fallback " + prevSegmentFileName);
            }
            return v;
          } catch (IOException err2) {
            message("secondary Exception on '" + prevSegmentFileName + "': " + err2 + "'; will retry");
          }
        }
      }
    }
  }
}
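The segments.gen check inside run() can be illustrated in isolation. The sketch below is not Lucene code: SegmentsGenSketch and its methods are hypothetical, the file is simulated with a byte array, and the value -2 for FORMAT_LOCKLESS is an assumption made for the example. It shows the idea visible in the loop above: the generation is stored twice, and the value is trusted only when both copies agree; the caller then takes the larger of the directory-listing answer and the segments.gen answer.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the segments.gen consistency check; not Lucene code. */
final class SegmentsGenSketch {

    // Assumed marker value for illustration only.
    static final int FORMAT_LOCKLESS = -2;
    // One int header followed by two longs.
    static final int RECORD_BYTES = 4 + 8 + 8;

    /** Encodes the header followed by the generation written twice (big-endian). */
    static byte[] write(long gen0, long gen1) {
        return ByteBuffer.allocate(RECORD_BYTES)
                .putInt(FORMAT_LOCKLESS)
                .putLong(gen0)
                .putLong(gen1)
                .array();
    }

    /** Returns the generation if the record is complete and both copies agree, else -1. */
    static long readGeneration(byte[] content) {
        if (content.length < RECORD_BYTES) {
            return -1; // truncated: treat as "no answer"
        }
        ByteBuffer in = ByteBuffer.wrap(content);
        if (in.getInt() != FORMAT_LOCKLESS) {
            return -1;
        }
        long gen0 = in.getLong();
        long gen1 = in.getLong();
        return gen0 == gen1 ? gen0 : -1;
    }

    /** Mirrors "pick the larger of the two gen's" from run(). */
    static long chooseGeneration(long fromListing, long fromGenFile) {
        return Math.max(fromListing, fromGenFile);
    }

    public static void main(String[] args) {
        System.out.println(readGeneration(write(7, 7)));
        System.out.println(readGeneration(write(7, 8)));
        System.out.println(chooseGeneration(5, readGeneration(write(7, 7))));
    }
}
```

Writing the value twice is a cheap way to detect a reader that caught the file mid-write: a torn write leaves the two copies different, and the reader simply retries or falls back to the directory listing.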
getCurrentSegmentGeneration iterates over the file names in the directory, picks out the segments_x files, and returns the largest x, which is the current generation.
/**
 * Get the generation (N) of the current segments_N file
 * from a list of files.
 *
 * @param files -- array of file names to check
 */
public static long getCurrentSegmentGeneration(String[] files) {
  if (files == null) {
    return -1;
  }
  long max = -1;
  for (int i = 0; i < files.length; i++) {
    String file = files[i];
    if (file.startsWith(IndexFileNames.SEGMENTS) && !file.equals(IndexFileNames.SEGMENTS_GEN)) {
      long gen = generationFromSegmentsFileName(file);
      if (gen > max) {
        max = gen;
      }
    }
  }
  return max;
}
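How a generation maps to a file name and back can be sketched as follows. This is an illustration under stated assumptions, not the library source: the class SegmentsNameSketch and its method bodies are written for this example, and the only facts taken from the walkthrough are that the suffix is a base-36 number and that segments.gen must be skipped when scanning the directory.

```java
/** Hypothetical sketch of segments_N naming with a base-36 suffix; not Lucene code. */
final class SegmentsNameSketch {

    static final String SEGMENTS = "segments";
    static final String SEGMENTS_GEN = "segments.gen";

    /** Builds "segments_<gen in base 36>"; generation 0 is assumed to have no suffix. */
    static String fileNameFromGeneration(long gen) {
        if (gen == 0) {
            return SEGMENTS;
        }
        return SEGMENTS + "_" + Long.toString(gen, Character.MAX_RADIX);
    }

    /** Parses the base-36 suffix back into a generation number. */
    static long generationFromFileName(String fileName) {
        if (fileName.equals(SEGMENTS)) {
            return 0;
        }
        if (fileName.startsWith(SEGMENTS + "_")) {
            String suffix = fileName.substring(SEGMENTS.length() + 1);
            return Long.parseLong(suffix, Character.MAX_RADIX);
        }
        throw new IllegalArgumentException("not a segments file: " + fileName);
    }

    /** Scans a directory listing and returns the highest generation, or -1 if none. */
    static long currentGeneration(String[] files) {
        long max = -1;
        for (String file : files) {
            if (file.startsWith(SEGMENTS) && !file.equals(SEGMENTS_GEN)) {
                max = Math.max(max, generationFromFileName(file));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration(35));
        System.out.println(fileNameFromGeneration(36));
        System.out.println(currentGeneration(
                new String[] {"_0.fnm", "segments_9", "segments_a", "segments.gen"}));
    }
}
```

Because the suffix is base 36, names must be compared by their parsed value rather than as strings: segments_a (10) is newer than segments_9, and segments_10 (36) is newer than segments_z (35).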
The openInput method of SimpleFSDirectory creates a SimpleFSIndexInput object. This object reads from the file one buffer (a byte[]) at a time. Callers are served from that byte[], and when the buffered data runs out, the next chunk is read from the file.
/** Creates an IndexInput for the file with the given name. */
@Override
public IndexInput openInput(String name, int bufferSize) throws IOException {
  ensureOpen();
  return new SimpleFSIndexInput(new File(directory, name), bufferSize, getReadChunkSize());
}
public SimpleFSIndexInput(File path, int bufferSize, int chunkSize) throws IOException {
  super(bufferSize);
  file = new Descriptor(path, "r");
  this.chunkSize = chunkSize;
}
Descriptor extends RandomAccessFile, so the RandomAccessFile methods are available for random access to the file.
protected static class Descriptor extends RandomAccessFile {
  // remember if the file is open, so that we don't try to close it
  // more than once
  protected volatile boolean isOpen;
  long position;
  final long length;
  public Descriptor(File file, String mode) throws IOException {
    super(file, mode);
    isOpen=true;
    length=length();
  }
  public void close() throws IOException {
    if (isOpen) {
      isOpen=false;
      super.close();
    }
  }
}
readInt() assembles an int from four bytes, most significant byte first.
public int readInt() throws IOException {
  return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
       | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
}
In readByte(), bufferPosition records the current position within the cached byte[], and bufferLength is the number of valid bytes in it. When bufferPosition >= bufferLength, the byte[] is reloaded from the file by the refill method.
@Override
public byte readByte() throws IOException {
  if (bufferPosition >= bufferLength)
    refill();
  return buffer[bufferPosition++];
}
readLong assembles a long from two ints.
public long readLong() throws IOException {
  return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
}
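The three methods above fit together as shown in this self-contained sketch. It is not BufferedIndexInput: the class BufferedInputSketch is hypothetical, the file on disk is simulated by a byte array, and a refill counter is added purely so the buffering can be observed. The readByte, readInt and readLong bodies follow the same shape as the quoted code.

```java
/** Hypothetical sketch of buffered big-endian reads; not Lucene code. */
final class BufferedInputSketch {

    private final byte[] file;      // stands in for the file on disk
    private int filePosition = 0;   // next unread offset in the "file"
    private final byte[] buffer;    // the in-memory cache
    private int bufferLength = 0;   // number of valid bytes in buffer
    private int bufferPosition = 0; // next byte to hand out from buffer
    private int refills = 0;        // instrumentation for the example only

    BufferedInputSketch(byte[] file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    /** Loads the next chunk of the "file" into the buffer. */
    private void refill() {
        int remaining = file.length - filePosition;
        if (remaining <= 0) {
            throw new IllegalStateException("read past EOF");
        }
        bufferLength = Math.min(buffer.length, remaining);
        System.arraycopy(file, filePosition, buffer, 0, bufferLength);
        filePosition += bufferLength;
        bufferPosition = 0;
        refills++;
    }

    byte readByte() {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    /** Four bytes, most significant first. */
    int readInt() {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
    }

    /** Two ints, high word first; the mask stops sign extension of the low word. */
    long readLong() {
        return (((long) readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
    }

    int refillCount() {
        return refills;
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 1, 2, -1, -1, -1, -1, -1, -1, -1, -1};
        BufferedInputSketch in = new BufferedInputSketch(data, 3);
        System.out.println(in.readInt());
        System.out.println(in.readLong());
        System.out.println(in.refillCount());
    }
}
```

With a 3-byte buffer, a single readInt already spans two refills, which is why the multi-byte readers are built on readByte instead of indexing the buffer directly.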
Finally, the read method of SegmentInfos completes the initialization of SegmentInfos. SegmentInfos extends Vector and holds SegmentInfo objects; each segment is represented by one SegmentInfo.
The file is read in this order: the index format number, the index version, the counter used to name the next segment, and then the segment count (input.readInt()). The loop then runs once per segment, building one SegmentInfo each time.
public final class SegmentInfos extends Vector<SegmentInfo>
public final void read(Directory directory, String segmentFileName) throws CorruptIndexException, IOException {
  boolean success = false;
  // Clear any previous segments:
  clear();
  ChecksumIndexInput input = new ChecksumIndexInput(directory.openInput(segmentFileName));
  generation = generationFromSegmentsFileName(segmentFileName);
  lastGeneration = generation;
  try {
    int format = input.readInt();
    if(format < 0){ // file contains explicit format info
      // check that it is a format we can understand
      if (format < CURRENT_FORMAT)
        throw new CorruptIndexException("Unknown format version: " + format);
      version = input.readLong(); // read version
      counter = input.readInt(); // read counter
    }
    else{ // file is in old format without explicit format info
      counter = format;
    }
    for (int i = input.readInt(); i > 0; i--) { // read segmentInfos
      add(new SegmentInfo(directory, format, input));
    }
    if(format >= 0){ // in old format the version number may be at the end of the file
      if (input.getFilePointer() >= input.length())
        version = System.currentTimeMillis(); // old file format without version number
      else
        version = input.readLong(); // read version
    }
    if (format <= FORMAT_USER_DATA) {
      if (format <= FORMAT_DIAGNOSTICS) {
        userData = input.readStringStringMap();
      } else if (0 != input.readByte()) {
        userData = Collections.singletonMap("userData", input.readString());
      } else {
        userData = Collections.<String,String>emptyMap();
      }
    } else {
      userData = Collections.<String,String>emptyMap();
    }
    if (format <= FORMAT_CHECKSUM) {
      final long checksumNow = input.getChecksum();
      final long checksumThen = input.readLong();
      if (checksumNow != checksumThen)
        throw new CorruptIndexException("checksum mismatch in segments file");
    }
    success = true;
  }
  finally {
    input.close();
    if (!success) {
      // Clear any segment infos we had loaded so we
      // have a clean slate on retry:
      clear();
    }
  }
}
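The checksum comparison at the end of read() follows a common pattern: compute a checksum over everything read so far, then compare it with the value stored as the last eight bytes. The sketch below illustrates that pattern with java.util.zip.CRC32; the class ChecksumTrailerSketch and its methods are hypothetical, and the use of CRC32 here is an assumption for the example, not a statement about what ChecksumIndexInput does internally.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch of a trailing-checksum check; not Lucene code. */
final class ChecksumTrailerSketch {

    /** Appends an 8-byte big-endian CRC32 of the payload to the payload. */
    static byte[] seal(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Recomputes the checksum over the payload and compares it with the trailer. */
    static boolean verify(byte[] sealed) {
        if (sealed.length < 8) {
            return false; // too short to even hold a trailer
        }
        int payloadLength = sealed.length - 8;
        CRC32 crc = new CRC32();
        crc.update(sealed, 0, payloadLength);
        long checksumNow = crc.getValue();
        long checksumThen = ByteBuffer.wrap(sealed, payloadLength, 8).getLong();
        return checksumNow == checksumThen;
    }

    public static void main(String[] args) {
        byte[] sealed = seal(new byte[] {1, 2, 3, 4});
        System.out.println(verify(sealed));
        sealed[0] ^= 1; // simulate a corrupted byte
        System.out.println(verify(sealed));
    }
}
```

A mismatch is reported as corruption, and because read() clears the list in its finally block when success is false, a retry starts from a clean state.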
The doBody method calls the DirectoryReader constructor. That constructor calls SegmentReader.get(readOnly, sis.info(i), termInfosIndexDivisor)
to create a SegmentReader object for every segment.
/** Construct reading the named set of readers. */
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
  this.directory = directory;
  this.readOnly = readOnly;
  this.segmentInfos = sis;
  this.deletionPolicy = deletionPolicy;
  this.termInfosIndexDivisor = termInfosIndexDivisor;
  if (!readOnly) {
    // We assume that this segments_N was previously
    // properly sync'd:
    synced.addAll(sis.files(directory, true));
  }
  // To reduce the chance of hitting FileNotFound
  // (and having to retry), we open segments in
  // reverse because IndexWriter merges & deletes
  // the newest segments first.
  SegmentReader[] readers = new SegmentReader[sis.size()];
  for (int i = sis.size()-1; i >= 0; i--) {
    boolean success = false;
    try {
      readers[i] = SegmentReader.get(readOnly, sis.info(i), termInfosIndexDivisor);
      success = true;
    } finally {
      if (!success) {
        // Close all readers we had opened:
        for(i++;i<sis.size();i++) {
          try {
            readers[i].close();
          } catch (Throwable ignore) {
            // keep going - we want to clean up as much as possible
          }
        }
      }
    }
  }
  initialize(readers);
}
/**
 * @throws CorruptIndexException if the index is corrupt
 * @throws IOException if there is a low-level IO error
 */
public static SegmentReader get(boolean readOnly, SegmentInfo si, int termInfosIndexDivisor) throws CorruptIndexException, IOException {
  return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true, termInfosIndexDivisor);
}
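The loop in the DirectoryReader constructor opens readers from the last segment to the first and, if one open fails, closes the readers it has already opened before the exception propagates. A stripped-down sketch of that pattern is given below. Everything in it is hypothetical (ReverseOpenSketch, Resource, Opener); a Resource only remembers whether it was closed, standing in for a SegmentReader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "open in reverse, clean up on failure"; not Lucene code. */
final class ReverseOpenSketch {

    /** Stand-in for a SegmentReader: it only remembers whether it was closed. */
    static final class Resource {
        final String name;
        boolean closed = false;

        Resource(String name) {
            this.name = name;
        }

        void close() {
            closed = true;
        }
    }

    /** Stand-in for SegmentReader.get(...); may throw if a file has vanished. */
    interface Opener {
        Resource open(String name);
    }

    static Resource[] openAll(String[] names, Opener opener) {
        Resource[] readers = new Resource[names.length];
        // The newest segments sit at the end of the list, so start there.
        for (int i = names.length - 1; i >= 0; i--) {
            boolean success = false;
            try {
                readers[i] = opener.open(names[i]);
                success = true;
            } finally {
                if (!success) {
                    // Slots i+1 .. end were opened earlier; release them.
                    for (int j = i + 1; j < names.length; j++) {
                        readers[j].close();
                    }
                }
            }
        }
        return readers;
    }

    public static void main(String[] args) {
        List<Resource> opened = new ArrayList<>();
        Opener failsOnOldest = name -> {
            if (name.equals("_0")) {
                throw new IllegalStateException("file not found: " + name);
            }
            Resource r = new Resource(name);
            opened.add(r);
            return r;
        };
        try {
            openAll(new String[] {"_0", "_1", "_2"}, failsOnOldest);
        } catch (IllegalStateException expected) {
            for (Resource r : opened) {
                System.out.println(r.name + " closed=" + r.closed);
            }
        }
    }
}
```

The success flag combined with finally lets the cleanup run for any kind of failure without catching and rethrowing, which is the same idiom the quoted constructor uses.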
The get method creates a ReadOnlySegmentReader object (when readOnly is true), then calls the CoreReaders constructor to create the CoreReaders object. It uses the CoreReaders object to open the stored-field files .fdx and .fdt (.fdx is the index into .fdt), then loads the deletions file _x_n.del and opens the _x.nrm file.
/**
 * @throws CorruptIndexException if the index is corrupt
 * @throws IOException if there is a low-level IO error
 */
public static SegmentReader get(boolean readOnly,
                                Directory dir,
                                SegmentInfo si,
                                int readBufferSize,
                                boolean doOpenStores,
                                int termInfosIndexDivisor)
  throws CorruptIndexException, IOException {
  SegmentReader instance = readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
  instance.readOnly = readOnly;
  instance.si = si;
  instance.readBufferSize = readBufferSize;
  boolean success = false;
  try {
    instance.core = new CoreReaders(dir, si, readBufferSize, termInfosIndexDivisor);
    if (doOpenStores) {
      instance.core.openDocStores(si);
    }
    instance.loadDeletedDocs();
    instance.openNorms(instance.core.cfsDir, readBufferSize);
    success = true;
  } finally {
    // With lock-less commits, it's entirely possible (and
    // fine) to hit a FileNotFound exception above. In
    // this case, we want to explicitly close any subset
    // of things that were opened so that we don't have to
    // wait for a GC to do so.
    if (!success) {
      instance.doClose();
    }
  }
  return instance;
}
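Conceptually, what loadDeletedDocs brings into memory is one bit per document: set means deleted. The sketch below illustrates that idea with java.util.BitSet. It is an assumption-laden stand-in, not the structure Lucene uses for .del files, and DeletedDocsSketch with all of its methods is hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch of per-segment deletion marks; not Lucene code. */
final class DeletedDocsSketch {

    private final BitSet deleted; // bit i set => document i is deleted
    private final int maxDoc;     // number of document slots in the segment

    DeletedDocsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        this.deleted = new BitSet(maxDoc);
    }

    /** Marks a document as deleted without touching any other segment data. */
    void delete(int docId) {
        checkRange(docId);
        deleted.set(docId);
    }

    boolean isDeleted(int docId) {
        checkRange(docId);
        return deleted.get(docId);
    }

    /** Live documents = all slots minus the ones marked deleted. */
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    private void checkRange(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
    }

    public static void main(String[] args) {
        DeletedDocsSketch docs = new DeletedDocsSketch(5);
        docs.delete(1);
        docs.delete(3);
        System.out.println(docs.isDeleted(1));
        System.out.println(docs.isDeleted(2));
        System.out.println(docs.numDocs());
    }
}
```

Keeping deletions in a separate structure is what allows the other segment files to stay immutable: a search skips documents whose bit is set, and the space is reclaimed only when segments are merged.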
CoreReaders builds the FieldInfos object, which holds the information about every field, that is, the contents of the segment's _x.fnm file. It then builds the TermInfosReader, which loads the contents of the .tii file into memory and opens the .tis file. Finally it opens the .frq and .prx files.
CoreReaders(Directory dir, SegmentInfo si, int readBufferSize, int termsIndexDivisor) throws IOException {
  segment = si.name;
  this.readBufferSize = readBufferSize;
  this.dir = dir;
  boolean success = false;
  try {
    Directory dir0 = dir;
    if (si.getUseCompoundFile()) {
      cfsReader = new CompoundFileReader(dir, segment + "." + IndexFileNames.COMPOUND_FILE_EXTENSION, readBufferSize);
      dir0 = cfsReader;
    }
    cfsDir = dir0;
    fieldInfos = new FieldInfos(cfsDir, segment + "." + IndexFileNames.FIELD_INFOS_EXTENSION);
    this.termsIndexDivisor = termsIndexDivisor;
    TermInfosReader reader = new TermInfosReader(cfsDir, segment, fieldInfos, readBufferSize, termsIndexDivisor);
    if (termsIndexDivisor == -1) {
      tisNoIndex = reader;
    } else {
      tis = reader;
      tisNoIndex = null;
    }
    // make sure that all index files have been read or are kept open
    // so that if an index update removes them we'll still have them
    freqStream = cfsDir.openInput(segment + "." + IndexFileNames.FREQ_EXTENSION, readBufferSize);
    if (fieldInfos.hasProx()) {
      proxStream = cfsDir.openInput(segment + "." + IndexFileNames.PROX_EXTENSION, readBufferSize);
    } else {
      proxStream = null;
    }
    success = true;
  } finally {
    if (!success) {
      decRef();
    }
  }
}
Reading the _x.fnm file to load the FieldInfo entries:
FieldInfos(Directory d, String name) throws IOException {
  IndexInput input = d.openInput(name);
  try {
    try {
      read(input, name);
    } catch (IOException ioe) {
      if (format == FORMAT_PRE) {
        // LUCENE-1623: FORMAT_PRE (before there was a
        // format) may be 2.3.2 (pre-utf8) or 2.4.x (utf8)
        // encoding; retry with input set to pre-utf8
        input.seek(0);
        input.setModifiedUTF8StringsMode();
        byNumber.clear();
        byName.clear();
        try {
          read(input, name);
        } catch (Throwable t) {
          // Ignore any new exception & throw original IOE
          throw ioe;
        }
      } else {
        // The IOException cannot be caused by
        // LUCENE-1623, so re-throw it
        throw ioe;
      }
    }
  } finally {
    input.close();
  }
}
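The byNumber and byName collections cleared in the retry path above are the list and the map mentioned in the overview: one gives access by field number, the other by field name. A minimal sketch of keeping the two in step is shown below. FieldTableSketch and its nested Field class are hypothetical and carry far fewer attributes than a real FieldInfo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a field table indexed by number and by name; not Lucene code. */
final class FieldTableSketch {

    /** Reduced stand-in for FieldInfo. */
    static final class Field {
        final String name;
        final int number;
        final boolean indexed;

        Field(String name, int number, boolean indexed) {
            this.name = name;
            this.number = number;
            this.indexed = indexed;
        }
    }

    private final List<Field> byNumber = new ArrayList<>();
    private final Map<String, Field> byName = new HashMap<>();

    /** Adds a field if its name is new; the field number is its position in the list. */
    Field add(String name, boolean indexed) {
        Field existing = byName.get(name);
        if (existing != null) {
            return existing;
        }
        Field field = new Field(name, byNumber.size(), indexed);
        byNumber.add(field);
        byName.put(name, field);
        return field;
    }

    /** Lookup by number: a plain list access. */
    Field fieldInfo(int number) {
        return byNumber.get(number);
    }

    /** Lookup by name: a hash lookup, null if the field is unknown. */
    Field fieldInfo(String name) {
        return byName.get(name);
    }

    int size() {
        return byNumber.size();
    }

    public static void main(String[] args) {
        FieldTableSketch fields = new FieldTableSketch();
        fields.add("title", true);
        fields.add("body", true);
        System.out.println(fields.fieldInfo("body").number);
        System.out.println(fields.fieldInfo(0).name);
    }
}
```

Other per-segment files can then refer to a field by its small integer number, while query-time code that starts from a field name goes through the map.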