What exactly happens when an IndexSearcher is created

chengqianl

Blog category: lucene

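In broad strokes: Lucene reads the segments.gen file and uses the generation stored there to locate the current segments_x file, where x is the commit generation written as a base-36 number. It then reads segments_x, which records the metadata of every segment in the index. With the segment metadata loaded, it creates one SegmentReader per segment. Each SegmentReader uses its inner class CoreReaders to open and read the segment's files:
1 Build FieldInfos. A SimpleFSIndexInput reads _x.fnm into memory, where the entries are kept in both a list and a map; the map allows a FieldInfo to be looked up by field name.
2 Build the TermInfosReader. It loads the .tii file into memory and opens the .tis file. The .tii file acts as level 0 of a skip list over .tis (a sparse index of its terms) and is loaded entirely into a list. Because terms are stored in sorted order, a lookup first binary-searches this list for the largest indexed term that is less than or equal to the target term, then seeks to the corresponding position in .tis and compares terms from there.
3 Build the FieldsReader, which opens the .fdx and .fdt files.
4 If the segment has deletions, open _x_n.del. Index files are never modified once written, so deletions against a segment are recorded in a separate _x_n.del file for that segment.
5 Open _x.nrm.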
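
The binary search described in step 2 can be sketched as follows. This is a minimal illustration and not Lucene's TermInfosReader code: the class name TermIndexFloorSearch, the method floorIndex, and the use of plain strings as terms are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: find the greatest in-memory index term that is
// less than or equal to the target, as a starting point for a scan.
class TermIndexFloorSearch {

    /**
     * Returns the position of the greatest term <= target in a sorted list,
     * or -1 if every term in the list is greater than the target.
     */
    static int floorIndex(List<String> sortedIndexTerms, String target) {
        int lo = 0;
        int hi = sortedIndexTerms.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = sortedIndexTerms.get(mid).compareTo(target);
            if (cmp == 0) {
                return mid;          // exact hit in the in-memory index
            } else if (cmp < 0) {
                lo = mid + 1;        // candidate floor lies to the right
            } else {
                hi = mid - 1;        // mid is too large
            }
        }
        return hi;                   // last term smaller than target, or -1
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("apple", "grape", "melon", "peach");
        System.out.println(floorIndex(index, "kiwi"));     // prints 1 ("grape")
        System.out.println(floorIndex(index, "aardvark")); // prints -1
    }
}
```

In the real reader, the position found this way would be used to seek into .tis, and the terms after that point would be scanned sequentially until the target is found or passed.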

The call order is:
IndexSearcher -> IndexReader -> DirectoryReader -> SegmentReader -> CoreReaders

Code walkthrough with commentary

IndexSearcher indexSearcher=new IndexSearcher(FSDirectory.open(file));

The IndexSearcher constructor calls IndexReader.open(path, true) to build the IndexReader; what is ultimately constructed is a ReadOnlyDirectoryReader.
public IndexSearcher(Directory path) throws CorruptIndexException, IOException {
    // Open a read-only IndexReader
    this(IndexReader.open(path, true), true);
}

The code of IndexReader.open(path, true):
public static IndexReader open(final Directory directory, boolean readOnly) throws CorruptIndexException, IOException {
    return open(directory, null, null, readOnly, DEFAULT_TERMS_INDEX_DIVISOR);
}

This open overload delegates to the open method of DirectoryReader:

private static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly, int termInfosIndexDivisor) throws CorruptIndexException, IOException {
    return DirectoryReader.open(directory, deletionPolicy, commit, readOnly, termInfosIndexDivisor);
}

DirectoryReader.open() mainly builds a SegmentInfos.FindSegmentsFile object and calls its run method.
The code is as follows:

static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
                          final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
    return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {
      @Override
      protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
        SegmentInfos infos = new SegmentInfos();
        infos.read(directory, segmentFileName);
        if (readOnly)
          return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor);
        else
          return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor);
      }
    }.run(commit);
}

The run method first works out the name of the current segments_x file, then calls doBody, which creates the ReadOnlyDirectoryReader.
The code of run is as follows:

public Object run(IndexCommit commit) throws CorruptIndexException, IOException {
      if (commit != null) {
        if (directory != commit.getDirectory())
          throw new IOException("the specified commit does not match the specified Directory");
        return doBody(commit.getSegmentsFileName());
      }

      String segmentFileName = null;
      long lastGen = -1;
      long gen = 0;
      int genLookaheadCount = 0;
      IOException exc = null;
      boolean retry = false;

      int method = 0;

      while(true) {

        if (0 == method) {

          // Method 1: list the directory and use the highest
          // segments_N file. This method works well as long
          // as there is no stale caching on the directory
          // contents (NOTE: NFS clients often have such stale
          // caching):
          String[] files = null;

          long genA = -1;

          files = directory.listAll();

          if (files != null)
            genA = getCurrentSegmentGeneration(files);

          message("directory listing genA=" + genA);


          // Method 2: open the segments.gen file and read the
          // generation recorded there:
          long genB = -1;
          for(int i=0;i<defaultGenFileRetryCount;i++) {
            IndexInput genInput = null;
            try {
              genInput = directory.openInput(IndexFileNames.SEGMENTS_GEN);
            } catch (FileNotFoundException e) {
              message("segments.gen open: FileNotFoundException " + e);
              break;
            } catch (IOException e) {
              message("segments.gen open: IOException " + e);
            }

            if (genInput != null) {
              try {
                int version = genInput.readInt();
                if (version == FORMAT_LOCKLESS) {
                  long gen0 = genInput.readLong();
                  long gen1 = genInput.readLong();
                  message("fallback check: " + gen0 + "; " + gen1);
                  if (gen0 == gen1) {
                    // The file is consistent.
                    genB = gen0;
                    break;
                  }
                }
              } catch (IOException err2) {
                // will retry
              } finally {
                genInput.close();
              }
            }
            try {
              Thread.sleep(defaultGenFileRetryPauseMsec);
            } catch (InterruptedException ie) {
              // In 3.0 we will change this to throw
              // InterruptedException instead
              Thread.currentThread().interrupt();
              throw new RuntimeException(ie);
            }
          }

          message(IndexFileNames.SEGMENTS_GEN + " check: genB=" + genB);

          // Pick the larger of the two gen's:
          if (genA > genB)
            gen = genA;
          else
            gen = genB;

          if (gen == -1) {
            // Neither approach found a generation
            String s;
            if (files != null) {
              s = "";
              for(int i=0;i<files.length;i++)
                s += " " + files[i];
            } else
              s = " null";
            throw new FileNotFoundException("no segments* file found in " + directory + ": files:" + s);
          }
        }

        // Third method (fallback if first & second methods
        // are not reliable): since both directory cache and
        // file contents cache seem to be stale, just
        // advance the generation.
        if (1 == method || (0 == method && lastGen == gen && retry)) {

          method = 1;

          if (genLookaheadCount < defaultGenLookaheadCount) {
            gen++;
            genLookaheadCount++;
            message("look ahead increment gen to " + gen);
          }
        }

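        if (lastGen == gen) {
          // We are about to try the same segments_N file as
          // last time; this is allowed exactly once:
          if (retry) {
            // Second failure on the same file: rethrow the
            // original exception
            throw exc;
          } else {
            retry = true;
          }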
        } else if (0 == method) {
          // Segment file has advanced since our last loop, so
          // reset retry:
          retry = false;
        }

        lastGen = gen;
        // Build the segments_x file name for this generation
        segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", gen);

        // Call the overridden doBody method of
        // SegmentInfos.FindSegmentsFile, which returns the
        // ReadOnlyDirectoryReader.

        try {
          Object v = doBody(segmentFileName);
          if (exc != null) {
            message("success on " + segmentFileName);
          }
          return v;
        } catch (IOException err) {

          // Save the original root cause:
          if (exc == null) {
            exc = err;
          }

          message("primary Exception on '" + segmentFileName + "': " + err + "'; will retry: retry=" + retry + "; gen = " + gen);

          if (!retry && gen > 1) {


            String prevSegmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
                                                                               "",
                                                                               gen-1);

            final boolean prevExists;
            prevExists = directory.fileExists(prevSegmentFileName);

            if (prevExists) {
              message("fallback to prior segment file '" + prevSegmentFileName + "'");
              try {
                Object v = doBody(prevSegmentFileName);
                if (exc != null) {
                  message("success on fallback " + prevSegmentFileName);
                }
                return v;
              } catch (IOException err2) {
                message("secondary Exception on '" + prevSegmentFileName + "': " + err2 + "'; will retry");
              }
            }
          }
        }
      }
    }
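
The segments.gen check inside run can be isolated into a small sketch. The file stores a format marker followed by the generation written twice, and the value is trusted only when both copies agree. Everything named here (GenFileCheck, readGen, chooseGeneration, encode) is hypothetical, and the marker value -2 is an assumption standing in for SegmentInfos.FORMAT_LOCKLESS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the segments.gen consistency check.
class GenFileCheck {
    // Assumed stand-in for SegmentInfos.FORMAT_LOCKLESS.
    static final int ASSUMED_FORMAT_LOCKLESS = -2;

    /** Returns the stored generation, or -1 if the bytes cannot be trusted. */
    static long readGen(byte[] genFileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(genFileBytes))) {
            if (in.readInt() != ASSUMED_FORMAT_LOCKLESS) {
                return -1;                      // unknown format marker
            }
            long gen0 = in.readLong();
            long gen1 = in.readLong();
            return gen0 == gen1 ? gen0 : -1;    // both copies must agree
        } catch (IOException e) {
            return -1;                          // truncated or unreadable
        }
    }

    /** Picks the larger of the directory-listing and segments.gen generations. */
    static long chooseGeneration(long genFromListing, long genFromGenFile) {
        return Math.max(genFromListing, genFromGenFile);
    }

    /** Test helper: builds a segments.gen-style payload in memory. */
    static byte[] encode(int format, long gen0, long gen1) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(format);
            out.writeLong(gen0);
            out.writeLong(gen1);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long genB = readGen(encode(ASSUMED_FORMAT_LOCKLESS, 7, 7));
        System.out.println(chooseGeneration(5, genB)); // prints 7
    }
}
```

Writing the generation twice gives the reader a cheap way to detect a half-written file: if the writer was interrupted between the two longs, the copies differ and the reader falls back to the directory listing.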

getCurrentSegmentGeneration walks the file names in the directory, finds the segments_x files, and returns the largest x, i.e. the current generation.
/**
   * Get the generation (N) of the current segments_N file
   * from a list of files.
   *
   * @param files -- array of file names to check
   */
public static long getCurrentSegmentGeneration(String[] files) {
    if (files == null) {
      return -1;
    }
    long max = -1;
    for (int i = 0; i < files.length; i++) {
      String file = files[i];
      if (file.startsWith(IndexFileNames.SEGMENTS) && !file.equals(IndexFileNames.SEGMENTS_GEN)) {
        long gen = generationFromSegmentsFileName(file);
        if (gen > max) {
          max = gen;
        }
      }
    }
    return max;
}
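
The generation suffix handled above is a base-36 number, so converting between a file name and a generation is a radix conversion. The sketch below illustrates the idea under that assumption; the class SegmentsFileNames and its two methods are hypothetical names, not the Lucene helpers themselves.

```java
// Hypothetical sketch of mapping segments_x names to generations and back.
class SegmentsFileNames {
    static final String SEGMENTS = "segments";

    /** Parses the base-36 suffix of a segments_x name; plain "segments" is generation 0. */
    static long generationFromName(String fileName) {
        if (fileName.equals(SEGMENTS)) {
            return 0;
        }
        if (fileName.startsWith(SEGMENTS + "_")) {
            String suffix = fileName.substring(SEGMENTS.length() + 1);
            return Long.parseLong(suffix, Character.MAX_RADIX); // radix 36
        }
        throw new IllegalArgumentException("not a segments file: " + fileName);
    }

    /** Formats a generation back into a segments_x file name. */
    static String nameFromGeneration(long gen) {
        if (gen == 0) {
            return SEGMENTS;
        }
        return SEGMENTS + "_" + Long.toString(gen, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(generationFromName("segments_a"));  // prints 10
        System.out.println(nameFromGeneration(36));            // prints segments_10
    }
}
```

Base 36 keeps the file names short: a generation in the millions still fits in four or five characters.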
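The openInput method of SimpleFSDirectory creates a SimpleFSIndexInput. This object reads the file one buffer (a byte[]) at a time; callers read from the byte[], and when the buffered bytes run out it reads the next chunk from the file.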
/** Creates an IndexInput for the file with the given name. */
@Override
public IndexInput openInput(String name, int bufferSize) throws IOException {
    ensureOpen();
    return new SimpleFSIndexInput(new File(directory, name), bufferSize, getReadChunkSize());
}

public SimpleFSIndexInput(File path, int bufferSize, int chunkSize) throws IOException {
      super(bufferSize);
      file = new Descriptor(path, "r");
      this.chunkSize = chunkSize;
}

Descriptor extends RandomAccessFile, so the RandomAccessFile methods can be used for random access to the file.

    protected static class Descriptor extends RandomAccessFile {
      // remember if the file is open, so that we don't try to close it
      // more than once
      protected volatile boolean isOpen;
      long position;
      final long length;

      public Descriptor(File file, String mode) throws IOException {
        super(file, mode);
        isOpen=true;
        length=length();
      }

      public void close() throws IOException {
        if (isOpen) {
          isOpen=false;
          super.close();
        }
      }
    }

readInt() reads four bytes and assembles them into an int, most significant byte first:

public int readInt() throws IOException {
    return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
         | ((readByte() & 0xFF) <<  8) |  (readByte() & 0xFF);
}

In readByte(), bufferPosition is the current position within the cached byte[] and bufferLength is the number of valid bytes in it.
When bufferPosition >= bufferLength, the buffer is reloaded from the file by the refill method.
@Override
public byte readByte() throws IOException {
    if (bufferPosition >= bufferLength)
      refill();
    return buffer[bufferPosition++];
}
readLong() is assembled from two ints:
public long readLong() throws IOException {
    return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
}
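
To see how buffering, refill, and the big-endian decoding fit together, here is a self-contained sketch that reads from an in-memory stream instead of a file. The class BufferedBigEndianInput and the helper decodeLong are hypothetical; only the shape of readByte, readInt, and readLong follows the snippets above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of a buffered reader with big-endian decoding.
class BufferedBigEndianInput {
    private final InputStream source;
    private final byte[] buffer;
    private int bufferLength = 0;    // number of valid bytes in buffer
    private int bufferPosition = 0;  // next byte to hand out

    BufferedBigEndianInput(InputStream source, int bufferSize) {
        this.source = source;
        this.buffer = new byte[bufferSize];
    }

    /** Loads the next chunk from the underlying source into the buffer. */
    private void refill() throws IOException {
        int n = source.read(buffer, 0, buffer.length);
        if (n <= 0) {
            throw new EOFException("read past EOF");
        }
        bufferLength = n;
        bufferPosition = 0;
    }

    byte readByte() throws IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    int readInt() throws IOException {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
    }

    long readLong() throws IOException {
        return (((long) readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
    }

    /** Convenience wrapper: decodes one long using a deliberately small buffer. */
    static long decodeLong(byte[] data, int bufferSize) {
        try {
            return new BufferedBigEndianInput(new ByteArrayInputStream(data), bufferSize).readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 0, 0, 0, 1, 2};
        // A 3-byte buffer forces several refills while decoding 8 bytes.
        System.out.println(decodeLong(data, 3)); // prints 258
    }
}
```

The mask with 0xFFFFFFFFL in readLong matters: without it, a low word with its top bit set would be sign-extended and overwrite the high word.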

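Finally, the read method of SegmentInfos completes the initialization of SegmentInfos. SegmentInfos extends Vector and holds SegmentInfo objects; each segment is represented by one SegmentInfo.
The file is read in this order: the index format version, the index version, the counter used to name the next segment, then the segment count (input.readInt()); the loop then runs once per segment, building one SegmentInfo each time.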
public final class SegmentInfos extends Vector<SegmentInfo>
public final void read(Directory directory, String segmentFileName) throws CorruptIndexException, IOException {
    boolean success = false;

    // Clear any previous segments:
    clear();

    ChecksumIndexInput input = new ChecksumIndexInput(directory.openInput(segmentFileName));

    generation = generationFromSegmentsFileName(segmentFileName);

    lastGeneration = generation;

    try {
      int format = input.readInt();
      if(format < 0){     // file contains explicit format info
        // check that it is a format we can understand
        if (format < CURRENT_FORMAT)
          throw new CorruptIndexException("Unknown format version: " + format);
        version = input.readLong(); // read version
        counter = input.readInt(); // read counter
      }
      else{     // file is in old format without explicit format info
        counter = format;
      }

      for (int i = input.readInt(); i > 0; i--) { // read segmentInfos
        add(new SegmentInfo(directory, format, input));
      }

      if(format >= 0){    // in old format the version number may be at the end of the file
        if (input.getFilePointer() >= input.length())
          version = System.currentTimeMillis(); // old file format without version number
        else
          version = input.readLong(); // read version
      }

      if (format <= FORMAT_USER_DATA) {
        if (format <= FORMAT_DIAGNOSTICS) {
          userData = input.readStringStringMap();
        } else if (0 != input.readByte()) {
          userData = Collections.singletonMap("userData", input.readString());
        } else {
          userData = Collections.<String,String>emptyMap();
        }
      } else {
        userData = Collections.<String,String>emptyMap();
      }

      if (format <= FORMAT_CHECKSUM) {
        final long checksumNow = input.getChecksum();
        final long checksumThen = input.readLong();
        if (checksumNow != checksumThen)
          throw new CorruptIndexException("checksum mismatch in segments file");
      }
      success = true;
    }
    finally {
      input.close();
      if (!success) {
        // Clear any segment infos we had loaded so we
        // have a clean slate on retry:
        clear();
      }
    }
}
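
The checksum step at the end of read compares a checksum computed while reading with a checksum stored as the last long of the file. The sketch below shows that pattern on an in-memory byte array. It assumes a CRC32 digest and an 8-byte big-endian trailer; the class ChecksumTrailer and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of a "payload + trailing checksum" file layout.
class ChecksumTrailer {

    /** Returns payload followed by an 8-byte big-endian CRC32 of the payload. */
    static byte[] appendChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Recomputes the checksum over the payload and compares it with the trailer. */
    static boolean verify(byte[] fileBytes) {
        if (fileBytes.length < 8) {
            return false;                       // no room for a trailer
        }
        int payloadLength = fileBytes.length - 8;
        CRC32 crc = new CRC32();
        crc.update(fileBytes, 0, payloadLength);
        long checksumNow = crc.getValue();
        long checksumThen = ByteBuffer.wrap(fileBytes, payloadLength, 8).getLong();
        return checksumNow == checksumThen;
    }

    public static void main(String[] args) {
        byte[] file = appendChecksum("segment metadata".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // prints true
        file[0] ^= 0x01;                  // flip one bit in the payload
        System.out.println(verify(file)); // prints false
    }
}
```

A mismatch is treated as corruption rather than as a retryable I/O error, which is why read throws CorruptIndexException in that branch.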

The doBody method ends up in the DirectoryReader constructor. This constructor calls SegmentReader.get(readOnly, sis.info(i), termInfosIndexDivisor)
to create a SegmentReader object for each segment.
/** Construct reading the named set of readers. */
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
    this.directory = directory;
    this.readOnly = readOnly;
    this.segmentInfos = sis;
    this.deletionPolicy = deletionPolicy;
    this.termInfosIndexDivisor = termInfosIndexDivisor;

    if (!readOnly) {
      // We assume that this segments_N was previously
      // properly sync'd:
      synced.addAll(sis.files(directory, true));
    }

    // To reduce the chance of hitting FileNotFound
    // (and having to retry), we open segments in
    // reverse because IndexWriter merges & deletes
    // the newest segments first.

    SegmentReader[] readers = new SegmentReader[sis.size()];
    for (int i = sis.size()-1; i >= 0; i--) {
      boolean success = false;
      try {
        readers[i] = SegmentReader.get(readOnly, sis.info(i), termInfosIndexDivisor);
        success = true;
      } finally {
        if (!success) {
          // Close all readers we had opened:
          for(i++;i<sis.size();i++) {
            try {
              readers[i].close();
            } catch (Throwable ignore) {
              // keep going - we want to clean up as much as possible
            }
          }
        }
      }
    }

    initialize(readers);
}
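
The constructor above opens segments from last to first and, if one open fails, closes the readers it has already opened before the exception propagates. That pattern can be sketched without Lucene as follows; ReverseOpen, openAll, and demoClosedOnFailure are hypothetical names, and AutoCloseable stands in for SegmentReader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of "open in reverse, clean up on failure".
class ReverseOpen {

    /** Opens resources count-1 .. 0; on failure closes those already opened. */
    static AutoCloseable[] openAll(int count, IntFunction<AutoCloseable> opener) {
        AutoCloseable[] readers = new AutoCloseable[count];
        for (int i = count - 1; i >= 0; i--) {
            boolean success = false;
            try {
                readers[i] = opener.apply(i);
                success = true;
            } finally {
                if (!success) {
                    // Everything after index i was opened earlier in the loop.
                    for (int j = i + 1; j < count; j++) {
                        try {
                            readers[j].close();
                        } catch (Exception ignore) {
                            // keep going so the remaining readers are closed too
                        }
                    }
                }
            }
        }
        return readers;
    }

    /** Simulates a failure at index 1 of 4 and reports which readers were closed. */
    static List<Integer> demoClosedOnFailure() {
        List<Integer> closed = new ArrayList<>();
        try {
            openAll(4, i -> {
                if (i == 1) {
                    throw new IllegalStateException("segment " + i + " is missing");
                }
                return () -> closed.add(i);
            });
        } catch (IllegalStateException expected) {
            // the failure propagates only after cleanup has run
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(demoClosedOnFailure()); // prints [2, 3]
    }
}
```

The sketch uses a separate index j for cleanup; the Lucene code reuses i, which works there only because the exception leaves the outer loop immediately afterwards.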

/**
   * @throws CorruptIndexException if the index is corrupt
   * @throws IOException if there is a low-level IO error
   */
public static SegmentReader get(boolean readOnly, SegmentInfo si, int termInfosIndexDivisor) throws CorruptIndexException, IOException {
    return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true, termInfosIndexDivisor);
}

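The get method creates a ReadOnlySegmentReader and then calls the CoreReaders constructor to create a CoreReaders object. Through the CoreReaders object it opens the stored-field files .fdx and .fdt (.fdx is the index into .fdt), then loads the deletions file _x_n.del and opens the _x.nrm file.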

/**
   * @throws CorruptIndexException if the index is corrupt
   * @throws IOException if there is a low-level IO error
   */
public static SegmentReader get(boolean readOnly,
                                  Directory dir,
                                  SegmentInfo si,
                                  int readBufferSize,
                                  boolean doOpenStores,
                                  int termInfosIndexDivisor)
    throws CorruptIndexException, IOException {
    SegmentReader instance = readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
    instance.readOnly = readOnly;
    instance.si = si;
    instance.readBufferSize = readBufferSize;

    boolean success = false;

    try {
      instance.core = new CoreReaders(dir, si, readBufferSize, termInfosIndexDivisor);
      if (doOpenStores) {
        instance.core.openDocStores(si);
      }
      instance.loadDeletedDocs();
      instance.openNorms(instance.core.cfsDir, readBufferSize);
      success = true;
    } finally {

      // With lock-less commits, it's entirely possible (and
      // fine) to hit a FileNotFound exception above. In
      // this case, we want to explicitly close any subset
      // of things that were opened so that we don't have to
      // wait for a GC to do so.
      if (!success) {
        instance.doClose();
      }
    }
    return instance;
}

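CoreReaders builds the FieldInfos object, which holds the information for each field, i.e. the contents of the segment's _x.fnm file. It then builds the TermInfosReader, which loads the contents of the .tii file into memory and opens the .tis file, and finally it opens the .frq and .prx files.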
CoreReaders(Directory dir, SegmentInfo si, int readBufferSize, int termsIndexDivisor) throws IOException {
      segment = si.name;
      this.readBufferSize = readBufferSize;
      this.dir = dir;

      boolean success = false;

      try {
        Directory dir0 = dir;
        if (si.getUseCompoundFile()) {
          cfsReader = new CompoundFileReader(dir, segment + "." + IndexFileNames.COMPOUND_FILE_EXTENSION, readBufferSize);
          dir0 = cfsReader;
        }
        cfsDir = dir0;

        fieldInfos = new FieldInfos(cfsDir, segment + "." + IndexFileNames.FIELD_INFOS_EXTENSION);

        this.termsIndexDivisor = termsIndexDivisor;
        TermInfosReader reader = new TermInfosReader(cfsDir, segment, fieldInfos, readBufferSize, termsIndexDivisor);
        if (termsIndexDivisor == -1) {
          tisNoIndex = reader;
        } else {
          tis = reader;
          tisNoIndex = null;
        }

        // make sure that all index files have been read or are kept open
        // so that if an index update removes them we'll still have them
        freqStream = cfsDir.openInput(segment + "." + IndexFileNames.FREQ_EXTENSION, readBufferSize);

        if (fieldInfos.hasProx()) {
          proxStream = cfsDir.openInput(segment + "." + IndexFileNames.PROX_EXTENSION, readBufferSize);
        } else {
          proxStream = null;
        }
        success = true;
      } finally {
        if (!success) {
          decRef();
        }
      }
    }

Reading the _x.fnm file to load the FieldInfo entries:

FieldInfos(Directory d, String name) throws IOException {
    IndexInput input = d.openInput(name);
    try {
      try {
        read(input, name);
      } catch (IOException ioe) {
        if (format == FORMAT_PRE) {
          // LUCENE-1623: FORMAT_PRE (before there was a
          // format) may be 2.3.2 (pre-utf8) or 2.4.x (utf8)
          // encoding; retry with input set to pre-utf8
          input.seek(0);
          input.setModifiedUTF8StringsMode();
          byNumber.clear();
          byName.clear();
          try {
            read(input, name);
          } catch (Throwable t) {
            // Ignore any new exception & throw original IOE
            throw ioe;
          }
        } else {
          // The IOException cannot be caused by
          // LUCENE-1623, so re-throw it
          throw ioe;
        }
      }
    } finally {
      input.close();
    }
}
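
The constructor above clears two collections, byNumber and byName, which matches the list-plus-map layout described at the start of this post. A minimal sketch of that dual index follows. FieldRegistry and its simplified FieldInfo are hypothetical and carry far fewer attributes than the real classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of field metadata indexed by both number and name.
class FieldRegistry {

    static final class FieldInfo {
        final String name;
        final int number;
        final boolean indexed;

        FieldInfo(String name, int number, boolean indexed) {
            this.name = name;
            this.number = number;
            this.indexed = indexed;
        }
    }

    private final List<FieldInfo> byNumber = new ArrayList<>();
    private final Map<String, FieldInfo> byName = new HashMap<>();

    /** Registers a field once; its number is its position in the list. */
    FieldInfo add(String name, boolean indexed) {
        FieldInfo existing = byName.get(name);
        if (existing != null) {
            return existing;
        }
        FieldInfo fi = new FieldInfo(name, byNumber.size(), indexed);
        byNumber.add(fi);
        byName.put(name, fi);
        return fi;
    }

    /** Lookup by name through the map; returns null if unknown. */
    FieldInfo fieldInfo(String name) {
        return byName.get(name);
    }

    /** Lookup by number through the list; returns null if out of range. */
    FieldInfo fieldInfo(int number) {
        return (number >= 0 && number < byNumber.size()) ? byNumber.get(number) : null;
    }

    public static void main(String[] args) {
        FieldRegistry fields = new FieldRegistry();
        fields.add("title", true);
        fields.add("body", true);
        System.out.println(fields.fieldInfo("body").number); // prints 1
        System.out.println(fields.fieldInfo(0).name);        // prints title
    }
}
```

The list serves the on-disk files, which refer to fields by number, while the map serves callers that refer to fields by name.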