`

Lucene: the core indexing classes

 
阅读更多


IndexWriter
 

IndexWriter is the central component of the indexing process. This class creates a new index or opens an existing one, and adds, removes, or updates documents in the index. Think of IndexWriter as an object that gives you write access to the index but doesn’t let you read or search it. IndexWriter needs somewhere to store its index, and that’s what Directory is for.

 

Directory

 

The Directory class represents the location of a Lucene index. It’s an abstract class that allows its subclasses to store the index as they see fit. In our Indexer example, we used FSDirectory.open to get a suitable concrete FSDirectory implementation that stores real files in a directory on the file system, and passed that in turn to IndexWriter’s constructor.

 

Analyzer

 

Before text is indexed, it’s passed through an analyzer. The analyzer, specified in the IndexWriter constructor, is in charge of extracting those tokens out of text that should be indexed and eliminating the rest.

 

Document

 

The Document class represents a collection of fields. Think of it as a virtual document—a chunk of data, such as a web page, an email message, or a text file—that you want to make retrievable at a later time. Fields of a document represent the document or metadata associated with that document. The original source (such as a database record, a Microsoft Word document, a chapter from a book, and so on) of
document data is irrelevant to Lucene. It’s the text that you extract from such binary documents, and add as a Field instance, that Lucene processes. The metadata (such as author, title, subject and date modified) is indexed and stored separately as fields of a document.

 

Field

 

Each document in an index contains one or more named fields, embodied in a class called Field. Each field has a name and corresponding value, and a bunch of options that control precisely how Lucene will index the field’s value.

 

  • 大小: 31.5 KB
分享到:
评论

相关推荐

    lucene-core-7.7.0-API文档-中文版.zip

    赠送jar包:lucene-core-7.7.0.jar; 赠送原API文档:lucene-core-7.7.0-javadoc.jar; 赠送源代码:lucene-core-7.7.0-sources.jar; 赠送Maven依赖信息文件:lucene-core-7.7.0.pom; 包含翻译后的API文档:lucene...

    Lucene:基于Java的全文检索引擎简介

    Lucene是一个基于Java的全文索引工具包。 1. 基于Java的全文索引引擎Lucene简介:关于作者和Lucene的历史 2. 全文检索的实现:Luene全文索引和数据库索引的比较 3. 中文切分词机制简介:基于词库和自动切分词算法的...

    指南-Lucene:ES篇.md

    指南-Lucene:ES篇.md

    IKAnalyzer中文分词支持lucene6.5.0版本

    由于林良益先生在2012之后未对IKAnalyzer进行更新,后续lucene分词接口发生变化,导致不可使用,所以此jar包支持lucene6.0以上版本

    lucene-core-6.6.0-API文档-中文版.zip

    赠送jar包:lucene-core-6.6.0.jar; 赠送原API文档:lucene-core-6.6.0-javadoc.jar; 赠送源代码:lucene-core-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-core-6.6.0.pom; 包含翻译后的API文档:lucene...

    精品资料(2021-2022收藏)Lucene:基于Java的全文检索引擎简介.doc

    【Lucene:基于Java的全文检索引擎简介】 Lucene是一个由Java编写的开源全文检索引擎工具包,由Doug Cutting创建并贡献给Apache基金会,成为Jakarta项目的一部分。它不是一个独立的全文检索应用,而是提供了一个可...

    lucene-core-7.2.1-API文档-中文版.zip

    赠送jar包:lucene-core-7.2.1.jar; 赠送原API文档:lucene-core-7.2.1-javadoc.jar; 赠送源代码:lucene-core-7.2.1-sources.jar; 赠送Maven依赖信息文件:lucene-core-7.2.1.pom; 包含翻译后的API文档:lucene...

    Lucene:基于Java的全文检索引擎简介.rar

    1. **索引(Indexing)**:Lucene首先将非结构化的文本数据转换为倒排索引(Inverted Index),这是一个高效的数据结构,用于快速查找包含特定词汇的文档。索引过程包括分词(Tokenization)、词干提取(Stemming)...

    lucene-core-7.3.1-API文档-中文版.zip

    赠送jar包:lucene-core-7.3.1.jar; 赠送原API文档:lucene-core-7.3.1-javadoc.jar; 赠送源代码:lucene-core-7.3.1-sources.jar; 赠送Maven依赖信息文件:lucene-core-7.3.1.pom; 包含翻译后的API文档:lucene...

    lucene-core-6.6.0-API文档-中英对照版.zip

    赠送jar包:lucene-core-6.6.0.jar; 赠送原API文档:lucene-core-6.6.0-javadoc.jar; 赠送源代码:lucene-core-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-core-6.6.0.pom; 包含翻译后的API文档:lucene...

    精品资料(2021-2022收藏)Lucene:基于Java的全文检索引擎简介.docx

    **Lucene:基于Java的全文检索引擎** Lucene是一个由Apache软件基金会的Jakarta项目维护的开源全文检索引擎。它不是一个完整的全文检索应用,而是一个用Java编写的库,允许开发人员轻松地在他们的应用程序中集成...

    lucene:基于Java的全文检索引擎简介

    ### 基于Java的全文检索引擎Lucene简介 #### 1. Lucene概述与历史背景 Lucene是一个开源的全文检索引擎库,完全用Java编写。它为开发者提供了构建高性能搜索应用程序的基础组件。尽管Lucene本身不是一个现成的应用...

    基于 SSM 框架的二手书交易系统.zip

    快速上手 1. 运行环境 IDE:IntelliJ IDEA 项目构建工具:Maven 数据库:MySQL Tomcat:Tomcat 8.0.47 2. 初始化项目 创建一个名为bookshop的数据库,将bookshop.sql导入 打开IntelliJ IDEA,将项目导入 ...

    精品资料(2021-2022收藏)Lucene:基于Java的全文检索引擎简介22173.doc

    【Lucene:基于Java的全文检索引擎简介】 Lucene是一个由Java编写的全文索引工具包,它不是一个完整的全文检索应用,而是作为一个可嵌入的引擎,为各种应用程序提供全文检索功能。Lucene的设计目标是简化全文检索的...

    lucene-core-7.7.0-API文档-中英对照版.zip

    赠送jar包:lucene-core-7.7.0.jar; 赠送原API文档:lucene-core-7.7.0-javadoc.jar; 赠送源代码:lucene-core-7.7.0-sources.jar; 赠送Maven依赖信息文件:lucene-core-7.7.0.pom; 包含翻译后的API文档:lucene...

    lucene-core-7.2.1-API文档-中英对照版.zip

    赠送jar包:lucene-core-7.2.1.jar; 赠送原API文档:lucene-core-7.2.1-javadoc.jar; 赠送源代码:lucene-core-7.2.1-sources.jar; 赠送Maven依赖信息文件:lucene-core-7.2.1.pom; 包含翻译后的API文档:lucene...

Global site tag (gtag.js) - Google Analytics