
Solr: Indexing 1




At a high level, the Solr indexing process distills down to three key tasks: 

  1. Convert a document from its native format into a format supported by Solr, such as XML or JSON.
  2. Add the document to Solr using one of several well-defined interfaces, typically HTTP POST.
  3. Configure Solr to apply transformations to the text in the document during indexing.

Solr supports several formats for indexing your documents, including XML, JSON, and CSV.
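As a rough sketch of step 1, a tweet converted to Solr's XML update format might look like the following. The field names id and text and all of the values are made up for illustration; screen_name and link match the field definitions that appear later in these notes.

```xml
<!-- Illustrative document only: every value here is invented. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="screen_name">@example_user</field>
    <field name="text">Reading about Solr indexing today.</field>
    <field name="link">http://example.com/solr-notes</field>
  </doc>
</add>
```

For step 2, a message like this would typically be sent with an HTTP POST to the update request handler of your Solr core.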



 



 

------------------------------------------------------------------------------------------------------------------------------------

Designing your schema

Designing a schema means answering the following key questions about your search application:

  • What is a document in your index?
  • How is each document uniquely identified?
  • What fields in your documents can be searched by users?
  • Which fields should be displayed to users in the search results?

 

Document granularity

Determining what a document should represent in your Solr index drives the entire schema-design process. In some cases it’s obvious, such as with our tweet example; the text content is typically short, so each tweet will be a document. But if the content you want to index is large, such as a technical computer book, you may want to treat subsections of a large document as the indexed unit. The key is to think about what your users will want to see in the search results.

 



 

Unique key

Solr doesn’t require a unique identifier for each document, but if one is supplied, Solr uses it to avoid duplicating documents in your index. If a document with the same unique key is added to the index, Solr overwrites the existing record with the latest document.

 

Indexed fields
The best way to think about indexed fields is to ask whether a typical user could develop a meaningful query using that field. Another way to decide if a field should be indexed is to determine if your users would miss it if you did not provide it as a queryable option in your search form.

In addition to enabling searching, you will also need to mark your field as indexed if you need to sort, facet, group by, provide query suggestions for, or execute function queries on values within a field.


Determining which fields to include in the index is specific to every search application.

Stored fields
Although users probably won’t search by editor name to find a book to read, we may still want to display the editor’s name in the search results. In general, your documents may contain fields that aren’t useful from a search perspective but are still useful for displaying search results. In Solr, these are called stored fields.
A field may also be both indexed and stored, in which case it can be searched and displayed in results.
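A minimal sketch of the difference, assuming a book schema with hypothetical editor and title fields. The text_general field type is an assumption; it would need to exist as a fieldType in your schema.xml.

```xml
<!-- Hypothetical field: displayed in results but not searchable -->
<field name="editor" type="string" indexed="false" stored="true"/>

<!-- Hypothetical field: both searchable and displayed in results -->
<field name="title" type="text_general" indexed="true" stored="true"/>
```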

----------------------------------------------------------------------------------------------------------------------------

schema.xml



 

 ------------------------------------------------------------------------------------------------------------------------------------

Defining fields in schema.xml

Required field attributes

 

<field name="screen_name"
type="string"
indexed="true"
stored="true" />
 

 

Multivalued fields

In Solr, fields that can have more than one value per document are called multivalued fields.

 

<field name="link"
type="string"
indexed="true"
stored="true"
multiValued="true"/>
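For illustration, a document can supply several values for a multivalued field by repeating the field element in the XML update format. The id field and the URLs below are invented for this sketch.

```xml
<!-- Illustrative document: one tweet carrying two links -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="link">http://example.com/first</field>
    <field name="link">http://example.com/second</field>
  </doc>
</add>
```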
 

 

 

Dynamic fields

In Solr, dynamic fields let you apply one field definition to every field in your documents whose name matches a glob-style prefix or suffix pattern, such as s_* or *_s. Dynamic fields help address common problems that occur when building search applications, including:

  • Modeling documents with many fields
  • Supporting documents from diverse sources
  • Adding new document sources
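A minimal sketch of how the two patterns mentioned above could be declared in schema.xml. The choice of the string type and the attribute values are assumptions for illustration.

```xml
<!-- Suffix pattern: matches field names such as author_s or city_s -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

<!-- Prefix pattern: matches field names such as s_author or s_city -->
<dynamicField name="s_*" type="string" indexed="true" stored="true"/>
```

With definitions like these, a document can include a field such as author_s without that exact name being declared in the schema.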

 

Copy fields

In Solr, copy fields allow you to populate one field from one or more other fields. Specifically, copy fields support two use cases that are common in most search applications:

  1. Populate a single catch-all field with the contents of multiple fields. In most search applications, users are presented with a single search box in which to enter a query. The intent of this approach is to help your users quickly find documents without having to fill out a complicated form; think about how successful a simple search box has been for Google.
  2. Apply different text analysis to the same field content to create a new searchable field. Solr copy fields give you the flexibility to enable or disable certain text-analysis features like stemming without having to duplicate storage in your index.
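Both use cases can be sketched in schema.xml roughly as follows. The field names catch_all, text, and text_exact are hypothetical, as are the field types text_stemmed and text_unstemmed, which would have to be defined as fieldType elements in your own schema.

```xml
<!-- Use case 1: one catch-all field fed from several source fields.
     It is multiValued because it receives values from more than one source. -->
<field name="catch_all" type="text_stemmed" indexed="true" stored="false" multiValued="true"/>
<copyField source="screen_name" dest="catch_all"/>
<copyField source="link" dest="catch_all"/>

<!-- Use case 2: same content, different analysis.
     The copy is indexed but not stored, so the text is kept only once. -->
<field name="text" type="text_stemmed" indexed="true" stored="true"/>
<field name="text_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>
```

Setting stored="false" on the destination fields is what avoids duplicating storage: the copies exist only in the searchable index, while the original fields are what get displayed.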

 

Unique key field

If you provide a unique identifier field for each of your documents, Solr will avoid creating duplicates during indexing. In addition, if you plan to distribute your Solr index across multiple servers, you must provide a unique identifier for your documents.

<uniqueKey>id</uniqueKey>

It’s best to use a primitive field type, such as string or long, for the field named in <uniqueKey>, because that ensures Solr doesn’t make any changes to the value during indexing.
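As a sketch, the id field referenced by <uniqueKey> could be declared like this. The attribute values are assumptions for illustration; the unique key field should hold a single value per document, so it is not marked as multiValued.

```xml
<!-- Assumed definition of the field used as the unique key -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```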

 

 

 

 

 

 

 

 

 
