Solr: Creating an Index in XML Format

 

Solr receives commands, and possibly document data, through HTTP POST. One way to send an HTTP POST is with the Unix command-line program curl (also available on Windows through Cygwin: http://www.cygwin.com), and that's what we'll use in the examples here. An alternative cross-platform option that comes with Solr is post.jar, located in Solr's example/exampledocs directory. To get some basic help on how to use it, run the following command:

>> java -jar example/exampledocs/post.jar -help

You'll see in a bit that you can post name-value pair options as HTML form data. However, post.jar doesn't support that, so you'll be forced to specify the URL and put the options in the query string. (Opening post.jar shows that it contains only a single class, SimplePostTool, which forwards the indexing request. It hard-codes the Solr server URL as public static final String DEFAULT_POST_URL = "http://localhost:8983/solr/update", so it can't be used against a Solr service deployed at a different address.)
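If you want to check this yourself, you can list the jar's contents with the JDK's jar tool (assuming a JDK is on your PATH and you're in the Solr distribution directory, as in the help command above); the listing shows little more than a manifest and SimplePostTool.class:

>> jar tf example/exampledocs/post.jar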

 

There are several ways to tell Solr to index data, and all of them are through HTTP POST:

•     Send the data as the entire POST payload. curl does this with --data-binary (or a similar option) and an appropriate content-type header for whatever the format is.
•     Send some name-value pairs akin to an HTML form submission. With curl, such pairs are preceded by -F. If you're giving data to Solr to be indexed, as opposed to having Solr pull it from a database, then there are a few ways to do that:
     ° Put the data into the stream.body parameter. If it's small, perhaps less than a megabyte, then this approach is fine. The limit is configured with the multipartUploadLimitInKB setting in solrconfig.xml, defaulting to 2GB (see the configuration sketch after this list). If you're tempted to increase this limit, you should reconsider your approach.
     ° Refer to the data through either a local file on the Solr server using the stream.file parameter, or a URL that Solr will fetch through the stream.url parameter. These choices are a feature that Solr calls remote streaming.
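The upload size limit and the remote streaming feature are both controlled in solrconfig.xml's request dispatcher configuration. The snippet below is only a sketch based on a stock solrconfig.xml; the exact attribute values, and whether remote streaming is enabled at all, depend on your Solr version and your own configuration:

<requestDispatcher>
    <!-- enableRemoteStreaming must be true for stream.file / stream.url to work;
         multipartUploadLimitInKB caps the size of stream.body and form uploads -->
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000" />
</requestDispatcher>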

 

Here is an example of the first choice. Let's say we have a Solr Update-XML file named artists.xml in the current directory. We can post it to Solr using the following command line:

>> curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml

 

If it succeeds, then you'll have output that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
    <int name="status">0</int><int name="QTime">128</int>
</lst>
</response>

 

To use the stream.body feature for the preceding example, you would do this:

curl http://localhost:8983/solr/update -F stream.body=@artists.xml
 

In both cases, the @ character instructs curl to read the data from the named file rather than treating @artists.xml as the literal value. If the XML is short, then you can just as easily specify it literally on the command line:

curl http://localhost:8983/solr/update -F stream.body=' <commit />'
 

Notice the leading space in the value. This was intentional: in this example, curl treats a leading @ or < in the value as meaning something we don't want here (reading the value from a file). In this case, it might be more appropriate to use --form-string instead of -F. However, it's more typing, and I'm feeling lazy.
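For completeness, here is what the same commit would look like with --form-string, which passes the value through verbatim without giving @ or < any special meaning:

curl http://localhost:8983/solr/update --form-string stream.body='<commit />'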

 

Remote streaming
In the preceding examples, we've given Solr the data to index in the HTTP message. Alternatively, the POST request can give Solr a pointer to the data in the form of either a file path accessible to Solr or an HTTP URL to it.

 

Just as before, the originating request does not return a response until Solr has finished processing it. If the file is of a decent size or is already at some known URL, then you may find remote streaming faster and/or more convenient, depending on your situation.

 

Here is an example of Solr accessing a local file:

curl http://localhost:8983/solr/update -F stream.file=/tmp/artists.xml
 

To use a URL instead, the parameter would change to stream.url, and we'd specify a URL as its value. Note that in both cases we're passing a name-value parameter (for example, stream.file and the path), not the actual data.
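For illustration, a stream.url request might look like the following; the URL here is only a placeholder, and it must be reachable from the Solr server itself, since Solr (not curl) performs the fetch:

curl http://localhost:8983/solr/update -F stream.url=http://example.com/artists.xml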

 

Solr's Update-XML format

Using an XML-formatted message, you can supply documents to be indexed, tell Solr to commit changes, optimize the index, and delete documents. Here is a sample XML file you can HTTP POST to Solr that adds (or replaces) a couple of documents:

<add overwrite="true">
  <doc boost="2.0">
    <field name="id">5432a</field>
    <field name="type" ...</field>
    <field name="a_name" boost="0.5"></field>
    <!-- the date/time syntax MUST look just like this -->
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <doc>
    <field name="id">myid</field>
    <field name="type" ...
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <!-- more doc elements here as needed -->
</add>

 

The overwrite attribute defaults to true to guarantee the uniqueness of values in the field that you have designated as the unique field in the schema, assuming you have such a field. If you were to add another document that has the same value for the unique field, then this document would overwrite the previous document. You will not get an error.

 

The boost attribute affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value, whether at the document or field level, is optional. The default value is 1.0, which is effectively a non-boost. Technically, documents are not boosted, only fields are. The effective boost value of a field is that specified for the document multiplied by that specified for the field.
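For example, in the sample above, the a_name field of the first document ends up with an effective boost of 2.0 × 0.5 = 1.0: the document-level boost of 2.0 multiplied by the field-level boost of 0.5.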

 

Deleting documents

You can delete a document by its unique field. Here we delete two documents:

<delete><id>Artist:11604</id><id>Artist:11603</id></delete>

 To more flexibly specify which documents to delete, you can alternatively use a Lucene/Solr query:

<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete>
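As with adds, these delete messages are simply XML posted to the update URL. For example, the query-based delete above could be sent with curl like this (using the same Content-type header as before):

curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete>'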
 

Commit

Data sent to Solr is not immediately searchable, nor do deletions take immediate effect. Like a database, changes must be committed first. The easiest way to do this is to add a commit=true request parameter to a Solr update URL. The request to Solr could be the same request that contains data to be indexed and is then committed, or an empty request—it doesn't matter. For example, you can visit http://localhost:8983/solr/update?commit=true in a browser to issue a commit. You can also commit changes using the XML syntax by simply sending this to Solr:
<commit />
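Putting the pieces together, a single request can both post artists.xml and commit it by adding commit=true to the query string; a sketch (the URL is quoted so that the shell doesn't interpret the ?):

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml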
