
Solr 4.0: Realtime GET

 

The next piece of functionality I decided to look at in the upcoming Solr 4.0 is the so-called “Realtime Get”. It lets you see a document even though it has not yet been added to the index, that is, before a commit has been sent to Solr. Let’s see how it works.

Some theory

Updating data in Lucene and Solr has one disadvantage: the updates you submit can’t be seen until a commit is run. The problem is that a commit is costly, and committing frequently may cause performance problems. So, when you need your data to be visible right after it changes, you may be forced to choose between performance and fast updates. To address that, Lucene and Solr are working towards Near Real Time (NRT) searching. Lucene already offers that possibility; in Solr 4.0 we will be able to use it as well, and more besides.

Configuration

In order to use Realtime Get functionality we need to configure the following Solr features:

Transaction log

The first thing to configure is the transaction log writing. In order to do that you need to add the following to your updateHandler configuration:

<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>

The above entry says that the transaction log will be kept in the same data directory that holds the index directory.
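To make the ${solr.data.dir:} placeholder a bit more concrete, here is a minimal Python sketch of how a "${name:default}" placeholder resolves. It only imitates the syntax; the resolve function and the /var/solr/data path are made up for illustration and are not part of Solr.

```python
import re

# Matches ${name} or ${name:default}. A toy imitation of the placeholder
# syntax seen in solrconfig.xml, not Solr's own parser.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, properties):
    """Replace each ${name:default} with properties[name], else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is None:
            raise KeyError(f"no value and no default for {name!r}")
        return default
    return _PLACEHOLDER.sub(substitute, text)


# Property set: the placeholder becomes that (hypothetical) path.
print(resolve("${solr.data.dir:}", {"solr.data.dir": "/var/solr/data"}))
# Property not set: the empty default after the colon is used.
print(repr(resolve("${solr.data.dir:}", {})))
```

In the second case the value is an empty string, which, as far as I can tell, leaves Solr to fall back to its default data directory.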

Realtime Get handler

The second thing needed to see Realtime Get in action is an appropriate handler configuration (or adding the component to an already defined handler). To do that, add the following to your solrconfig.xml file:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>

The above entry is nothing unusual: it just adds a new request handler implemented by the solr.RealTimeGetHandler class, which enables checking the transaction log.
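A rough way to picture what "checking the transaction log" buys us is the toy model below. It is only a sketch of the idea, assuming a log of uncommitted updates next to a committed index; the ToyCore class and its methods are invented for illustration and say nothing about how Solr implements this internally.

```python
class ToyCore:
    """Toy model of commit visibility; not how Solr is implemented."""

    def __init__(self):
        self.index = {}  # committed documents, visible to search
        self.tlog = {}   # uncommitted updates, keyed by unique id

    def add(self, doc):
        # An update is recorded in the log first; search cannot see it yet.
        self.tlog[doc["id"]] = doc

    def commit(self):
        # A commit moves the logged updates into the searchable index.
        self.index.update(self.tlog)
        self.tlog.clear()

    def search(self, field, value):
        # Searching only consults the committed index.
        return [d for d in self.index.values() if d.get(field) == value]

    def realtime_get(self, doc_id):
        # A lookup by id checks the log first, then falls back to the index.
        if doc_id in self.tlog:
            return self.tlog[doc_id]
        return self.index.get(doc_id)


core = ToyCore()
core.add({"id": "SP2514N", "manu_id_s": "samsung"})
print(core.search("manu_id_s", "samsung"))  # nothing before the commit
print(core.realtime_get("SP2514N"))         # found through the log
core.commit()
print(core.search("manu_id_s", "samsung"))  # found after the commit
```

The point of the sketch is that a lookup by identifier can be answered from the log, while a search by field value has to wait for the commit.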

Action

To check how Realtime Get works I decided to do a simple test. The first thing I did was index one file (one of those available in the exampledocs directory) with the following bash command:

curl 'http://localhost:8983/solr/update' -d @hd.xml -H 'Content-type:application/xml'
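If you would rather send the update from a script, the curl command above could be approximated in Python with the standard library, as in the sketch below. The function names are my own, and the URL is simply the one from the curl example. One small difference: curl's -d option strips newlines when it reads a file, while this sketch sends the file bytes unchanged.

```python
import urllib.request


def build_update_request(xml_bytes, base_url="http://localhost:8983/solr"):
    """Prepare a POST to /update with an XML body and no commit parameter."""
    return urllib.request.Request(
        base_url + "/update",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_file(path, base_url="http://localhost:8983/solr"):
    """Send the file at `path` to Solr and return the HTTP status code."""
    with open(path, "rb") as handle:
        request = build_update_request(handle.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.status


# Usage, assuming a Solr instance is listening on localhost:8983:
#   post_file("hd.xml")
```

Building the request is kept separate from sending it, so the first function can be inspected or tested without a running Solr.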

Of course, I did not send a commit after indexing. As expected, an ordinary search query returned no results. So let’s check whether the handler registered as /get can give us some. To do that I sent a request to the /get handler with the document’s unique identifier, SP2514N, and got the following document in return:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<doc name="doc">
  <str name="id">SP2514N</str>
  <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
  <str name="manu">Samsung Electronics Co. Ltd.</str>
  <str name="manu_id_s">samsung</str>
  <arr name="cat">
    <str>electronics</str>
    <str>hard drive</str>
  </arr>
  <arr name="features">
    <str>7200RPM, 8MB cache, IDE Ultra ATA-133</str>
    <str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str>
  </arr>
  <float name="price">92.0</float>
  <int name="popularity">6</int>
  <bool name="inStock">true</bool>
  <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
  <str name="store">35.0752,-97.032</str></doc>
</response>

So Solr returned a document that wasn’t yet added to the index. Nice!

Usage possibilities

You probably noticed that in order to fetch a document with the /get handler I needed to provide its unique identifier (or a list of identifiers). That’s true: Realtime Get doesn’t support searching, because it was not created for that. What it can do is show us the latest version of documents whose identifiers are known (for example, the ones already in the index). You can also get this by adding the component used in solr.RealTimeGetHandler to any of your own handlers. And the good news is that you don’t have to worry about update performance: Realtime Get is very fast. So, if frequent updates are one of your problems, you can look to the future with a smile :)

Last few words

The Realtime Get functionality brings new possibilities to Solr and is also a step on the road to SolrCloud. With the transaction log one can implement automatic cluster node recovery or NRT updates of instances. As you can see, Solr 4.0 is not only about search, but also about storing data and bringing Solr closer to NoSQL solutions.

 

 

Reposted from: http://solr.pl/en/2012/01/09/solr-4-0-realtime-get-2/
