solr(8)Solr Index Replication

 

1. Introduction to Index Replication
The master handles all updates to the index; all queries are handled by the slaves.

Topology: Master -> Slave1, Slave2, Slave3. A Solr index can be replicated across multiple slave servers, which then process the query requests.
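
The read/write split above can be sketched from the client side. This is only a toy illustration, not SolrJ or any real client: the class name ReplicatedSolrRouter and the host names are made up, and it assumes the default /update and /select handler paths.

```python
from itertools import cycle


class ReplicatedSolrRouter:
    """Toy router: updates go to the master, queries rotate over the slaves."""

    def __init__(self, master_url, slave_urls):
        if not slave_urls:
            raise ValueError("at least one slave is required")
        self.master_url = master_url
        self._slaves = cycle(slave_urls)  # simple round-robin over the slaves

    def url_for_update(self):
        # All index changes are made against the single master.
        return self.master_url + "/update"

    def url_for_query(self):
        # Each query is served by the next slave in turn.
        return next(self._slaves) + "/select"


# Hypothetical hosts, for illustration only.
router = ReplicatedSolrRouter(
    "http://master:8983/solr/core_name",
    [
        "http://slave1:8983/solr/core_name",
        "http://slave2:8983/solr/core_name",
        "http://slave3:8983/solr/core_name",
    ],
)
```

In a real deployment a load balancer in front of the slaves usually plays this role; the point here is only that writes and reads go to different nodes.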

Index Replication in Solr
Solr includes a Java implementation of index replication that works over HTTP.
It is configured in solrconfig.xml and works across platforms with the same configuration.
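
Because replication works over HTTP, the handler can also be driven by plain URLs with a command parameter (fetchindex, details and so on). A small sketch that only builds such URLs; the helper name replication_url and the host are my own, and the command set below is just the subset I wanted to try.

```python
from urllib.parse import urlencode

# A subset of the commands accepted by the /replication handler.
KNOWN_COMMANDS = {
    "fetchindex",      # slave: pull the index from the master now
    "details",         # report replication status
    "indexversion",    # master: latest replicable index version
    "backup",          # create a backup
    "abortfetch",      # slave: stop a running fetch
    "enablepoll",      # slave: resume polling
    "disablepoll",     # slave: stop polling
}


def replication_url(core_url, command, **params):
    """Build a replication handler URL; it does not send any request."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown replication command: {command}")
    query = urlencode({"command": command, **params})
    return f"{core_url}/replication?{query}"


# Hypothetical slave host, for illustration only.
print(replication_url("http://slave1:8983/solr/core_name", "fetchindex"))
```

The resulting URL can be opened in a browser or passed to curl against a running node.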

Some Terms
Index - A Lucene index is a directory of files. These files make up the searchable and returnable data of a Solr Core.
Distribution - The copying of an index from the master server to all slaves.
Inserts and Deletes - The directory remains unchanged. Documents are always inserted into newly created files. Deleted documents are not removed from the files right away; they are flagged as deleted in the file and are only physically removed when the index is optimized.
Master and Slave - Slave nodes receive no updates directly; instead, all changes are made against the single master node. Changes made on the master are distributed to all the slave nodes.
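Update - A single change request against a single Solr instance, for example adding, changing, or deleting a document.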
Optimization - Optimization should only be run on the master nodes.
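Segment - A self-contained subset of an index, made up of some documents and the inverted-index data structures for the terms in those documents.
mergeFactor - A parameter that controls the number of segments in an index.
Snapshot - A directory containing hard links to the data files of an index.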
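
To make the "Inserts and Deletes" entry above concrete, here is a toy model of an append-only index. It is not how Lucene stores data; ToyIndex and its methods are invented for illustration, and a segment is just a Python list of document ids.

```python
class ToyIndex:
    """Toy model of an append-only index: a list of segments plus delete flags."""

    def __init__(self):
        self.segments = []    # each segment is a list of doc ids
        self.deleted = set()  # deletion flags, kept apart from the segments

    def insert(self, doc_ids):
        # New documents always go into a newly created segment.
        self.segments.append(list(doc_ids))

    def delete(self, doc_id):
        # The document is only flagged; existing segments are not rewritten.
        self.deleted.add(doc_id)

    def live_docs(self):
        # Documents a search would still see.
        return [d for seg in self.segments for d in seg if d not in self.deleted]

    def stored_docs(self):
        # Documents physically present in the segments, flagged or not.
        return sum(len(seg) for seg in self.segments)

    def optimize(self):
        # Merge everything into one segment and drop the flagged documents.
        self.segments = [self.live_docs()]
        self.deleted.clear()


index = ToyIndex()
index.insert(["a", "b"])
index.insert(["c"])
index.delete("b")
```

Before optimize() the deleted document still takes up space in its segment; afterwards there is a single segment holding only the live documents. That rewrite of all files is why optimization is the expensive step, and why it belongs on the master.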

Configuring the ReplicationHandler
maxNumberOfBackups - how many backups this node keeps on disk as it receives backup commands.

Configuring the Replication RequestHandler on a Master Server
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
  <int name="maxNumberOfBackups">2</int>
  <lst name="invariants">
    <str name="maxWriteMBPerSec">16</str>
  </lst>
</requestHandler>

Configuring the Replication RequestHandler on a Slave Server
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master. It is
         possible to pass on this as a request param for the fetchindex command -->
    <str name="masterUrl">http://remote_host:port/solr/core_name/replication</str>
    <!-- Interval in which the slave should poll master.  Format is HH:mm:ss .
         If this is absent slave does not poll automatically.
         But a fetchindex can be triggered from the admin or the http API -->
    <str name="pollInterval">00:00:20</str>
    <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-->
    <!-- To use compression while transferring the index files. The possible
         values are internal|external.  If the value is 'external' make sure
         that your master Solr has the settings to honor the accept-encoding header.
         See here for details: http://wiki.apache.org/solr/SolrHttpCompression
         If it is 'internal' everything will be taken care of automatically.
         USE THIS ONLY IF YOUR BANDWIDTH IS LOW.
         THIS CAN ACTUALLY SLOWDOWN REPLICATION IN A LAN -->
    <str name="compression">internal</str>
    <!-- The following values are used when the slave connects to the master to
         download the index files.  Default values implicitly set as 5000ms and
         10000ms respectively. The user DOES NOT need to specify these unless the
         bandwidth is extremely low or if there is an extremely high latency -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
    <!-- If HTTP Basic authentication is enabled on the master, then the slave
         can be configured with the following -->
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>

Setting Up a Repeater with the ReplicationHandler
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master.solr.company.com:8983/solr/core_name/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
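
The pollInterval and commitReserveDuration values in the configurations above use the HH:mm:ss format. A small helper to read them as seconds, as a sanity check when editing the config. The function name is mine, and it does plain arithmetic on the three fields; that Solr interprets a value such as 00:00:60 the same way is an assumption.

```python
def interval_to_seconds(value):
    """Convert an HH:mm:ss string such as a pollInterval to seconds."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:mm:ss, got {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


# Values taken from the sample configurations above.
for name, value in [
    ("commitReserveDuration (master)", "00:00:10"),
    ("pollInterval (slave)", "00:00:20"),
    ("pollInterval (repeater)", "00:00:60"),
]:
    print(name, "=", interval_to_seconds(value), "seconds")
```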

Commit and Optimize Operations

Slave Replication

Replicating Configuration Files

Reading the Sample Configuration and Setting Up Replication Myself

2. Understanding of Solr
solrconfig.xml maps URL paths to request handlers.

Some built-in handlers in Solr
Search handlers: DisMaxRequestHandler, LukeRequestHandler, MoreLikeThisHandler, SearchHandler, SpellCheckerRequestHandler

We are using solr.SearchHandler, http://wiki.apache.org/solr/SearchHandler

Update handlers: DataImportHandler, BinaryUpdateRequestHandler, CSVUpdateRequestHandler, ExtractingRequestHandler, JsonUpdateRequestHandler, XmlUpdateRequestHandler, XsltUpdateRequestHandler

We are using
solr.RealTimeGetHandler,
solr.UpdateRequestHandler,
solr.ExtractingRequestHandler,
solr.FieldAnalysisRequestHandler,
solr.DocumentAnalysisRequestHandler,
solr.admin.AdminHandler,
solr.PingRequestHandler,
solr.DumpRequestHandler,
solr.ReplicationHandler,
solr.DirectUpdateHandler2

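UpdateRequestProcessor and its chain work something like a servlet FilterChain. A custom update processor extends UpdateRequestProcessor (created through an UpdateRequestProcessorFactory); a custom search component extends SearchComponent.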
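
The FilterChain comparison can be sketched with a toy chain of responsibility. These are not Solr classes: Processor, LowercaseTitle, AddSource and CollectDocs are invented names, and process_add only loosely mirrors the idea of each processor handling an add and then passing it on to the next one.

```python
class Processor:
    """Base class: handle a document, then pass it to the next processor."""

    def __init__(self, next_processor=None):
        self.next = next_processor

    def process_add(self, doc):
        if self.next is not None:
            self.next.process_add(doc)


class LowercaseTitle(Processor):
    def process_add(self, doc):
        doc["title"] = doc["title"].lower()
        super().process_add(doc)  # continue down the chain


class AddSource(Processor):
    def process_add(self, doc):
        doc["source"] = "demo"
        super().process_add(doc)


class CollectDocs(Processor):
    """Terminal step standing in for the processor that writes to the index."""

    def __init__(self):
        super().__init__()
        self.indexed = []

    def process_add(self, doc):
        self.indexed.append(doc)


# The chain is built back to front: the last processor is created first.
sink = CollectDocs()
chain = LowercaseTitle(AddSource(sink))
chain.process_add({"id": "1", "title": "Solr Index Replication"})
```

After the call, sink.indexed holds the document with a lowercased title and the extra source field. A processor that does not call super().process_add(doc) stops the document from going any further, which is the same trick a servlet filter uses.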

Clients
Java - SolrJ
Scala - https://github.com/inoio/solrs
Scala - https://github.com/takezoe/solr-scala-client

3. Configuration on Ubuntu Master and Slaves
…todo...


References:
solr 6 and solr 7
http://sillycat.iteye.com/blog/2227066
http://sillycat.iteye.com/blog/2227398

pull mode index replication
https://cwiki.apache.org/confluence/display/solr/Index+Replication

distributed search with index sharding
https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding

Solr Cloud
https://cwiki.apache.org/confluence/display/solr/SolrCloud

Solr Article
http://blog.csdn.net/jaynol/article/details/17230857
http://blog.csdn.net/jaynol/article/details/24717123
http://blog.csdn.net/jaynol/article/details/24776437
http://blog.csdn.net/jaynol/article/details/24959373
http://blog.csdn.net/jaynol/article/details/25098667
http://blog.csdn.net/jaynol/article/details/25305323
http://blog.csdn.net/jaynol/article/details/25604271

lucene
https://lucene.apache.org/core/5_2_1/

solr clients
https://wiki.apache.org/solr/IntegratingSolr

XML for Solr
https://wiki.apache.org/solr/UpdateXmlMessages#XML_Messages_for_Updating_a_Solr_Index
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers

akka system
http://sillycat.iteye.com/blog/1767866
http://sillycat.iteye.com/blog/1768625
http://sillycat.iteye.com/blog/1768626
http://sillycat.iteye.com/blog/2099267
http://sillycat.iteye.com/blog/2100232

actor waiting
http://stackoverflow.com/questions/3107286/wait-for-an-actor-to-exit