Solr(9)Solr Index Replication on Ubuntu and Scala Client

 

1. Create One More Core
Go to the example directory and copy the default core directory to create the new core
> pwd
/opt/solr/example
> cp -r collection1 jobs
> rm -fr collection1/

Go to the example directory and start the server
> java -jar start.jar

Open the admin web console
http://ubuntu-master:8983/solr/#/

Click Add Core and fill in these values
name: jobs
instanceDir: /opt/solr/example/solr/jobs
dataDir: /opt/solr/example/solr/jobs/data
config: /opt/solr/example/solr/jobs/conf/solrconfig.xml
schema: /opt/solr/example/solr/jobs/conf/schema.xml

Add one record to Solr
Jobs -> Documents -> Request-Handler -> Document Type (Solr Command, raw XML or JSON)
<add>
<doc>
<field name="id">1</field>
<field name="title">senior software engineer</field>
</doc>
</add>

The XML version did not work for me, maybe because of a commit issue. I tried with JSON and it works.
{
  "id":"1",
  "title":"software engineer"
}
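The same JSON can also be sent from code instead of the admin UI. This is only a minimal sketch: the names `SolrJsonUpdate`, `toJson` and `post` are made up for illustration, and it assumes the core's update handler accepts a JSON array of documents posted with `Content-Type: application/json`.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Solr or of any client library.
object SolrJsonUpdate {
  // Escape only the most common special characters of a JSON string.
  def escape(s: String): String =
    s.map {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case c    => c.toString
    }.mkString

  // Build a one-document JSON array such as [{"id":"1","title":"..."}].
  def toJson(fields: Seq[(String, String)]): String =
    fields
      .map { case (k, v) => "\"" + escape(k) + "\":\"" + escape(v) + "\"" }
      .mkString("[{", ",", "}]")

  // POST the body to the update URL and return the HTTP status code.
  def post(updateUrl: String, body: String): Int = {
    val conn = new URL(updateUrl).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      val out = conn.getOutputStream
      try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
      conn.getResponseCode
    } finally conn.disconnect()
  }
}
```

With the master from this post, a call could look like `SolrJsonUpdate.post("http://ubuntu-master:8983/solr/jobs/update?commit=true", SolrJsonUpdate.toJson(Seq("id" -> "1", "title" -> "software engineer")))`. The `commit=true` parameter makes the document searchable right away.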

Click on the "Query" tab to see all the indexed data.

2. Set Up the Replication Servers
On each slave, copy the jobs core from the master
> scp -r ubuntu-master:/opt/solr/example/solr/jobs ./

Check the master configuration: search for "/replication" in solrconfig.xml and add this configuration
       <lst name="master">
         <str name="enable">${master.enable:false}</str>
         <str name="replicateAfter">commit</str>
         <str name="replicateAfter">startup</str>
         <str name="confFiles">schema.xml,stopwords.txt</str>
       </lst>
       <lst name="slave">
         <str name="enable">${slave.enable:false}</str>
         <str name="masterUrl">${master.url:http://ubuntu-master:8983/solr/jobs}</str>
         <str name="pollInterval">00:00:60</str>
         <str name="httpConnTimeout">5000</str>
         <str name="httpReadTimeout">10000</str>
       </lst>
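The `${master.enable:false}` style placeholders are what let one solrconfig.xml serve both roles: the value comes from a system property such as `-Dmaster.enable=true`, and the part after the colon is the default. The sketch below only illustrates that lookup rule. `PropertySubstitution` is a made-up name and this is not how Solr implements it internally.

```scala
import scala.util.matching.Regex

// Illustration of ${name:default} resolution; not Solr's own code.
object PropertySubstitution {
  // Group 1 is the property name, group 2 the optional default after the first colon.
  private val Placeholder = """\$\{([^:}]+)(?::([^}]*))?\}""".r

  def resolve(text: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(text, m => {
      val name    = m.group(1)
      val default = Option(m.group(2)) // null when no default was given
      // Prefer the property, then the default, else leave the placeholder untouched.
      val value = props.get(name).orElse(default).getOrElse(m.matched)
      Regex.quoteReplacement(value)
    })
}
```

With no properties set, `${master.enable:false}` resolves to `false`, so the same file starts as neither master nor slave until one of the flags is passed on the command line.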

Also, change the auto commit time in the configuration.
     <autoCommit>
          <maxDocs>300000</maxDocs>
          <!-- 5 minutes -->
          <maxTime>300000</maxTime>
          <openSearcher>true</openSearcher>
     </autoCommit>

I apply the same changes to the Solr configuration files on the slaves. I have 2 slaves: ubuntu-dev1 and ubuntu-dev2.

Start the master with the master option enabled
> java -Dmaster.enable=true -jar start.jar

Start each slave with the slave option enabled
> java -Dslave.enable=true -jar start.jar

On the master's web console, the replication page only shows that replication is enabled.
http://ubuntu-master:8983/solr/#/jobs/replication

To check the replicated data, go to the query page on a slave console
http://ubuntu-dev1:8983/solr/#/jobs/query

Now I can add one more record on the master and check whether it gets indexed on the slaves.
Here is some console logging from a slave
529867 [snapPuller-10-thread-1] INFO  org.apache.solr.handler.SnapPuller  – Slave in sync with master.
589867 [snapPuller-10-thread-1] INFO  org.apache.solr.handler.SnapPuller  – Master's generation: 8
589868 [snapPuller-10-thread-1] INFO  org.apache.solr.handler.SnapPuller  – Slave's generation: 7
589869 [snapPuller-10-thread-1] INFO  org.apache.solr.handler.SnapPuller  – Starting replication process
589883 [snapPuller-10-thread-1] INFO  org.apache.solr.handler.SnapPuller  – Number of files in latest index in master: 52
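These log lines are enough to tell how far a slave is behind. As a rough sketch (the object name `SnapPullerLog` is made up, and it assumes the message format shown in the log above), the generations can be pulled out like this:

```scala
// Hypothetical log helper based only on the SnapPuller messages shown above.
object SnapPullerLog {
  private val Master = """Master's generation: (\d+)""".r
  private val Slave  = """Slave's generation: (\d+)""".r

  // Last reported (master, slave) generations, if both appear in the lines.
  def generations(lines: Seq[String]): Option[(Long, Long)] = {
    val master = lines.flatMap(l => Master.findFirstMatchIn(l)).lastOption.map(_.group(1).toLong)
    val slave  = lines.flatMap(l => Slave.findFirstMatchIn(l)).lastOption.map(_.group(1).toLong)
    for (m <- master; s <- slave) yield (m, s)
  }

  // How many generations the slave is behind the master.
  def lag(lines: Seq[String]): Option[Long] =
    generations(lines).map { case (m, s) => m - s }
}
```

For the log above the slave is one generation behind (8 on the master, 7 on the slave), which is why the next line says the replication process is starting.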

After the replication finishes, the latest data can be searched on both the master and the slaves.

3. Set Up Load Balancing
I am running HAProxy on the same machine as the Solr master, so I need to choose another port number. The configuration is as follows:
listen solr_cluster 0.0.0.0:8984
       acl master_methods method POST DELETE PUT
       use_backend solr_master_backend if master_methods
       default_backend solr_read_backends

backend solr_master_backend
       server solr-master ubuntu-master:8983 check inter 5000 rise 2 fall 2
   
backend solr_read_backends
       balance roundrobin
       server solr-slave1 ubuntu-dev1:8983 check inter 5000 rise 2 fall 2
       server solr-slave2 ubuntu-dev2:8983 check inter 5000 rise 2 fall 2
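To make the routing rule explicit, here is a tiny model of what this configuration asks HAProxy to do: write methods go to the master, everything else is spread round-robin over the slaves. It is only an illustration (the names `SolrRouting` and `Router` are made up) and it ignores the health checks (`check inter 5000 rise 2 fall 2`).

```scala
// Toy model of the ACL and round-robin rules above; not HAProxy code.
object SolrRouting {
  val master = "ubuntu-master:8983"
  val slaves = Vector("ubuntu-dev1:8983", "ubuntu-dev2:8983")

  // Same methods as the master_methods ACL.
  private val masterMethods = Set("POST", "DELETE", "PUT")

  final class Router {
    private var next = 0

    // Pick the backend server for one request, given its HTTP method.
    def route(method: String): String =
      if (masterMethods.contains(method)) master
      else {
        val server = slaves(next % slaves.size)
        next += 1
        server
      }
  }
}
```

One consequence worth noting: a search sent as a POST would also land on the master, because the rule only looks at the HTTP method.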

It is working well; we can check the status here
http://ubuntu-master/haproxy-status

4. Build a Simple Client

https://github.com/takezoe/solr-scala-client
The CaseClassMapper class in this library helps a lot.

package com.sillycat.jobsconsumer.persistence

import com.sillycat.jobsconsumer.models.Job
import com.sillycat.jobsconsumer.utilities.{IncludeConfig, IncludeLogger}
import jp.sf.amateras.solr.scala.SolrClient


/**
* Created by carl on 8/6/15.
*/
object SolrClientDAO extends IncludeLogger with IncludeConfig{

  // Created once when the object is first used; null if the initialization fails
  private val solrClient = {
    try {
      logger.info("Init the SOLR Client ---------------")
      val solrURL = config.getString(envStr("solr.url.jobs"))
      logger.info("SOLR URL = " + solrURL)
      new SolrClient(solrURL)
    } catch {
      case x: Throwable =>
        logger.error("Couldn't create the SOLR client: " + x)
        null
    }
  }

  // Shut the client down if it was created
  def releaseResource: Unit = {
    if(solrClient != null){
      solrClient.shutdown()
    }
  }

  // Send one job to Solr; it is not searchable until a commit happens
  def addJob(job:Job): Unit ={
    //logger.debug("Adding job (" + job + ") to solr")
    solrClient.add(job)
  }

  // Run the query and map the result documents back to Job instances
  def query(query:String):Seq[Job] = {
    logger.debug("Fetching the job results with query = " + query)
    val result = solrClient.query(query).getResultAs[Job]()
    result.documents
  }

  // Make the documents added so far searchable
  def commit: Unit = {
    solrClient.commit()
  }

}

The sbt dependency is as follows:
//for solr scala driver
resolvers += "amateras-repo" at "http://amateras.sourceforge.jp/mvn/"

libraryDependencies += "jp.sf.amateras.solr.scala" %% "solr-scala-client" % "0.0.12"

The test class is as follows:
package com.sillycat.jobsconsumer.persistence

import com.sillycat.jobsconsumer.models.Job
import com.sillycat.jobsconsumer.utilities.IncludeConfig
import org.scalatest.{BeforeAndAfterAll, Matchers, FunSpec}

/**
* Created by carl on 8/7/15.
*/
class SolrDAOSpec extends FunSpec with Matchers with BeforeAndAfterAll with IncludeConfig{

  override def beforeAll() {
    if(config.getString("build.env").equals("test")){

    }
  }

  override def afterAll() {

  }

  describe("SolrDAO") {
    describe("#add and query"){
      it("Add one single job to Solr") {
        val expect = Job("id1","title1","desc1","industry1")

        val num = 10000
        val start = System.currentTimeMillis()
        for ( i<- 1 to num){
          val job = (Job("id" + i, "title" + i, "desc" + i, "industry" + i))
          SolrClientDAO.addJob(job)
        }

        val end = System.currentTimeMillis()

        println("total time for " + num + " is " + (end-start))
        println("it is " + num / ((end-start)/1000) + " jobs/second")


//        SolrClientDAO.commit
//        val result = SolrClientDAO.query("title:title1")
//        result should not be (null)
//        result.size > 0 should be (true)
//        result.foreach { item =>
//          println(item.toString + "\n")
//        }
      }
    }
  }

}


To delete all the data during testing, open this URL
http://ubuntu-master:8983/solr/jobs/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
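The URL is just the XML command `<delete><query>*:*</query></delete>` URL-encoded into the `stream.body` parameter. A small sketch of building it (the name `SolrDeleteAll` is made up, and the query is assumed not to need XML escaping):

```scala
import java.net.URLEncoder

// Hypothetical helper that builds the delete-by-query URL used above.
object SolrDeleteAll {
  def url(coreUrl: String, query: String = "*:*"): String = {
    // The delete command that Solr reads from the stream.body parameter.
    val body = "<delete><query>" + query + "</query></delete>"
    coreUrl.stripSuffix("/") + "/update?stream.body=" +
      URLEncoder.encode(body, "UTF-8") + "&commit=true"
  }
}
```

`URLEncoder` also encodes `:` and `/`, so the result is not character-for-character the URL above, but it decodes to the same command.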

The data schema is defined in conf/schema.xml, so I updated it with the fields of the job as follows:
   <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
   <field name="desc" type="text_general" indexed="true" stored="true" multiValued="false"/>
   <field name="industry" type="text_general" indexed="true" stored="true" multiValued="false"/>
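The Job model itself is not shown in this post. Judging from the test, which builds `Job("id1","title1","desc1","industry1")`, it is presumably a case class along these lines. The field names are an assumption; they are chosen to match the schema fields above plus the `id` field, since the case class mapping works on field names. `toFields` is a made-up helper, only there to make the mapping visible.

```scala
// Assumed shape of the Job model; the real class lives in com.sillycat.jobsconsumer.models.
final case class Job(id: String, title: String, desc: String, industry: String) {
  // Field name -> value, matching the field names declared in schema.xml.
  def toFields: Map[String, String] =
    Map("id" -> id, "title" -> title, "desc" -> desc, "industry" -> industry)
}
```

If a field name in the case class has no matching `<field>` in schema.xml, adding the document is likely to fail unless the schema has a matching dynamic field.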


Adding a single job at a time
total time for 10000 is 180096
it is 55 jobs/second

The log level can be changed in log4j.properties, which is here
/opt/solr/example/resources/log4j.properties

I turned off the logging and used 2 threads on the client; each thread gets about the performance below.
total time for 10000 is 51688
it is 196 jobs/second
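The two-thread client is not shown in this post. One way to do it is sketched below, under the assumption that the ids are simply split into contiguous ranges, one per thread. `ParallelIndexing` and `indexAll` are made-up names, and `add` stands in for a call such as `SolrClientDAO.addJob`.

```scala
import java.util.concurrent.Executors

// Hypothetical driver that spreads the indexing loop over a fixed thread pool.
object ParallelIndexing {
  // Calls add(i) for every i in 1..num, using the given number of threads.
  def indexAll(num: Int, threads: Int)(add: Int => Unit): Unit = {
    require(num > 0 && threads > 0, "num and threads must be positive")
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val chunk = (num + threads - 1) / threads // ceiling division
      val tasks = (1 to num).grouped(chunk).map { ids =>
        pool.submit(new Runnable { def run(): Unit = ids.foreach(add) })
      }.toList
      tasks.foreach(_.get()) // wait for completion and surface any failure
    } finally pool.shutdown()
  }
}
```

A call could look like `ParallelIndexing.indexAll(10000, 2)(i => SolrClientDAO.addJob(Job("id" + i, "title" + i, "desc" + i, "industry" + i)))`.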

For a single thread, the performance is as follows
total time for 10000 is 28398
it is 357 jobs/second
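A note on these numbers: the test prints `num / ((end-start)/1000)`, which truncates the elapsed time to whole seconds before dividing, so the rates are slightly off. The sketch below reproduces that calculation next to an exact one; `Throughput` is a made-up name.

```scala
// Reproduces the rate printed by the test and an exact alternative.
object Throughput {
  // Same integer arithmetic as the test; throws ArithmeticException below 1000 ms.
  def truncated(num: Int, elapsedMillis: Long): Long =
    num / (elapsedMillis / 1000)

  // Floating-point rate in jobs per second.
  def exact(num: Int, elapsedMillis: Long): Double =
    num * 1000.0 / elapsedMillis
}
```

For the three runs above, the exact rates are about 55.5, 193.5 and 352.1 jobs/second, against the printed 55, 196 and 357.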


References:
Setup Scaling Servers
http://blog.csdn.net/thundersssss/article/details/5385699
http://lutaf.com/197.htm
http://blog.warningrc.com/2013/06/10/Solr-data-backup.html

Single mode on Jetty
http://sillycat.iteye.com/blog/2227398

load balance on the slaves
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
https://gist.github.com/feniix/1974460
http://stackoverflow.com/questions/10090386/how-to-check-solr-healthy-using-haproxy

solr clients
https://github.com/takezoe/solr-scala-client
https://wiki.apache.org/solr/Solrj