Solr(9)Solr Index Replication on Ubuntu and Scala Client
1. Create One More Core
Go to the example directory and copy the core directory:
> pwd
/opt/solr/example
> cp -r collection1 jobs
> rm -fr collection1/
From the example directory, start the server:
> java -jar start.jar
Go to the console web UI
http://ubuntu-master:8983/solr/#/
Add Core -
name: jobs
instanceDir: /opt/solr/example/solr/jobs
dataDir: /opt/solr/example/solr/jobs/data
config: /opt/solr/example/solr/jobs/conf/solrconfig.xml
schema: /opt/solr/example/solr/jobs/conf/schema.xml
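For reference, on Solr 4.x with the legacy layout the Add Core action also persists the new core into example/solr/solr.xml. The resulting entry looks roughly like this (a sketch based on the legacy solr.xml format, not copied from this setup):

```xml
<cores adminPath="/admin/cores">
  <!-- instanceDir is relative to the solr home, i.e. /opt/solr/example/solr -->
  <core name="jobs" instanceDir="jobs" />
</cores>
```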
Add one Record to the Solr System
Jobs -> Documents -> Request-Handler -> Document Type (Solr Command Raw XML or JSON)
<add>
<doc>
<field name="id">1</field>
<field name="title">senior software engineer</field>
</doc>
</add>
The XML add did not show up, most likely because no commit was issued. The same document submitted as JSON works:
{
  "id": "1",
  "title": "software engineer"
}
Click on the "Query" tab, and you will see all your data there.
2. Set up the Replicate Server
On each slave, copy the core directory from the master:
> scp -r ubuntu-master:/opt/solr/example/solr/jobs ./
Check the master configuration: search solrconfig.xml for "/replication" and add this configuration:
<lst name="master">
<str name="enable">${master.enable:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
<lst name="slave">
<str name="enable">${slave.enable:false}</str>
<str name="masterUrl">${master.url:http://ubuntu-master:8983/solr/jobs}</str>
<str name="pollInterval">00:00:60</str>
<str name="httpConnTimeout">5000</str>
<str name="httpReadTimeout">10000</str>
</lst>
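These master and slave blocks belong inside the ReplicationHandler definition in solrconfig.xml; the enclosing handler looks roughly like this (a sketch, with the blocks above elided):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    ...
  </lst>
  <lst name="slave">
    ...
  </lst>
</requestHandler>
```

Because both roles are guarded by ${master.enable:false} and ${slave.enable:false}, the same solrconfig.xml can be deployed everywhere and the role picked at startup with a system property.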
Also, change the auto-commit settings in the same configuration file:
<autoCommit>
<maxDocs>300000</maxDocs>
<!-- 5 minutes -->
<maxTime>300000</maxTime>
<openSearcher>true</openSearcher>
</autoCommit>
Do the same in the Solr configuration files on the slaves. I have two slaves: ubuntu-dev1 and ubuntu-dev2.
Start the master with the master option enabled:
> java -Dmaster.enable=true -jar start.jar
Start the slaves on the slave servers:
> java -Dslave.enable=true -jar start.jar
From the server web UI console, I can see that replication is enabled:
http://ubuntu-master:8983/solr/#/jobs/replication
We can go to the slave console to check
http://ubuntu-dev1:8983/solr/#/jobs/query
Now I can add one more document on the master and check whether it gets indexed on the slaves.
Some console logging on the slaves:
529867 [snapPuller-10-thread-1] INFO org.apache.solr.handler.SnapPuller – Slave in sync with master.
589867 [snapPuller-10-thread-1] INFO org.apache.solr.handler.SnapPuller – Master's generation: 8
589868 [snapPuller-10-thread-1] INFO org.apache.solr.handler.SnapPuller – Slave's generation: 7
589869 [snapPuller-10-thread-1] INFO org.apache.solr.handler.SnapPuller – Starting replication process
589883 [snapPuller-10-thread-1] INFO org.apache.solr.handler.SnapPuller – Number of files in latest index in master: 52
After the replication process completes, the latest data is searchable on both the master and the slaves.
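Replication state can also be checked over HTTP; in Solr 4.x the ReplicationHandler answers commands like these (core and host names as above):

```
http://ubuntu-master:8983/solr/jobs/replication?command=details
http://ubuntu-dev1:8983/solr/jobs/replication?command=indexversion
```

The details command reports generation numbers on both sides, which is a quick way to confirm the slave has caught up without opening the admin UI.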
3. Set up the Load Balance
I am already running HAProxy on the Solr master, so I need to choose another port number. The configuration is as follows:
listen solr_cluster 0.0.0.0:8984
acl master_methods method POST DELETE PUT
use_backend solr_master_backend if master_methods
default_backend solr_read_backends
backend solr_master_backend
server solr-master ubuntu-master:8983 check inter 5000 rise 2 fall 2
backend solr_read_backends
balance roundrobin
server solr-slave1 ubuntu-dev1:8983 check inter 5000 rise 2 fall 2
server solr-slave2 ubuntu-dev2:8983 check inter 5000 rise 2 fall 2
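The /haproxy-status page used below is HAProxy's built-in stats page; a minimal sketch of the corresponding haproxy.cfg section (the port and uri here are assumptions, matching the URL below):

```
listen stats 0.0.0.0:80
    mode http
    stats enable
    stats uri /haproxy-status
```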
It works well; we can check the status here:
http://ubuntu-master/haproxy-status
4. Build a Simple Client
https://github.com/takezoe/solr-scala-client
This library's CaseClassMapper helps a lot.
package com.sillycat.jobsconsumer.persistence

import com.sillycat.jobsconsumer.models.Job
import com.sillycat.jobsconsumer.utilities.{IncludeConfig, IncludeLogger}
import jp.sf.amateras.solr.scala.SolrClient

/**
 * Created by carl on 8/6/15.
 */
object SolrClientDAO extends IncludeLogger with IncludeConfig {

  // Initialize the client once from configuration; null if the connection fails
  private val solrClient = {
    try {
      logger.info("Init the SOLR Client ---------------")
      val solrURL = config.getString(envStr("solr.url.jobs"))
      logger.info("SOLR URL = " + solrURL)
      new SolrClient(solrURL)
    } catch {
      case x: Throwable =>
        logger.error("Couldn't connect to SOLR: " + x)
        null
    }
  }

  def releaseResource(): Unit = {
    if (solrClient != null) {
      solrClient.shutdown()
    }
  }

  def addJob(job: Job): Unit = {
    solrClient.add(job)
  }

  def query(query: String): Seq[Job] = {
    logger.debug("Fetching the job results with query = " + query)
    val result = solrClient.query(query).getResultAs[Job]()
    result.documents
  }

  def commit(): Unit = {
    solrClient.commit()
  }
}
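A quick usage sketch of this DAO (hypothetical values; it assumes a Job case class with id, title, desc, and industry fields and a running Solr core):

```scala
val job = Job("id1", "senior software engineer", "desc1", "industry1")
SolrClientDAO.addJob(job)
// Documents are not searchable until a commit
SolrClientDAO.commit
val hits = SolrClientDAO.query("title:engineer")
println("found " + hits.size + " jobs")
SolrClientDAO.releaseResource
```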
The dependency is as follows:
//for solr scala driver
resolvers += "amateras-repo" at "http://amateras.sourceforge.jp/mvn/"
"jp.sf.amateras.solr.scala" %% "solr-scala-client" % "0.0.12",
And the test class is as follows:
package com.sillycat.jobsconsumer.persistence

import com.sillycat.jobsconsumer.models.Job
import com.sillycat.jobsconsumer.utilities.IncludeConfig
import org.scalatest.{BeforeAndAfterAll, FunSpec, Matchers}

/**
 * Created by carl on 8/7/15.
 */
class SolrDAOSpec extends FunSpec with Matchers with BeforeAndAfterAll with IncludeConfig {

  override def beforeAll() {
    if (config.getString("build.env").equals("test")) {
      // nothing to prepare yet
    }
  }

  override def afterAll() {
  }

  describe("SolrDAO") {
    describe("#add and query") {
      it("Add one single job to Solr") {
        val num = 10000
        val start = System.currentTimeMillis()
        for (i <- 1 to num) {
          val job = Job("id" + i, "title" + i, "desc" + i, "industry" + i)
          SolrClientDAO.addJob(job)
        }
        val end = System.currentTimeMillis()
        println("total time for " + num + " is " + (end - start))
        println("it is " + num / ((end - start) / 1000) + " jobs/second")
        // SolrClientDAO.commit
        // val result = SolrClientDAO.query("title:title1")
        // result should not be (null)
        // result.size > 0 should be (true)
        // result.foreach { item =>
        //   println(item.toString + "\n")
        // }
      }
    }
  }
}
To clean all the data during testing:
http://ubuntu-master:8983/solr/jobs/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
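URL-decoded, the stream.body parameter above is the standard delete-by-query command, with commit=true forcing the deletion to become visible immediately:

```xml
<delete><query>*:*</query></delete>
```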
The data schema is defined in conf/schema.xml; I updated it as follows:
<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="desc" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="industry" type="text_general" indexed="true" stored="true" multiValued="false"/>
Adding a single job at a time:
total time for 10000 is 180096
it is 55 jobs/second
Find log4j.properties here and change the log level:
/opt/solr/example/resources/log4j.properties
After turning off the logging and using 2 threads on the client, I get roughly this performance on each thread:
total time for 10000 is 51688
it is 196 jobs/second
With a single thread, the performance is as follows:
total time for 10000 is 28398
it is 357 jobs/second
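As an aside, these jobs/second figures are truncated by the integer division in the test's println; the exact rates are slightly higher. A sketch of the arithmetic for the first measurement:

```scala
// Reproducing the printed rate for the first run (10000 docs in 180096 ms)
val num = 10000
val elapsedMs = 180096L
// Integer division truncates: 180096 / 1000 = 180, then 10000 / 180 = 55
println(num / (elapsedMs / 1000))
// The exact rate is 10000 / 180.096, about 55.5 jobs/second
println(num.toDouble / (elapsedMs / 1000.0))
```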
References:
Setup Scaling Servers
http://blog.csdn.net/thundersssss/article/details/5385699
http://lutaf.com/197.htm
http://blog.warningrc.com/2013/06/10/Solr-data-backup.html
Single mode on Jetty
http://sillycat.iteye.com/blog/2227398
Load balancing on the slaves
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
https://gist.github.com/feniix/1974460
http://stackoverflow.com/questions/10090386/how-to-check-solr-healthy-using-haproxy
solr clients
https://github.com/takezoe/solr-scala-client
https://wiki.apache.org/solr/Solrj