
Solr performance tuning

 

http://h3x.no/2011/05/10/guide-solr-performance-tuning

Introduction

I have spent the last year working extensively with the Solr search engine, figuring out how to get the best performance out of a Solr instance. It is almost funny how much impact the little things have on search performance. In this article I describe the points I have noticed myself that can be worked on in order to get just a little more juice out of Solr. (If you are tuning Solr yourself, remember to also look at the Solr Wiki for some extra hints.)

 

Test data

For this article I will be using the Wikipedia database as test content. I have downloaded a dump from Wikimedia that only contains the current version of all English articles. (download link)

I have generated XML documents out of 633654 pages to get a decent amount of test data (this has given me a Solr index of 6.7 GB), and then collected 25000 random words from those pages which I will use to run tests. I will run the searches 5 times each, for a total of 125000 search queries against Solr. Solr will be restarted before each test to ensure correct cache levels etc. for each test. When tests are performed on the same disk device, the OS disk cache is cleared from RAM in order to get a correct measurement each time. (sync; echo 3 > /proc/sys/vm/drop_caches)
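The procedure above can be sketched roughly as follows. This is a minimal illustration, not my actual test harness; it assumes the 25000 terms sit in a file called queries.txt and that Solr answers on localhost:8983 with the default /solr/select handler:

```shell
#!/bin/sh
# Drop the OS disk cache so every run starts cold (run as root)
sync; echo 3 > /proc/sys/vm/drop_caches

# Fire every query in queries.txt five times and time the whole run
time for run in 1 2 3 4 5; do
    while read -r term; do
        curl -s "http://localhost:8983/solr/select?q=text:$term&rows=10" > /dev/null
    done < queries.txt
done
```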

The search queries will be simple search terms without any wildcards etc., since I will discuss wildcards and the usage of ngrams in Solr in more detail in a later article.

My test server

My test server is an Ubuntu Linux machine with a single SATA drive for the OS, and a RAID6 spanning 11 SATA disks. The server started out with a single dual core AMD FX-62 processor and 8 GB of RAM; this was later replaced with a quad core Q9300 CPU and 16 GB of RAM. The changes in hardware are described in the hardware chapter, to show how Solr actually responds to relatively small hardware changes.

My schema.xml

The test schema uses all the default field types, and has the following data fields.

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>
</fields>

Tuning part one: Hardware

Starting out
I started Solr on a single drive at first, without tweaking anything. The time taken to run all the queries was 168m43.705s. A quick calculation gives 12 queries per second. I suspected it was possible to speed that up a lot, so let's move the Solr instance over to the RAID setup.

RAID performance
I moved the Solr instance over to the RAID set and performed the same test again. This actually gave a performance gain of just over 100%: the whole test now took no more than 82m51.199s, which equals no less than 25 queries per second. But there is no need to stop here; time to try a hardware upgrade before turning to software tweaking.

Time for more juice!
Time for one last change to the hardware. I replaced the dual core AMD CPU and its 8 GB of RAM with a quad core Intel CPU and 16 GB of RAM. This time the test took no more than 20m41.202s, a massive improvement from the first 168 minutes. We have now actually reached 100 queries per second, and that is even before tuning Solr itself.

Tuning part two: Tuning the Solr cache (in solrconfig.xml)

I have now tuned as much as I can with the hardware I have available, so the next step is to look at solrconfig.xml, which has not been touched yet. I will focus on the caches that you can track via the statistics page of the Solr admin (http://solrserver:8983/solr/admin/stats.jsp#cache). Each cache element there shows information about its size, the number of elements inserted, and the number of elements evicted to make room for new ones. If you see many evictions, you should look into increasing that cache so all elements can fit (but don't overdo it; adjust it and see what fits your setup). It is likewise an idea to decrease the size of some of them if they have a lot of unused slots. A goal should be to get the hit ratio as close to 1.00 as possible (1.00 being a 100% hit ratio).

For my setup with simple search queries and no usage of filters, there are only two cache modules I need to adjust: queryResultCache and documentCache.

queryResultCache
The queryResultCache is used to store ordered sets of document IDs. After running the test suite I noticed that the number of evictions had already reached several thousand, so I started by adjusting it to 122880 (both size and initialSize), quite an increase from the default 512. This cache does not get many lookups compared to inserts, but it still brought the test suite down to 17m17.550s (120 queries per second).

documentCache
The documentCache had over 2800000 cache inserts with a default of only 512 slots; that won't do for long. So I increased it from 512 to 2900000. This brought the test suite down a bit more, to 16m15.414s (128 queries per second).
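In solrconfig.xml these two changes look roughly like this. The class and autowarmCount attributes shown are the stock defaults; the initialSize on documentCache is my own assumption, since only its size is discussed above:

```xml
<!-- solrconfig.xml: cache sizes used in this test -->
<queryResultCache class="solr.LRUCache"
                  size="122880"
                  initialSize="122880"
                  autowarmCount="0"/>

<documentCache class="solr.LRUCache"
               size="2900000"
               initialSize="2900000"/>
```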

Other
Solr has a couple of other cache settings you can tweak too, but these are dependent on your setup and Solr usage. See http://wiki.apache.org/solr/SolrCaching for more information.

Tuning part three: Java parameters
There are a lot of settings you can tune to optimize Java; I will not go in depth on them here. But I would like to point out that one of the most important parameters to tweak is how much memory Java is allowed to use. If you give it too little, Java has to work hard to make sure it has enough memory available, while too much causes Java to hog memory that could instead be used for disk caching.

I have done a couple of tests to show how different memory settings affect my search suite:

“-Xmx14336m -Xms4096m” (14G/4G)
My test suite had been down to around 16 minutes; after giving Java this much memory the test went up to 26m50.914s. I suspect the reason is that Java hogged so much memory that the OS could no longer keep the index data in its cache, causing more disk access.

“-Xmx2048m -Xms512m” (2G/512M)
I aborted the test after it had run for a staggering 1331m33.322s. I suspect that after the test had run for a while, Java/Solr had to spend so many resources on keeping enough memory available that it eventually died or hung.

Skipping memory settings
I then tried to let Solr run without any memory limits (i.e. letting Java decide for itself at startup, based on the memory available on the machine).
This caused it to use around 4 GB of RAM after running for a while, quite a bit more than it had to spend in the previous test. That did of course do wonders for the response from Solr, bringing it back to 16m26.850s.

In order to keep some control over your server, I suggest running without a limit first, then setting a limit once Solr has been running for a while and you can see for yourself how much memory it wants to use.
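With the example Jetty distribution, the heap flags are passed straight on the command line. A sketch, assuming the stock start.jar launcher:

```shell
# No explicit limits: let the JVM pick its own defaults at startup
java -jar start.jar

# Explicit limits, e.g. a 4 GB maximum heap and a 512 MB initial heap
java -Xmx4096m -Xms512m -jar start.jar
```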

Tuning part four: Tuning the search queries

If you have a schema with multiple fields you can filter on when doing queries, then use them (with the help of the filter query (fq) parameter)! If Solr gets the chance to remove X% of the documents before searching the remaining ones, you may well see a pretty good performance boost for those queries.
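As a sketch (the type field here is hypothetical; it is not part of the test schema above), a filtered query looks like this. The fq result is cached separately in the filterCache and reused across queries:

```shell
# Unfiltered: score and search every document matching the term
curl "http://localhost:8983/solr/select?q=text:linux"

# Filtered: restrict the candidate set first with fq; the filter
# result set is cached in the filterCache independently of q
curl "http://localhost:8983/solr/select?q=text:linux&fq=type:article"
```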

In conclusion:

The numbers
Running the test suite against a default Solr instance on a single drive with 8G RAM and 2 cores: 168m43.705s
Running the test suite against a default Solr instance on RAID6 with 8G RAM and 2 cores: 82m51.199s
Running the test suite against a default Solr instance on RAID6 with 16G RAM and 4 cores: 20m41.202s
Running the test suite with a tuned Solr cache on the same hardware: 16m15.414s

And finally
As you can see, it is rather easy to get either very bad or very good performance from Solr; it all depends on your setup and what your needs are. You have to analyze and test to see which setup is best for your needs, since there is no simple answer that fits everyone.

Hardware will give you a lot of the needed performance, but if you have configured something wrong, all the hardware in the world won't help. (32G of RAM won't help with a 4GB index when Solr can only use 512MB…)

This entry was posted in Software and tagged performance, Solr, tuning by ueland. Bookmark the permalink.

9 thoughts on “Guide: Solr performance tuning”

Here are some that come to mind right now that are very useful:

- Be smart about your commit strategy if you're indexing a lot of documents (commitWithin is great). Use batches too.

- Many times I've seen Solr index documents faster than the database could create them (considering joins, denormalizing, etc.). Cache these somewhere so you don't have to recreate the ones that haven't changed.

- Set up and use the Solr caches properly. Think about what you want to warm and when. Take advantage of the Filter Queries and their cache! It will improve performance quite a bit.

- Don't store what you don't need for search. I personally only use Solr to return IDs of the data. I can usually pull that up easily in batch from the DB / KV store. Beats having to reindex data that was just for show anyway...

- Solr (Lucene really) is memory greedy and picky about the GC type. Make sure that you're sorted out in that respect and you'll enjoy good stability and consistent speed.

- Shards are useful for large datasets, but test first. Some query features aren't available in a sharded environment (YMMV).

- Solr is improving quickly and v4 should include some nice cloud functionality (ZooKeeper ftw).

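The commit advice in the first comment can be sketched as follows; a minimal illustration, assuming a Solr instance on localhost:8983 and the standard XML update format:

```shell
# Send a batch of documents in one request, and let Solr fold the
# commit into one within 10 seconds instead of committing per document
curl "http://localhost:8983/solr/update?commitWithin=10000" \
     -H "Content-Type: text/xml" \
     --data-binary '<add>
       <doc><field name="id">1</field><field name="title">Doc one</field></doc>
       <doc><field name="id">2</field><field name="title">Doc two</field></doc>
     </add>'
```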
  1. We’ve been testing out solr recently for a new project and I’m interested in the Java memory settings you commented about in the article (I’m a newbie when it comes to the java config side of things). Would you mind explaining how you configure the different settings for solr, and also how you “skip” the memory settings? Are you just launching solr with something like:

    java -jar start.jar

    Thanks. Great article!

     
  2. Hello,

    Regarding memory settings:
    That is correct; by only doing “java -jar start.jar” you have skipped the memory settings, and Java will then set up memory limits by itself. But if you have a large enough index, that will quickly lead to memory problems, like Java exiting with OutOfMemoryErrors.

    As a starter, the only settings you should worry about in Solr (solrconfig.xml) are queryResultCache and documentCache. Try increasing them and see how your Solr setup handles different values. I will be careful about saying what they should be set to, since that will be different for every Solr setup in the world :)

    Tor

     
  3. Phil on October 26, 2011 at 17:17 said:

    Do you know any best practices for load balancing solr servers? Currently there is a master and slave inside a VIP, however one server frequently is getting:

    java.lang.IllegalStateException: Error performing SOLR search org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset

    randomly, maybe every 15 minutes or so when hitting the VIP.

    Thanks

     
    I have not worked with load balancing of Solr yet. But could the error be caused by the master/slave getting too much traffic and therefore giving up? If the issue is on the Solr side, you should consider either adding more Solr nodes or upgrading the hardware to make it scale better. (Having more RAM than the index size, for example, is a great start.)

    If the issue is the load balancer itself, I would recommend looking into hardware load balancers, since they are custom built for load balancing and little else.

    Hope that it helps a bit :)

     
  5. Dominique on August 17, 2012 at 11:32 said:

    Great article.
    Too late, I assume, to get the output of the command
    java -XX:+PrintFlagsFinal -Xmx2g -version 2>&1 | grep -i -E 'heapsize|permsize|version'

    in order to see how Java sets up memory by itself on your server.

    Are you using the “-server” option ?

     
  6. ueland on August 17, 2012 at 21:13 said:

    Hi!

    Unfortunately, yes :)

    I do believe that the server flag is on by default on my setup. But the server in question has been reinstalled, so I cannot say for sure.

     
  7. Pingback: Guide: Solr performance tuning | H3x.no – Tor Henning Uelands blog « Ram Prasad

  8. shrikant on February 20, 2014 at 18:20 said:

    Hi
    I am facing an issue in Solr.
    I have 17 lakh records indexed in Solr, with an index size of 4.5 GB.
    And I have a quad core CPU with 8 GB of RAM; when I search for a word in Solr it takes a lot of time to execute. The breakdown is as follows:

    This word has 2 lakh records in Solr.
    When I put a limit of 10000 records, it takes 25 seconds to execute.
    100000 records: it takes 250 seconds to execute.
    200000 records: it takes 500 seconds to execute, which is approximately 8 minutes.
    Taking 8 minutes to get only 2 lakh records is useless.

    And I have read your article and could not solve the problem in this case.

    Could you please let me know where the problem is.

     
    • ueland on February 23, 2014 at 22:01 said:

      Some possible solutions:
      1: You ask for too many rows in a result set, so it cannot be cached easily.
      2: The index is not cached in RAM. The OS will do this after a while, but it requires enough RAM; if the server/machine does more than just Solr, more RAM is needed.
      3: Solr needs more RAM for caching internally.
      4: You need a faster drive.
