
acts_as_solr


Original article: http://www.quarkruby.com/2007/8/12/acts_as_solr-for-search-and-faceting



Solr and acts_as_solr

Solr is a search server based on the Lucene Java search library, with an HTTP/XML interface. With Solr, large collections of documents can be indexed using strongly typed field definitions, taking advantage of Lucene's powerful full-text search features. acts_as_solr is a Ruby on Rails plugin that adds Solr capabilities to ActiveRecord models. It hides the configuration and manual setup work involved in using Solr and provides simple finder methods such as find_by_solr. Because it has built-in full-text search, acts_as_solr can be used as a replacement for acts_as_ferret ;-) . The purpose of this article is to explain acts_as_solr with examples.



Getting Started

Installation: installation is well explained on the acts_as_solr homepage and in "Getting started with acts_as_solr".

Note: acts_as_solr requires JRE 1.5 on the system. Before calling any of the Solr methods, make sure you have started the Solr server with the rake solr:start command.

Our example model for this tutorial is DigitalCamera [class name: Camera] with the following fields:

  • name (type:string)
  • brand (type:string) [we want faceted browsing on this field]
  • resolution (type:float) [we want faceted browsing on this field]
  • other fields which we do not want to index

 


Basic Usage: Search
Let's start with basic search; then we will move on to faceted browsing. You need to specify which columns of your model should be indexed for search (if no :fields option is given, all columns are indexed):

class Camera < ActiveRecord::Base
  acts_as_solr :fields=>[:name,:brand,:resolution]
end
Basic search can be done using:

@results = Camera.find_by_solr("canon powershot")
More options can be passed in a hash as the second argument. The available options are :limit, :offset, :scores, :order and :operator:

  • limit: maximum number of search results to return.
  • offset: index of the first search result to return.
  • scores: return the Solr score for each result.
  • order: the field to order the results by, in ascending or descending order.
  • operator: whether the words in the query are combined with "and" (every word must appear in a matching result) or "or" (any of the words may appear).
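As a sketch of how these options fit together, the snippet below assembles the second argument of find_by_solr in plain Ruby. The helper name solr_options and its keyword arguments are hypothetical, and the value formats it produces (a "field asc"/"field desc" string for :order, :and/:or symbols for :operator) are assumptions to check against the plugin source, not documented behavior.

```ruby
# Hypothetical helper: builds the options hash passed as the second
# argument of find_by_solr. ASSUMED value formats:
#   :order    => "field asc" or "field desc"
#   :operator => :and (all words must match) or :or (any word may match)
def solr_options(limit: 10, offset: 0, order: nil, scores: false, match_all_words: true)
  options = { :limit => limit, :offset => offset }
  options[:order] = order if order       # only sent when an order is given
  options[:scores] = true if scores      # ask for a score per result
  options[:operator] = match_all_words ? :and : :or
  options
end

opts = solr_options(limit: 20, offset: 40, order: "resolution desc", match_all_words: false)
# In a Rails app (not runnable here):
#   @results = Camera.find_by_solr("canon powershot", opts)
```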

Note: you can get the model objects and the total count of results with:

@products = @results.docs
@total_hits = @results.total_hits

Pagination
We will use will_paginate for pagination (the default Rails pagination is buggy). One way of paginating the returned results is explained nicely in "Using will paginate with acts_as_solr", but we don't need the additional module it uses to get the count of results: that module adds overhead because it runs an extra Solr query. find_by_solr already returns the total number of results, so you don't need to call count_by_solr separately. In your .rhtml file, include this where you want the pagination links:
<%= will_paginate WillPaginate::Collection.new((params[:page] || 1),
                                               (params[:products_per_page] || 10), @total_hits) -%>
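The view snippet above only draws the page links; the find_by_solr call also has to fetch the slice of results for the current page, which is what the :limit and :offset options are for. A minimal sketch, assuming the page number and page size arrive as strings (or nil) in params; the helper name page_window is hypothetical:

```ruby
# Hypothetical helper: converts page parameters (strings or nil, as they
# arrive in params) into the :limit/:offset pair for find_by_solr.
def page_window(page_param, per_page_param, default_per_page = 10)
  page = page_param.to_i
  page = 1 if page < 1                   # nil.to_i and "".to_i are 0
  per_page = per_page_param.to_i
  per_page = default_per_page if per_page < 1
  { :limit => per_page, :offset => (page - 1) * per_page }
end

# In the controller (not runnable here):
#   @results    = Camera.find_by_solr(query, page_window(params[:page], params[:products_per_page]))
#   @products   = @results.docs
#   @total_hits = @results.total_hits
```

Using the same page and page size for both the query and WillPaginate::Collection keeps the links consistent with the results shown.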

Faceting

Keith Instone's presentation is an excellent read on faceted browsing, with examples.

Back to acts_as_solr: open the model file and use the :facets => ... option to add the columns on which you want to allow faceted browsing.

class Camera < ActiveRecord::Base
  acts_as_solr :fields=>[:name,:brand,:resolution],
               :facets=>[:brand,:resolution]
end
By default, all fields are treated as strings. But the resolution facet is a float, and we would like to run range queries on it (e.g. find all cameras with a resolution between 5 and 7). Modify the model to look like this:
class Camera < ActiveRecord::Base
  acts_as_solr :fields=>[:name,:brand,{:resolution=>:range_float}],
               :facets=>[:brand,:resolution]
end
When a user searches for something (let's say "powershot"), the values of the other facets should be updated to match. Here is how you can get the updated facet values:
@results = Camera.find_by_solr("powershot", {:facets=>
                                             {:zeros=>false, :fields=>[:brand]}})
Here,

  • the :zeros parameter tells Solr not to return brand values whose facet count is zero.
  • @results.facets["facet_fields"]["brand_facet"] contains the brand names with their corresponding counts. A sample result:

    {"Canon USA"=>1, "Canon PowerShot"=>1, "Canon"=>91}
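To show these counts as links, it usually helps to list the most common values first. A small plain-Ruby sketch using the sample result above (the helper name facet_labels is hypothetical; ties are broken alphabetically so the order is stable):

```ruby
# Hypothetical helper: orders facet values by descending count, breaking
# ties alphabetically, and formats them as "Name (count)" labels.
def facet_labels(counts)
  counts.sort_by { |name, count| [-count, name] }
        .map { |name, count| "#{name} (#{count})" }
end

brand_counts = {"Canon USA"=>1, "Canon PowerShot"=>1, "Canon"=>91}
labels = facet_labels(brand_counts)
# => ["Canon (91)", "Canon PowerShot (1)", "Canon USA (1)"]
```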

But what about the values for the resolution facet, since we have defined resolution as a float range type (range_float)? Something like this does not work:

# INCORRECT
@results = Camera.find_by_solr("powershot", {:facets=>
                                             {:zeros=>false, :fields=>[:brand,:resolution]}})
Instead, we need to pass facet queries with predefined resolution ranges through the :query key of the :facets option of find_by_solr: Solr cannot calculate ranges for float/int fields by itself, so we have to specify the ranges when querying. For example, if the values in the resolution column range from 0 to 20 and we want 4 facet ranges, the query would be something like:
@results = Camera.find_by_solr("powershot",
                               {:facets=>{:zeros=>false,
                                          :fields=>[:brand],
                                          :query=>["resolution:[0 TO 4]", "resolution:[5 TO 9]",
                                                   "resolution:[10 TO 14]", "resolution:[15 TO 20]"]
                                         }
                               })
The counts for these resolution ranges can be read from @results.facets["facet_queries"].
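Two small steps around this can be sketched in plain Ruby. First, the range strings can be generated instead of written by hand. Because resolution is a float and Solr's [a TO b] ranges include both ends, hard-coded ranges like [0 TO 4] and [5 TO 9] leave values such as 4.5 uncounted, so this sketch uses shared boundaries instead (a value exactly on a boundary then counts in both neighboring ranges). Second, the counts can be turned into link labels. The helper names, the "MP" unit, and the assumption that facet_queries is a hash from each query string to its count (the shape of Solr's facet_queries section) are illustrative, not taken from acts_as_solr's documentation; the counts in the usage are made up.

```ruby
# Hypothetical helper: splits [min, max] into `buckets` contiguous ranges and
# returns one Solr range query per range, for the :query facet option.
# Integer arithmetic keeps the boundaries whole numbers.
def range_facet_queries(field, min, max, buckets)
  edges = (0..buckets).map { |i| min + (max - min) * i / buckets }
  edges.each_cons(2).map { |lo, hi| "#{field}:[#{lo} TO #{hi}]" }
end

# Hypothetical helper: ASSUMES facet_queries maps each query string to its
# hit count. Drops empty ranges and sorts the rest by lower bound.
def range_facet_links(facet_queries, unit = "MP")
  links = facet_queries.select { |_query, count| count > 0 }.map do |query, count|
    lo, hi = query.match(/\[(\S+) TO (\S+)\]/).captures
    { :query => query, :label => "#{lo} - #{hi} #{unit} (#{count})" }
  end
  links.sort_by { |link| link[:query][/\[(\S+) TO/, 1].to_f }
end

queries = range_facet_queries("resolution", 0, 20, 4)
# => ["resolution:[0 TO 5]", "resolution:[5 TO 10]", "resolution:[10 TO 15]", "resolution:[15 TO 20]"]

# Made-up counts standing in for @results.facets["facet_queries"]:
sample = {"resolution:[5 TO 10]"=>3, "resolution:[0 TO 5]"=>12, "resolution:[10 TO 15]"=>0}
range_facet_links(sample).map { |link| link[:label] }
# => ["0 - 5 MP (12)", "5 - 10 MP (3)"]
```

Each link's :query string is what you would append to the user's search when that range is clicked.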
If you display the facet values to the user as links with their counts, so that the user can narrow the search further, then once the user selects a resolution range you need to get the results from Solr with:
@results = Camera.find_by_solr("powershot" + " AND resolution:[0 TO 4]",
                               {:facets=>{:zeros=>false, :fields=>[:brand]}})
# The resolution facet queries have been removed because the user has
# already made a selection on resolution itself.
If the user wants multiple resolution selections, e.g. between 0 and 4 and between 6 and 7, we can define the query (the first argument to find_by_solr) as "powershot" + " AND (resolution:[0 TO 4] OR resolution:[6 TO 7])".
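Building that query string from the user's selections is plain string work. A minimal sketch (the helper name with_range_selections is hypothetical) that reproduces the query above:

```ruby
# Hypothetical helper: appends the selected ranges of a numeric field to the
# free-text query, OR-ing the ranges so a result may fall in any of them.
def with_range_selections(query, field, ranges)
  return query if ranges.empty?          # nothing selected: plain search
  clauses = ranges.map { |lo, hi| "#{field}:[#{lo} TO #{hi}]" }
  "#{query} AND (#{clauses.join(' OR ')})"
end

q = with_range_selections("powershot", "resolution", [[0, 4], [6, 7]])
# => "powershot AND (resolution:[0 TO 4] OR resolution:[6 TO 7])"
```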
Unfortunately, find_by_solr(query + " AND brand:Canon") doesn't seem to work. By default all fields are of type string, and faceting on these fields is done with the :browse parameter, i.e.:
@results = Camera.find_by_solr(query, {:facets=>
                                        {:zeros=>false,
                                         :query=>["resolution:[0 TO 4]", "resolution:[5 TO 9]",
                                                  "resolution:[10 TO 14]", "resolution:[15 TO 20]"],
                                         :browse=>["brand:Canon"]}
                                      })
This is fine until you want to allow multiple brand selections, e.g. products from both Canon and Sony.
Note: I don't know how to make this work with the :browse option, because multiple :browse entries are combined with AND, and the :operator option of find_by_solr applies only to the words in the query.
A good alternative is to change the field type of the string fields you facet on to :facet. Our acts_as_solr declaration becomes:
acts_as_solr :fields=>[:name,{:brand=>:facet},{:resolution=>:range_float}],
             :facets=>[:brand,:resolution]
This allows us to use brand in the query itself, just like resolution, so multiple selections can be combined with OR as above. For example, the new query becomes: "powershot" + " AND (brand:Canon OR brand:Sony)"
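The same idea works for brand once it is a :facet field. One detail worth handling: a multi-word value such as "Canon USA" has to be quoted, or the query parser splits it into separate words. A sketch under that assumption (the helper name with_value_selections is hypothetical; the phrase quoting and backslash escaping follow standard Lucene query syntax rather than anything specific to acts_as_solr):

```ruby
# Hypothetical helper: appends the selected values of a :facet field to the
# query, OR-ing them. Each value is wrapped in double quotes so multi-word
# values stay one term; embedded quotes and backslashes are escaped.
def with_value_selections(query, field, values)
  return query if values.empty?
  clauses = values.map do |value|
    escaped = value.gsub(/["\\]/) { |char| "\\#{char}" }
    %(#{field}:"#{escaped}")
  end
  "#{query} AND (#{clauses.join(' OR ')})"
end

brands = with_value_selections("powershot", "brand", ["Canon", "Sony"])
# builds: powershot AND (brand:"Canon" OR brand:"Sony")
```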

Boosting
With the :boost option you can give one field higher priority than the others. A small change to the acts_as_solr declaration is enough:

acts_as_solr :fields => [{:name=>{:boost=>2}},:brand,:resolution]

Quick Tips


Testing Solr
Copy vendor/plugins/acts_as_solr/test/test_helper.rb and modify it as shown below.

require 'net/http'
require 'uri'

class Test::Unit::TestCase
  # Fail fast if the Solr test server is not running on its default port.
  begin
    Net::HTTP.get_response(URI.parse('http://localhost:8981/solr/'))
  rescue Errno::ECONNREFUSED
    raise "You forgot to 'rake solr:start RAILS_ENV=test', foo!"
  end

  # Load the fixtures as usual, then rebuild the Solr index for each table.
  def self.fixtures(*table_names)
    if block_given?
      Fixtures.create_fixtures(Test::Unit::TestCase.fixture_path, table_names) { yield }
    else
      Fixtures.create_fixtures(Test::Unit::TestCase.fixture_path, table_names)
    end
    table_names.each do |table_name|
      clear_from_solr(table_name)
      # e.g. :digital_cameras -> "DigitalCamera" -> DigitalCamera
      klass = table_name.to_s.singularize.camelize.constantize
      klass.find(:all).each { |content| content.solr_save }
    end
  end

  # Delete every Solr document whose type is the model behind table_name.
  def self.clear_from_solr(table_name)
    ActsAsSolr::Post.execute(Solr::Request::Delete.new(:query => "type_t:#{table_name.to_s.singularize.camelize}"))
  end
  private_class_method :clear_from_solr
end

Modifications to the original file:

  1. removed the lines at the top of the file
  2. added camelize to turn underscored table names into model class names (e.g. digital_cameras → DigitalCamera)
  3. added lines that check that the Solr test server is running (from Google Groups)

Save the file above as RAILS_APP/test/solr_test_helper.rb, then require it in your unit and functional tests just as you require test_helper.rb, and start writing your tests. :)

Oops! Still not working?
Make sure that config/solr.yml has been configured for the test environment (by default, the Solr test server runs on port 8981).
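A quick way to check that setting is to read the port for the test environment out of solr.yml. This sketch assumes solr.yml has one section per Rails environment with a url key; that layout is an assumption, so compare it with your own file. The helper name solr_port and the sample URLs are hypothetical.

```ruby
require 'yaml'
require 'uri'

# ASSUMED solr.yml layout: one section per Rails environment with a url key.
SAMPLE_SOLR_YML = <<~YML
  development:
    url: http://127.0.0.1:8982/solr
  test:
    url: http://127.0.0.1:8981/solr
YML

# Hypothetical helper: returns the port configured for env, or nil if the
# environment or its url is missing.
def solr_port(yaml_text, env)
  section = YAML.safe_load(yaml_text)[env]
  return nil unless section.is_a?(Hash) && section["url"]
  URI.parse(section["url"]).port
end

# In a real app: solr_port(File.read("config/solr.yml"), "test")
port = solr_port(SAMPLE_SOLR_YML, "test")
puts "test Solr port: #{port.inspect} (the test helper above expects 8981)"
```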
