
Apache SOLR and Carrot2 integration strategies 1

 

Deploy carrot2-webapp

1. Download the source code

#git clone https://github.com/carrot2/carrot2.git

 

2. Compile

#cd carrot2

#ant webapp

 

3. Deploy

#cp tmp/webapp/carrot2-webapp.war  /path/to/tomcat/webapps

 

4. Configure carrot2 (once Tomcat has unpacked the war into webapps/carrot2-webapp)

#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites

#mv suite-webapp.xml suite-webapp.xml.old

#cp source-solr.xml suite-webapp.xml

Then edit suite-webapp.xml so that it looks like this:

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <label>Solr</label>
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <mnemonic>s</mnemonic>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>

  <include suite="algorithm-lingo.xml"></include>

</component-suite>

 

5. Edit source-solr-attributes.xml

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>

  

6. Edit algorithm-lingo-attributes.xml and algorithm-lingo.xml as needed
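
A typo in one of the hand-edited XML files can be hard to spot, so it may help to parse them before restarting Tomcat. The following is only a minimal sketch in Python (not part of Carrot2): the helper name read_overridden_attributes is made up for illustration, and SAMPLE simply repeats the source-solr-attributes.xml content shown above. In practice you would read the real file from WEB-INF/suites instead.

```python
import xml.etree.ElementTree as ET


def read_overridden_attributes(xml_text):
    """Return {attribute key: value} from a Carrot2 attribute-sets document.

    Hypothetical helper for illustration; ET.fromstring raises
    xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for attribute in root.iter("attribute"):
        value = attribute.find("value")
        if value is not None:
            values[attribute.get("key")] = value.get("value")
    return values


# Same content as the source-solr-attributes.xml listing in this post.
SAMPLE = """<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>"""

# Print every overridden attribute so the values can be checked by eye.
for key, value in read_overridden_attributes(SAMPLE).items():
    print(key, "=", value)
```

If the file is malformed, the parse step fails with a ParseError that names the offending line and column, which is usually quicker to act on than a webapp that fails to start.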

 

 ----------------------------------------------------

Integrate with Solr

1. Configure solrconfig.xml

a. Load the required jars

  <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />

 

b. Add the clustering search component and the /clustering request handler

 

  <searchComponent name="clustering"
                   enable="true"
                   class="solr.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">lingo</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.resourcesDir">clustering/carrot2</str>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
      <str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>
    </lst>
  </searchComponent>

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="true"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">lingo</str>
      <bool name="clustering.results">true</bool>
      <!-- Field name with the logical "title" of each document (optional) -->
      <str name="carrot.title">content</str>
      <!-- Field name with the logical "URL" of each document (optional) -->
      <str name="carrot.url">id</str>
      <!-- Field name with the logical "content" of each document (optional) -->
      <str name="carrot.snippet">content</str>
      <!-- Apply the highlighter to the title/content and use the result for clustering. -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">5</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">true</bool>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>
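
Once the handler is in place, it can be queried like any other Solr request handler. The sketch below, in Python, builds a request URL for it and summarizes the clusters in a JSON response. Treat it as an illustration under assumptions: the helper names are made up, the handler URL is the serviceUrlBase value used earlier in this post, and SAMPLE_RESPONSE is a hand-written stand-in for the "clusters" section (a list of entries with "labels", "score" and "docs") that the clustering component is expected to add to the response. It is not real output from this setup.

```python
import json
from urllib.parse import urlencode


def build_clustering_url(handler_url, query, rows=10):
    """Build a GET URL for the /clustering handler (hypothetical helper)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return handler_url + "?" + urlencode(params)


def summarize_clusters(response):
    """Return (label, document count) pairs from a parsed JSON response."""
    summary = []
    for cluster in response.get("clusters", []):
        label = ", ".join(cluster.get("labels", []))
        summary.append((label, len(cluster.get("docs", []))))
    return summary


# Hand-written stand-in for a response; the exact fields are an assumption.
SAMPLE_RESPONSE = json.loads("""
{
  "response": {"numFound": 3, "start": 0, "docs": []},
  "clusters": [
    {"labels": ["Search Engine"], "score": 12.5, "docs": ["1", "2"]},
    {"labels": ["Other Topics"], "score": 0.0, "docs": ["3"]}
  ]
}
""")

url = build_clustering_url(
    "http://192.168.10.204:8983/inokarticle/clustering", "solr")
print(url)
for label, count in summarize_clusters(SAMPLE_RESPONSE):
    print(label, count)
```

To run it against a live server, fetch the URL (for example with urllib.request.urlopen), parse the body with json.load, and pass the result to summarize_clusters.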

 

2. Custom Chinese tokenizer for clustering

a. Modify the relevant Carrot2 source code and recompile

b. Copy the resulting jars and the lexicon to the Solr webapp lib directory

 

For details, see "Apache SOLR and Carrot2 integration strategies 2".

 

 

 

 

 

 

 

References

http://wiki.apache.org/solr/ClusteringComponent

http://www.cnblogs.com/cy163/archive/2010/05/07/1730172.html

http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html

http://download.carrot2.org/head/manual/index.html#section.advanced-topics.building-from-source-code

http://www.cnblogs.com/shm10/p/3700604.html