Solr Synonym Search (synonyms)

Author: a280606790

Blog category: solr

Personal technical blog: http://demi-panda.com

solr.SynonymFilterFactory

Creates a SynonymFilter.

Matches strings of tokens and replaces them with other strings of tokens.

The synonyms parameter names an external file defining the synonyms.
If ignoreCase is true, matching will lowercase before checking equality.
If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
The optional tokenizerFactory parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1.
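The effect of expand and ignoreCase on a single comma-separated synonym group can be illustrated with a small Python sketch. This is a toy model and not Solr's implementation: the function name apply_synonym_group is hypothetical, and the real filter works on token streams with positions and multi-token matches.

```python
def apply_synonym_group(token, group, expand=True, ignore_case=True):
    """Toy model: rewrite one token against one group of equivalent synonyms."""
    # With ignore_case, both the token and the group members are lowercased
    # before they are compared.
    norm = (lambda s: s.lower()) if ignore_case else (lambda s: s)
    members = [norm(m) for m in group]
    if norm(token) not in members:
        return [token]       # not a synonym: the token passes through unchanged
    if expand:
        return members       # expand=true: emit every equivalent synonym
    return members[:1]       # expand=false: reduce to the first entry in the list


group = ["GB", "gib", "gigabyte", "gigabytes"]
print(apply_synonym_group("Gigabyte", group, expand=True))
print(apply_synonym_group("Gigabyte", group, expand=False))
```

With expand=True the token is replaced by the whole group; with expand=False only the first entry of the group survives.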

Example usage in schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Analysis chain applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <!-- tokenizerFactory names the tokenizer used to analyze the entries in synonyms.txt -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Analysis chain applied to query text -->
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Example synonyms.txt:

# Blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs 
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
# Chinese example: two equivalent terms for a Philips shaver
飞利浦刮胡刀,飞利浦剃须刀

# Synonym mappings can be used for spelling correction too
pixima => pixma

a\,a => b\,b
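
The rule format used in the file above (comments, explicit "=>" mappings, comma-separated equivalence groups, and backslash-escaped commas) can be sketched as a small Python parser. This is a simplified illustration, not Solr's own parser: the names split_unescaped and parse_synonyms are hypothetical, ignoreCase is not modeled, and a space-separated right-hand side such as "bbbfoo bbbbar" is kept as a single multi-word replacement.

```python
import re


def split_unescaped(text, sep=","):
    """Split on separators not preceded by a backslash, then unescape them."""
    parts = re.split(r"(?<!\\)" + re.escape(sep), text)
    return [p.strip().replace("\\" + sep, sep) for p in parts if p.strip()]


def parse_synonyms(lines, expand=True):
    """Return a dict mapping each input term to its list of replacements."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if "=>" in line:
            # Explicit mapping: the expand setting is ignored.
            lhs, rhs = line.split("=>", 1)
            sources, targets = split_unescaped(lhs), split_unescaped(rhs)
        else:
            # Equivalence group: expand decides between all terms and the first.
            sources = split_unescaped(line)
            targets = sources if expand else sources[:1]
        for source in sources:
            existing = mapping.setdefault(source, [])
            existing += [t for t in targets if t not in existing]
    return mapping


sample = [
    "# comment line",
    "",
    "aaafoo => aaabar",
    "bbbfoo => bbbfoo bbbbar",
    "fooaaa,baraaa,bazaaa",
    "Television, Televisions, TV, TVs",
    "pixima => pixma",
    r"a\,a => b\,b",
]

for term, replacements in parse_synonyms(sample).items():
    print(term, "->", replacements)
```

Calling parse_synonyms(sample, expand=False) maps every member of an equivalence group to the first term only, while the explicit "=>" mappings stay the same.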

2012-02-09 18:30

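Comment #1, kingflying, 2012-08-14

Hello a280606790! The Solr wiki says that when expand is set to true, every entry in the synonyms file is expanded to all of its synonyms, and when it is set to false, every word is replaced by the first word in its group. Seen that way, false looks clearly better than true: it avoids the expansion and reduces the storage the index needs. So why do real projects usually use true? Does setting it to false affect queries? I hope you can answer this. Thanks!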
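
The question in this comment can be explored with a toy model. The Python sketch below is an illustration under simplifying assumptions, not Solr's implementation: GROUP, analyze, and matches are hypothetical names, tokens are plain lowercase strings, and scoring, positions, and phrase queries are ignored. It suggests that expand=false still matches when the same reduction runs at both index and query time, but can miss documents when the synonym filter runs on one side only, a case that expand=true covers.

```python
GROUP = ["tv", "television", "televisions"]  # one equivalence group


def analyze(tokens, expand, use_synonyms=True):
    """Return the set of terms produced for a token list."""
    out = set()
    for tok in tokens:
        if use_synonyms and tok in GROUP:
            # expand=True emits the whole group, expand=False only its first term
            out.update(GROUP if expand else GROUP[:1])
        else:
            out.add(tok)
    return out


def matches(doc, query, expand, index_syn, query_syn):
    """True when the analyzed document and the analyzed query share a term."""
    indexed = analyze(doc, expand, index_syn)
    wanted = analyze(query, expand, query_syn)
    return bool(indexed & wanted)


doc, query = ["television"], ["televisions"]
print(matches(doc, query, expand=False, index_syn=True, query_syn=True))
print(matches(doc, query, expand=False, index_syn=True, query_syn=False))
print(matches(doc, query, expand=True, index_syn=True, query_syn=False))
```

In this model the second case fails to match because the index holds only the reduced term "tv" while the query still contains the unreduced "televisions".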
