郑云飞

Index Modules - ICU Analysis Plugin


ICU Analysis Plugin

The ICU analysis plugin provides Unicode normalization, collation and folding. The plugin is called elasticsearch-analysis-icu.

The plugin includes the following analysis components:

ICU Normalization

Normalizes characters as explained here. It registers itself by default under icu_normalizer or icuNormalizer using the default settings. A name parameter can be provided, which accepts the following values: nfc, nfkc, and nfkc_cf. Here is a sample of the settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "normalization" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_normalizer"]
                }
            }
        }
    }
}
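To make the three name values more concrete, here is a minimal Python sketch that approximates them with the standard unicodedata module. It is not the plugin's code: the function name icu_like_normalize is hypothetical, and the nfkc_cf branch is only an approximation (NFKC plus case folding), so ICU's real output may differ in edge cases.

```python
import unicodedata


def icu_like_normalize(text: str, name: str) -> str:
    """Approximate the nfc, nfkc and nfkc_cf modes with the standard library."""
    if name == "nfc":
        # Canonical composition: "e" + combining acute becomes a single "é".
        return unicodedata.normalize("NFC", text)
    if name == "nfkc":
        # Compatibility composition: ligatures and width variants are unfolded too.
        return unicodedata.normalize("NFKC", text)
    if name == "nfkc_cf":
        # Assumption: approximate NFKC_Casefold as NFKC, case folding, NFKC again.
        folded = unicodedata.normalize("NFKC", text).casefold()
        return unicodedata.normalize("NFKC", folded)
    raise ValueError(f"unsupported normalizer name: {name!r}")


if __name__ == "__main__":
    sample = "\ufb01 Cafe\u0301"  # "fi" ligature, then "Cafe" + combining acute
    for name in ("nfc", "nfkc", "nfkc_cf"):
        print(name, ascii(icu_like_normalize(sample, name)))
```

Running the script prints the escaped result for each mode, which makes it easy to see which code points each form changes.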

ICU Folding

Folding of Unicode characters based on UTR#30. It registers itself under the icu_folding and icuFolding names.
The filter also lowercases, so the lowercase filter can normally be left out. Sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}
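As a rough illustration of what folding does to a token, the following Python sketch strips diacritics and case with the standard library. It is an approximation under stated assumptions, not UTR#30 itself: the helper name fold is hypothetical, and it only handles characters that decompose under NFKD, so the real filter covers more cases.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for folding: drop combining marks, then case-fold."""
    # Decompose so that accents become separate combining code points.
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove the combining marks, leaving only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and recompose whatever is left.
    return unicodedata.normalize("NFKC", stripped.casefold())


if __name__ == "__main__":
    for word in ("R\u00e9sum\u00e9", "\u00c5NGSTR\u00d6M"):
        print(ascii(word), "->", ascii(fold(word)))
```

Because the sketch lowercases as part of folding, it mirrors the remark above that a separate lowercase filter is normally unnecessary.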

Filtering

The folding can be restricted to a set of Unicode characters with the unicodeSetFilter parameter. This is useful for a non-internationalized search engine that needs to retain a set of national characters which are primary letters in a specific language. See the syntax for the UnicodeSet here.

The following example exempts Swedish characters from the folding. Note that the filtered characters are NOT lowercased, which is why the lowercase filter is added after the folding filter.

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_icu_folding", "lowercase"]
                }
            },
            "filter" : {
                "my_icu_folding" : {
                    "type" : "icu_folding",
                    "unicodeSetFilter" : "[^åäöÅÄÖ]"
                }
            }
        }
    }
}
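The effect of the filter set, and the reason for the trailing lowercase filter, can be sketched in Python. This is an illustration only: the names EXEMPT, fold_char, fold_filtered and analyze are hypothetical, and the per-character folding is the same standard-library approximation of folding, not the plugin's implementation.

```python
import unicodedata

# Characters excluded from folding, mirroring the set "[^åäöÅÄÖ]" above.
EXEMPT = set("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6")


def fold_char(ch: str) -> str:
    """Approximate folding of one character: strip marks, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def fold_filtered(text: str) -> str:
    """Fold every character except the exempt ones, which pass through as-is."""
    # Compose first so each exempt letter is a single code point.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch if ch in EXEMPT else fold_char(ch) for ch in composed)


def analyze(text: str) -> str:
    """Folding followed by lowercasing, like the two-filter chain above."""
    return fold_filtered(text).lower()


if __name__ == "__main__":
    token = "\u00c5ngstr\u00f6m caf\u00e9"
    print(ascii(fold_filtered(token)))  # exempt capital is still uppercase here
    print(ascii(analyze(token)))        # lowercase step catches it
```

The first print shows that an exempt capital letter survives folding untouched, which is exactly the gap the extra lowercase filter closes.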

ICU Collation

Uses the collation token filter. You can either specify the collation rules (defined here) with the rules parameter (the rules can be written inline in the settings or point to a file location, which may be relative to the config location), or use the language parameter (further specialized by country and variant). By default it registers under icu_collation or icuCollation and uses the default locale.

Here is a sample of the settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_collation"]
                }
            }
        }
    }
}

And here is a sample of custom collation:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myCollator"]
                }
            },
            "filter" : {
                "myCollator" : {
                    "type" : "icu_collation",
                    "language" : "en"
                }
            }
        }
    }
}
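For readers unfamiliar with collation, the toy Python sketch below shows the general idea of sorting by a derived key instead of by raw code points. It is not ICU collation and does not implement any language's rules: toy_sort_key is a hypothetical helper that simply compares base letters first and uses the original token as a tie-breaker.

```python
import unicodedata


def toy_sort_key(token: str) -> tuple:
    """Two-level key: accent- and case-insensitive form first, original second."""
    decomposed = unicodedata.normalize("NFD", token)
    # Primary level: base letters only, case-folded.
    primary = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()
    # Secondary level: the original token breaks ties deterministically.
    return (primary, token)


if __name__ == "__main__":
    words = ["zebra", "\u00c9clair", "apple", "eclair"]
    print([ascii(w) for w in sorted(words)])                    # code point order
    print([ascii(w) for w in sorted(words, key=toy_sort_key)])  # keyed order
```

With plain code point ordering the accented word lands after "zebra"; with the derived key it sorts next to its unaccented counterpart, which is the kind of behavior a real collator provides per language.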


