`
m635674608
  • 浏览: 5027895 次
  • 性别: Icon_minigender_1
  • 来自: 南京
社区版块
存档分类
最新评论

Elasticsearch : Advanced search and nested objects

 
阅读更多

In a non relational database system, joins can miss. Fortunately, Elasticsearch provides solutions to meet these needs :

Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples :

{
    "Article" : [
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat"
                "id" : "2"
            }
        ]
      }
    },
    {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
}

You can find different Array :

  • Categories : array of integers
  • Tags : array of strings
  • author : array of objects (inner objects or nested)

We explicitely specify this “simple” type as it can be more easy/maintainable to store a flatten value rather than the complete object.
Using a non relational structure should make you think about a specific model for your search engine :

  • To filter : If you just want to filter/search/aggregate on the textual value of an object, then flatten the value in the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (in French for the moment).

Inner objects

The inner objects are just the JSON object association in a parent. For example, the “authors” in the above example. The mapping for this example could be :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : object
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

You can Filter or Query on these “inner objects”. For example :

query: author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema less”, you can index them without specify any mapping.

The limitation of this method lies in the manner as ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects :

[
      {
        "id" : 12
        "title" : An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["Francoisg","gregquat"],
        "author.id" : [18,2]
      }
      {
        "id" : 13
        "title" : "A second article",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

The consequence is that the query :

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "firstname": "francois",
          "surname": "gregquat"
        }
      }
    }
  }
}


author.firstname=Francois AND surname=gregquat will return the document “12″. In the case of an inner object, this query can by translated as “Who has at least one author.surname = gregquat and one author.firstname=francois”.

To fix this problem, you must use the nested.

Les nested

First important difference : nested must be specified in your mapping.

The mapping looks like an object one, only the type changes :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : nested
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

This time, the internal representation will be :

[
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "Francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tags" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

This time, we keep the object structure.

Nested have their own filters which allows to filter by nested object. If we go on with our example (with the limitation of inner objects), we can write this query :

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firsname": "francois"
                  }
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}


hi
We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

There is still a problem which is penalizing when working with bug objects : when you want to change a single value of the nester, you have to reindex the whole parent document (including the nested).
If the objects are heavy, and often updated, the impact on performances can be important.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical : an object type is only associated to one parent, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                category : 
                    mappings : 
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index separately categories and article while keeping the references between them.

Like for nested, there are Filters and Queries that allow to search on parents or children :

  • Has Parent Filter / Has Parent Query : Filter/query on parent fields, returns children objects. In our case, we could filter articles whose parent category contains “symfony” in his description.
  • Has Child Filter / Has Child Query : Filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.
{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}


This query will return the Categories that have at least one article tagged with “symfony”.

The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.

These websites can also be interested to read :

http://obtao.com/blog/2014/04/elasticsearch-advanced-search-and-nested-objects/

分享到:
评论

相关推荐

    Go-go-elasticsearch:Elasticsearch官方的go语言客户端

    es, err := elasticsearch.NewDefaultClient() if err != nil { panic(err) } res, err := es.Index(index, "", &doc, nil) if err != nil { panic(err) } defer res.Body.Close() if res.IsError() {...

    elasticsearch-6.8.3-API文档-中文版.zip

    赠送jar包:elasticsearch-6.8.3.jar; 赠送原API文档:elasticsearch-6.8.3-javadoc.jar; 赠送源代码:elasticsearch-6.8.3-sources.jar; 赠送Maven依赖信息文件:elasticsearch-6.8.3.pom; 包含翻译后的API文档...

    解决spring-data-elasticsearch 5.4.0 不支持 5.4.1的elasticsearch问题

    Spring Data Elasticsearch 5.4.0设计时可能并未考虑到与Elasticsearch 5.4.1的完全兼容,导致在升级Elasticsearch到5.4.1后,系统报出"NoNodeAvailableException"错误,提示无法连接到任何节点。这个问题主要是由于...

    elasticsearch:7.17.8/arm64

    docker pull --platform=arm64 elasticsearch:7.17.8 Elasticsearch 是位于 Elastic Stack 核心的分布式搜索和分析引擎。Logstash 和 Beats 有助于收集、聚合和丰富您的数据并将其存储在 Elasticsearch 中。Kibana ...

    elasticsearch-6.2.3-API文档-中文版.zip

    赠送jar包:elasticsearch-6.2.3.jar; 赠送原API文档:elasticsearch-6.2.3-javadoc.jar; 赠送源代码:elasticsearch-6.2.3-sources.jar; 赠送Maven依赖信息文件:elasticsearch-6.2.3.pom; 包含翻译后的API文档...

    elasticsearch-6.3.0-API文档-中文版.zip

    赠送jar包:elasticsearch-6.3.0.jar; 赠送原API文档:elasticsearch-6.3.0-javadoc.jar; 赠送源代码:elasticsearch-6.3.0-sources.jar; 赠送Maven依赖信息文件:elasticsearch-6.3.0.pom; 包含翻译后的API文档...

    Elasticsearch权威指南(中文版)-.pdf

    Elasticsearch 权威指南 Elasticsearch 是一个实时分布式搜索和分析引擎,它让你以前所未有的速度处理大数据成为可能。它用于全文搜索、结构化搜索、分析以及将这三者混合使用。 Elasticsearch 的优点在于将全文...

    elasticsearch-5.5.1-API文档-中文版.zip

    赠送jar包:elasticsearch-5.5.1.jar; 赠送原API文档:elasticsearch-5.5.1-javadoc.jar; 赠送源代码:elasticsearch-5.5.1-sources.jar; 赠送Maven依赖信息文件:elasticsearch-5.5.1.pom; 包含翻译后的API文档...

    Elasticsearch数据库全套教程+安装部署+框架原理+开发调用+性能调优等

    Elasticsearch数据库:Elasticsearch与Kubernetes集成技术教程.pdf Elasticsearch数据库:Elasticsearch在实时日志分析中的应用.pdf Elasticsearch数据库:Elasticsearch在电子商务搜索中的实践.pdf Elasticsearch...

    Relevant search with applications for Solr and Elasticsearch

    标题“Relevant search with applications for Solr and Elasticsearch”以及描述“Relevant search with applications for Solr and Elasticsearch”共同强调了本书的核心内容:**相关搜索(Relevant Search)**在*...

    《ElasticSearch入门到实战》电子书,从入门到进阶实战项目的教程文档,框架SpringBoot框架整合ES.zip

    **Elasticsearch 入门与实战** Elasticsearch 是一个基于 Lucene 的开源全文搜索引擎,以其分布式、可扩展性、实时搜索以及强大的数据分析能力而受到广泛欢迎。它不仅支持文本搜索,还可以处理结构化和非结构化数据...

    ELK镜像:elasticsearch:7.17.6+kibana:7.17.6+logstash:7.17.6 下载

    ELK:elasticsearch:7.17.6+kibana:7.17.6+logstash:7.17.6 支持操作系统:centos7 中标麒麟、银河麒麟、通信UOS 均已经验证 安装方式,docker安装

    Learning Elasticsearch 【含代码】

    You will install and set up Elasticsearch and its related plugins, and handle documents using the Distributed Document Store. You will see how to query, search, and index your data, and perform ...

    elasticsearch6.1.0+elasticsearch head

    **Elasticsearch 6.1.0:分布式搜索引擎与分析引擎** Elasticsearch 是一个开源的、基于 Lucene 的全文搜索引擎,它具有分布式、实时、可扩展的特点,被广泛应用于日志分析、实时监控、数据搜索等多个场景。在6.1.0...

    使用dockerfile安装elasticsearch:7.8.0

    使用dockerfile安装elasticsearch:7.8.0

    elasticsearch-6.3.0-API文档-中英对照版.zip

    赠送jar包:elasticsearch-6.3.0.jar; 赠送原API文档:elasticsearch-6.3.0-javadoc.jar; 赠送源代码:elasticsearch-6.3.0-sources.jar; 赠送Maven依赖信息文件:elasticsearch-6.3.0.pom; 包含翻译后的API文档...

    elasticsearch7.17.9版本分词器插件安装包

    "ES"是Elasticsearch的简称,而"ELK"栈是指Elasticsearch、Logstash和Kibana的组合,常用于日志管理和分析。这个分词器插件可以与Logstash集成,帮助处理和分析来自各种日志源的数据。 压缩包中的其他文件如`...

    elasticsearch-5.5.1-API文档-中英对照版.zip

    赠送jar包:elasticsearch-5.5.1.jar; 赠送原API文档:elasticsearch-5.5.1-javadoc.jar; 赠送源代码:elasticsearch-5.5.1-sources.jar; 赠送Maven依赖信息文件:elasticsearch-5.5.1.pom; 包含翻译后的API文档...

Global site tag (gtag.js) - Google Analytics