In short: if the mappings of the two kinds of documents are similar, use types (two types under the same index); otherwise, two separate indices are probably the better choice.
https://www.elastic.co/blog/index-vs-type
Note the highlighted parts (shown in red in the original post):
Who has never wondered whether new data should be put into a new type of an existing index, or into a new index? This is a recurring question for new users, that can’t be answered without understanding how both are implemented.
In the past we tried to make elasticsearch easier to understand by building an analogy with relational databases: indices would be like a database, and types like a table in a database. This was a mistake: the way data is stored is so different that any comparisons can hardly make sense, and this ultimately led to an overuse of types in cases where they were more harmful than helpful.
What is an index?
An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.
Another important factor is how you plan to search your data. While each shard is searched independently, Elasticsearch eventually needs to merge results from all the searched shards. For instance, if you search across 10 indices that have 5 shards each, the node that coordinates the execution of a search request will need to merge 5x10=50 shard results. Here again you need to be careful: if there are too many shard results to merge and/or if you run a heavy request that produces large shard responses (which can easily happen with aggregations), the task of merging all these shard results can become very resource-intensive, both in terms of CPU and memory. Again this would advocate for having fewer indices.
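The arithmetic above is easy to play with. The following is only a back-of-the-envelope sketch: the function name `shard_results_to_merge` is made up for illustration and is not part of any Elasticsearch API. It counts one result per searched shard, as described above.

```python
def shard_results_to_merge(shards_per_index):
    """Return how many shard-level results the coordinating node has to merge.

    shards_per_index: one entry per searched index, giving its shard count.
    """
    return sum(shards_per_index)


# Ten indices with five shards each, as in the example above.
ten_small_indices = [5] * 10
# The same data folded into a single five-shard index.
one_index = [5]

print(shard_results_to_merge(ten_small_indices))
print(shard_results_to_merge(one_index))
```

Folding the data into fewer indices (or fewer shards per index) shrinks the number of partial results that have to be merged for every request.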
What is a type?
This is where types help: types are a convenient way to store several types of data in the same index, in order to keep the total number of indices low for the reasons exposed above. In terms of implementation it works by adding a “_type” field to every document that is automatically used for filtering when searching on a specific type. One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged.
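To make the "_type" mechanism concrete, here is a toy model in plain Python. It is not how Elasticsearch is implemented internally; the type names `tweet` and `user` and the `search` helper are assumptions chosen only for illustration. The point is that a type is just a field value used as a filter over documents living in the same index.

```python
# Toy model: documents of two hypothetical types stored in one index.
index = [
    {"_type": "tweet", "text": "hello"},
    {"_type": "user", "name": "alice"},
    {"_type": "tweet", "text": "world"},
]


def search(docs, types=None):
    """Return all docs, or only those whose "_type" is in `types`."""
    if types is None:
        return list(docs)
    wanted = set(types)
    return [doc for doc in docs if doc["_type"] in wanted]


print(len(search(index, ["tweet"])))          # documents of one type
print(len(search(index, ["tweet", "user"])))  # documents of both types
```

Searching one type or several types scans the same underlying collection, which mirrors why the number of shard results to merge does not change.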
However, this comes with limitations as well (that is, the limitations of types):
- Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
- Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity. Sparse postings lists can’t be compressed efficiently because of high deltas between consecutive matches. And the issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently. This means that if Lucene establishes that it needs one byte to store all values of a given numeric field, it will also consume one byte for documents that don’t have a value for this field. Future versions of Elasticsearch will have improvements in this area but I would still advise you to model your data in a way that will limit sparsity as much as possible.
- Scores use index-wide statistics, so scores of documents in one type can be impacted by documents from other types.
This means types can be helpful, but only if all types from a given index have mappings that are similar. Otherwise, the fact that fields also consume resources in documents where they don’t exist could make things worse than if the data had been stored in separate indices.
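The sparsity cost described in the list above can be estimated with simple arithmetic. This sketch assumes the fixed-width layout mentioned there (one slot per document, whether or not it has a value); the function names and the document counts are invented for illustration and say nothing about real on-disk sizes.

```python
def doc_values_bytes(total_docs, bytes_per_value):
    """Fixed-width storage: every document reserves a slot, value or not."""
    return total_docs * bytes_per_value


def wasted_bytes(total_docs, docs_with_value, bytes_per_value):
    """Bytes reserved for documents that have no value for the field."""
    return (total_docs - docs_with_value) * bytes_per_value


# Hypothetical index: 1,000,000 documents across all types, but only the
# 100,000 documents of one type actually carry this one-byte numeric field.
print(doc_values_bytes(1_000_000, 1))
print(wasted_bytes(1_000_000, 100_000, 1))
```

With these made-up numbers, nine tenths of the space reserved for the field belongs to documents that never use it, and the same overhead repeats for every field that exists in only one type.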
Which one should I use?
This is a tough question, and the answer will depend on your hardware, data and use-case. First it is important to realize that types are useful because they can help reduce the number of Lucene indices that Elasticsearch needs to manage. But there is another way that you can reduce this number: creating indices that have fewer shards. For instance, instead of folding 5 types into the same index, you could create 5 indices with 1 primary shard each.
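As a sketch of the "5 indices with 1 primary shard each" alternative, the snippet below builds the request bodies one might send when creating such indices. `number_of_shards` and `number_of_replicas` are real index settings; the index names, the replica count and the helper function are assumptions for illustration only, and nothing here talks to a cluster.

```python
import json


def single_shard_index_body(replicas=1):
    """Build a create-index body with one primary shard (replicas assumed)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": replicas,
            }
        }
    }


# Hypothetical names standing in for the five kinds of data.
index_names = ["orders", "customers", "products", "reviews", "shipments"]

for name in index_names:
    # Print the request one could send, e.g. with curl or a client library.
    print(f"PUT /{name}")
    print(json.dumps(single_shard_index_body(), indent=2))
```

Five single-shard indices give the cluster five Lucene indices (before replicas) to manage, the same as one five-shard index holding five types, but without the mapping constraints that types impose.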
I will try to summarize the questions you should ask yourself to make a decision:
- Are you using parent/child? If yes this can only be done with two types in the same index.
- Do your documents have similar mappings? If no, use different indices.
- If you have many documents for each type, then the overhead of Lucene indices will be easily amortized so you can safely use indices, with fewer shards than the default of 5 if necessary.
- Otherwise you can consider putting documents in different types of the same index. Or even in the same type.
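The checklist above can be read as a small decision procedure. The function below is only one way to encode it: the name, the boolean parameters and the returned strings are invented for illustration, and what counts as "many documents" still depends on your hardware and data.

```python
def choose_layout(uses_parent_child, similar_mappings, many_docs_per_type):
    """Walk through the questions above, in order, and return a suggestion."""
    if uses_parent_child:
        # Parent/child only works with two types in the same index.
        return "two types in the same index"
    if not similar_mappings:
        return "separate indices"
    if many_docs_per_type:
        # The per-index overhead is easily amortized across many documents.
        return "separate indices, with fewer shards if necessary"
    return "types in the same index (or even a single type)"


print(choose_layout(False, True, False))
print(choose_layout(False, False, True))
```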
In conclusion, you may be surprised that there are not as many use cases for types as you expected. And this is right: there are actually few use cases for having several types in the same index for the reasons that we mentioned above. Don’t hesitate to allocate different indices for data that would have different mappings, but still keep in mind that you should keep a reasonable number of shards in your cluster, which can be achieved by reducing the number of shards for indices that don’t require a high write throughput and/or will store low numbers of documents.