When you download Elasticsearch and start it up, you create an Elasticsearch node, which tries to join an existing cluster if one is available or creates a new one otherwise. Let's say you created your own new cluster with a single node, the one you just started up. There is no data yet, so we need to create an index.
When you create an index (an index is also created automatically when you index the first document), you can define how many shards it will be composed of. If you don't specify a number, it gets the default number of shards: 5 primaries in versions before 7.0 (newer versions default to 1; this walkthrough sticks with 5). What does that mean?
It means that Elasticsearch will create 5 primary shards that will contain your data:

     ____    ____    ____    ____    ____
    | 1  |  | 2  |  | 3  |  | 4  |  | 5  |
    |____|  |____|  |____|  |____|  |____|

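If you'd rather set the shard count yourself than rely on the default, the create-index request takes it as a setting. Here is a minimal sketch that only builds the request (it doesn't send anything). The index name `my-index` and the helper `create_index_request` are made up for illustration; `number_of_shards` and `number_of_replicas` are the actual index setting names.

```python
import json


def create_index_request(index_name, primaries=5, replicas=1):
    """Build the method, path and JSON body for a create-index call."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": primaries,     # primary shards
                "number_of_replicas": replicas,    # copies per primary
            }
        }
    }
    return "PUT", f"/{index_name}", json.dumps(body)


# "my-index" is a hypothetical index name.
method, path, payload = create_index_request("my-index")
print(method, path)
print(payload)
```

You would send that payload with whatever HTTP client you like. Keep in mind that the replica count can be changed later on a live index, while the number of primary shards is fixed when the index is created.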
Every time you index a document, Elasticsearch decides which primary shard is supposed to hold that document and indexes it there. Primary shards are not a copy of the data, they are the data! With a single node, multiple shards of course don't make much sense, but if we start another Elasticsearch instance in the same cluster, the shards will be distributed evenly across the cluster.
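To give an idea of how that decision can work: the usual description is "hash the routing value (the document id by default) and take it modulo the number of primary shards". The sketch below is a simplified illustration of that idea, not Elasticsearch's actual code. `pick_primary_shard` is a made-up name and `zlib.crc32` is only a stand-in, since Elasticsearch uses a different hash function internally.

```python
import zlib


def pick_primary_shard(routing_value, number_of_primaries):
    """Map a routing value (the document id by default) to a shard number."""
    # crc32 is a stand-in hash chosen for this sketch only.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    # Result is 0-based: 0 .. number_of_primaries - 1.
    return hashed % number_of_primaries


# Hypothetical document ids, routed across 5 primaries.
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "->", pick_primary_shard(doc_id, 5))
```

The same id always lands on the same shard, which is how a document can be found again later. It also hints at why the number of primaries can't simply be changed afterwards: a different modulus would send existing ids to different shards.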
Node 1 will then hold, for example, only three shards:

     ____    ____    ____
    | 1  |  | 2  |  | 3  |
    |____|  |____|  |____|

The remaining two shards have been moved to the newly started node:

     ____    ____
    | 4  |  | 5  |
    |____|  |____|

Why does this happen? Because Elasticsearch is a distributed search engine, and this way you can make use of multiple nodes/machines to manage large amounts of data.
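As a toy model of that even spread, here is a sketch that splits the shards into near-equal blocks, one block per node. The real allocator weighs many more factors, so treat `spread_primaries` and the node names as purely illustrative assumptions.

```python
def spread_primaries(shard_ids, node_names):
    """Assign shards to nodes in contiguous, near-equal blocks."""
    base, extra = divmod(len(shard_ids), len(node_names))
    layout, start = {}, 0
    for position, node in enumerate(node_names):
        # The first `extra` nodes take one additional shard each.
        size = base + (1 if position < extra else 0)
        layout[node] = shard_ids[start:start + size]
        start += size
    return layout


# One node holds everything; a second node takes over part of the load.
print(spread_primaries([1, 2, 3, 4, 5], ["node1"]))
# {'node1': [1, 2, 3, 4, 5]}
print(spread_primaries([1, 2, 3, 4, 5], ["node1", "node2"]))
# {'node1': [1, 2, 3], 'node2': [4, 5]}
```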
Every Elasticsearch index is composed of at least one primary shard, since that's where the data is stored. Every shard comes at a cost, though, so if you have a single node and no foreseeable growth, just stick with a single primary shard.
The other type of shard is the replica. The default is 1 replica, meaning that every primary shard is copied to one other shard that contains the same data. Replicas are used to increase search performance and for failover. A replica shard is never allocated on the same node as its primary (that would be pretty much like putting a backup on the same disk as the original data).
Back to our example: with 1 replica we'll have the whole index on each node, since 2 replica shards will be allocated on the first node and they will contain exactly the same data as the primaries on the second node:

     ____    ____    ____    ____    ____
    | 1  |  | 2  |  | 3  |  | 4R |  | 5R |
    |____|  |____|  |____|  |____|  |____|

Same for the second node, which will contain a copy of the primary shards on the first node:

     ____    ____    ____    ____    ____
    | 1R |  | 2R |  | 3R |  | 4  |  | 5  |
    |____|  |____|  |____|  |____|  |____|

With a setup like this, if a node goes down you still have the whole index. The replicas of the lost primaries are automatically promoted to primaries, and the cluster keeps working properly despite the node failure.
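The two-node layout and the failover can be played through in a few lines. This is a toy model under stated assumptions, not how the cluster actually does it: `place_replicas` and `fail_node` are made-up helpers, and replicas simply go to the next node in the list.

```python
def place_replicas(primaries_by_node):
    """Put one replica of each primary on a node other than the primary's."""
    nodes = list(primaries_by_node)
    replicas_by_node = {node: [] for node in nodes}
    if len(nodes) < 2:
        # With a single node there is nowhere valid to put a replica,
        # so none are placed.
        return replicas_by_node
    for index, node in enumerate(nodes):
        other = nodes[(index + 1) % len(nodes)]  # never the same node
        replicas_by_node[other].extend(primaries_by_node[node])
    return replicas_by_node


def fail_node(primaries_by_node, replicas_by_node, dead):
    """Drop a node and promote the surviving replicas of its primaries."""
    lost = set(primaries_by_node[dead])
    survivors = {}
    for node, shards in primaries_by_node.items():
        if node == dead:
            continue
        promoted = [s for s in replicas_by_node[node] if s in lost]
        survivors[node] = sorted(shards + promoted)
    return survivors


primaries = {"node1": [1, 2, 3], "node2": [4, 5]}
replicas = place_replicas(primaries)
print(replicas)
# {'node1': [4, 5], 'node2': [1, 2, 3]}
print(fail_node(primaries, replicas, "node2"))
# {'node1': [1, 2, 3, 4, 5]}
```

After node2 disappears, node1 ends up holding all five shards as primaries, which is the "you still have the whole index" case described above.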