Elasticsearch Source Code Analysis, Part 2: An Overview of the Indexing Process

aoyouzi


This is a brief walk through the indexing logic. It only traces the main path; some of the details may be covered in later posts.

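Suppose the indexing interface is called through the Java API. The client first builds a JSON string (represented in ES as XContent, an abstraction over the content being processed), then specifies in the IndexRequest the target index, the document type, and the document id. If no id is given, ES generates one automatically with its UUID utility; the code is in the process method of IndexRequest.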

if (allowIdGeneration) {
    if (id == null) {
        id(UUID.randomBase64UUID());
        opType(IndexRequest.OpType.CREATE);
    }
}
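To make the branch above concrete, here is a minimal standalone sketch of the same idea. It is not the ES implementation: `IdSketch`, `randomBase64Id`, and `resolve` are hypothetical names, the JDK's `java.util.UUID` stands in for ES's own UUID utility, and the "INDEX" default for the operation type is an assumption of the sketch.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

class IdSketch {
    // Hypothetical helper: encode a random 128-bit UUID as URL-safe Base64 without padding.
    static String randomBase64Id() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Returns {id, opType}. When the id is missing and generation is allowed,
    // a fresh id is generated and the operation is switched to CREATE,
    // as in the snippet above. Otherwise the request is left untouched.
    static String[] resolve(String id, boolean allowIdGeneration) {
        if (allowIdGeneration && id == null) {
            return new String[] { randomBase64Id(), "CREATE" };
        }
        return new String[] { id, "INDEX" }; // assumption: INDEX is the default op type
    }

    public static void main(String[] args) {
        String[] generated = resolve(null, true);
        System.out.println(generated[0] + " " + generated[1]);
        String[] explicit = resolve("42", true);
        System.out.println(explicit[0] + " " + explicit[1]);
    }
}
```

Switching to CREATE makes sense for a generated id: a freshly generated id should not collide with an existing document, so the write can be treated as a pure insert.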

The request is then sent to the ES server over TCP by TransportService, which wraps Netty (with REST, the request goes over HTTP instead).

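On the server, the matching TransportAction handles the index request (TransportShardReplicationOperationAction). The shard-level work starts in AsyncShardOperationAction.start(). It first reads the cluster state and extracts the target index and its shard information. It then hashes the document id (together with the type when useType is set), or the routing value if one was supplied, and takes the result modulo the number of shards to decide which shard the document goes to.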

private int shardId(ClusterState clusterState, String index, String type, @Nullable String id, @Nullable String routing) {
    if (routing == null) {
        if (!useType) {
            return Math.abs(hash(id) % indexMetaData(clusterState, index).numberOfShards());
        } else {
            return Math.abs(hash(type, id) % indexMetaData(clusterState, index).numberOfShards());
        }
    }
    return Math.abs(hash(routing) % indexMetaData(clusterState, index).numberOfShards());
}
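The hash-and-modulo step can be tried out in isolation. The sketch below is only an illustration: `RoutingSketch` and its methods are hypothetical, and a DJB2-style string hash is assumed because the hash function itself is not shown in the excerpt above.

```java
class RoutingSketch {
    // Assumed hash: classic DJB2 over the characters of the key (int overflow wraps).
    static int djb2(String key) {
        int hash = 5381;
        for (int i = 0; i < key.length(); i++) {
            hash = ((hash << 5) + hash) + key.charAt(i);
        }
        return hash;
    }

    // The routing value, when present, replaces the id as the hash key.
    static int shardId(String id, String routing, int numberOfShards) {
        String key = (routing != null) ? routing : id;
        // hash % n lies strictly between -n and n, so Math.abs cannot overflow here.
        return Math.abs(djb2(key) % numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        System.out.println("doc-1 -> shard " + shardId("doc-1", null, shards));
        System.out.println("doc-2 -> shard " + shardId("doc-2", null, shards));
        // Two different ids with the same routing value land on the same shard.
        System.out.println("doc-1 (routing=user7) -> shard " + shardId("doc-1", "user7", shards));
        System.out.println("doc-2 (routing=user7) -> shard " + shardId("doc-2", "user7", shards));
    }
}
```

Because the shard is derived from the number of shards, changing that number would send existing ids to different shards, which is why the shard count of an index is fixed at creation time.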

It then locates the primary of the target shard and submits the index request to the primary first (TransportIndexAction.shardOperationOnPrimary).

Check whether a routing value is required:

MappingMetaData mappingMd = clusterState.metaData().index(request.index()).mappingOrDefault(request.type());
if (mappingMd != null && mappingMd.routing().required()) {
    if (request.routing() == null) {
        throw new RoutingMissingException(request.index(), request.type(), request.id());
    }
}

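Determine the type of the index operation. There are two types. With INDEX, if a document with the same id already exists, no error is raised: the existing document is updated, which at the Lucene level means the old document is replaced by the new one (updateDocument). With CREATE, if a document with the same id already exists, a "document already exists" error is thrown.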

if (request.opType() == IndexRequest.OpType.INDEX)   
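The difference between the two operation types can be sketched with a plain in-memory map. This is only an illustration under stated assumptions: `OpTypeSketch` and `Doc` are hypothetical, versions are assumed to start at 1 and increase by one on each write, and `IllegalStateException` stands in for whatever exception ES actually throws.

```java
import java.util.HashMap;
import java.util.Map;

class OpTypeSketch {
    enum OpType { INDEX, CREATE }

    static final class Doc {
        final String source;
        final long version;
        Doc(String source, long version) { this.source = source; this.version = version; }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    // Applies one write and returns the resulting version of the document.
    long apply(OpType opType, String id, String source) {
        Doc existing = docs.get(id);
        if (opType == OpType.CREATE && existing != null) {
            // CREATE refuses to touch an id that is already present.
            throw new IllegalStateException("document already exists: " + id);
        }
        // INDEX (or CREATE on a new id): store the new source and bump the version.
        long version = (existing == null) ? 1 : existing.version + 1;
        docs.put(id, new Doc(source, version));
        return version;
    }

    Doc get(String id) { return docs.get(id); }

    public static void main(String[] args) {
        OpTypeSketch store = new OpTypeSketch();
        System.out.println(store.apply(OpType.INDEX, "1", "{\"a\":1}"));
        System.out.println(store.apply(OpType.INDEX, "1", "{\"a\":2}"));
        try {
            store.apply(OpType.CREATE, "1", "{\"a\":3}");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```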

Call InternalIndexShard to perform the index operation:

Engine.Index index = indexShard.prepareIndex(sourceToParse)  
        .version(request.version())  
        .versionType(request.versionType())  
        .origin(Engine.Operation.Origin.PRIMARY);  
indexShard.index(index);  

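InternalIndexShard looks up the mapping that matches the type of the document being indexed. It parses the JSON source and, according to that mapping, converts it into the parse result, a ParsedDocument.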

public Engine.Index prepareIndex(SourceToParse source) throws ElasticSearchException {  
    long startTime = System.nanoTime();  
    DocumentMapper docMapper = mapperService.documentMapperWithAutoCreate(source.type());  
    ParsedDocument doc = docMapper.parse(source);  
    return new Engine.Index(docMapper, docMapper.uidMapper().term(doc.uid()), doc).startTime(startTime);  
}  

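Finally, the corresponding RobinEngine method (add or update) is called to operate on the underlying Lucene index. At this point the document is written to Lucene's in-memory index (RobinEngine.innerIndex).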

if (currentVersion == -1) {
    // document does not exist, we can optimize for create
    if (index.docs().size() > 1) {
        writer.addDocuments(index.docs(), index.analyzer());
    } else {
        writer.addDocument(index.docs().get(0), index.analyzer());
    }
} else {
    if (index.docs().size() > 1) {
        writer.updateDocuments(index.uid(), index.docs(), index.analyzer());
    } else {
        writer.updateDocument(index.uid(), index.docs().get(0), index.analyzer());
    }
}

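After the write to the in-memory index, the operation is also written to the Translog (the Translog is the operation log of the index and records operations that have not been persisted yet), so that a power failure before a flush does not lose indexed data.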

Translog.Location translogLocation = translog.add(new Translog.Create(create));  
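The role of the Translog can be illustrated with a toy write-ahead log. Everything in this sketch is hypothetical (`TranslogSketch`, its fields and methods); in particular the "durable" log is just an in-memory list, and a crash is simulated by discarding the buffer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TranslogSketch {
    // Stand-in for the on-disk operation log: survives the simulated crash.
    private final List<String[]> translog = new ArrayList<>();
    // Stand-in for the in-memory index: lost on crash.
    private Map<String, String> buffer = new HashMap<>();
    // Stand-in for segments already persisted by a flush.
    private final Map<String, String> committed = new HashMap<>();

    void index(String id, String source) {
        buffer.put(id, source);                    // write to the in-memory index
        translog.add(new String[] { id, source }); // then record the operation in the log
    }

    void flush() {
        committed.putAll(buffer); // persist what was buffered
        buffer.clear();
        translog.clear();         // logged operations are now safe and can be dropped
    }

    void crashAndRecover() {
        buffer = new HashMap<>(); // the crash wipes the in-memory state
        for (String[] op : translog) {
            buffer.put(op[0], op[1]); // replay the operations that were never flushed
        }
    }

    String get(String id) {
        String value = buffer.get(id);
        return (value != null) ? value : committed.get(id);
    }

    public static void main(String[] args) {
        TranslogSketch shard = new TranslogSketch();
        shard.index("a", "1");
        shard.flush();
        shard.index("b", "2"); // not flushed yet
        shard.crashAndRecover();
        System.out.println(shard.get("a") + " " + shard.get("b"));
    }
}
```

Without the replay in `crashAndRecover`, document `b` would be gone after the crash, which is exactly the loss the Translog is meant to prevent.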

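Once the primary shard has finished indexing, it sends the request on to the replicas, which perform the same index operation. Finally, a success response is returned to the client.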
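The primary-then-replica order can be summed up in a few lines. This is a deliberately simplified sketch: `ReplicationSketch`, `Shard`, and `indexOnGroup` are hypothetical names, replicas are assumed to be written synchronously one after another, and failure handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicationSketch {
    static final class Shard {
        final Map<String, String> docs = new HashMap<>();
        void index(String id, String source) { docs.put(id, source); }
    }

    // Applies the operation on the primary first, then on every replica,
    // and returns how many shard copies applied it.
    static int indexOnGroup(Shard primary, List<Shard> replicas, String id, String source) {
        primary.index(id, source);
        int applied = 1;
        for (Shard replica : replicas) {
            replica.index(id, source);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Shard primary = new Shard();
        List<Shard> replicas = new ArrayList<>();
        replicas.add(new Shard());
        replicas.add(new Shard());
        int applied = indexOnGroup(primary, replicas, "1", "{\"a\":1}");
        System.out.println("applied on " + applied + " shard copies");
    }
}
```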

2014-10-11 12:25