Routing strategies in SolrCloud: DocRouter, CompositeIdRouter, ImplicitDocRouter

suichangkele



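SolrCloud is sharded, so how is it decided which shard a document goes to? The component that answers this question is the DocRouter. Every collection is assigned a router when it is created; if none is specified, Solr defaults to the hash-based router (CompositeIdRouter). Other routers exist as well, and this post walks through them.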

Let's start with the DocRouter source. It declares several abstract methods:

public abstract Slice getTargetSlice(String id, SolrInputDocument sdoc, SolrParams params, DocCollection collection);

This decides which shard (slice) of a collection a SolrInputDocument belongs to. It is used when adding a document.

public abstract Collection<Slice> getSearchSlicesSingle(String shardKey, SolrParams params, DocCollection collection);

This decides which shards a query should be sent to, based on the shardKey.

public abstract boolean isTargetSlice(String id, SolrInputDocument sdoc, SolrParams params, String shardId, DocCollection collection);

This checks whether the given shardId is the correct slice for a SolrInputDocument.

These methods are where DocRouter does its work: queries and document additions call different methods to decide which shards to operate on.

Now for the implementations. We start with the hash-based one, HashBasedRouter, and look at how it implements the methods above:

1. getTargetSlice:

  @Override
  public Slice getTargetSlice(String id, SolrInputDocument sdoc, SolrParams params, DocCollection collection) {
    if (id == null) id = getId(sdoc, params);// get the id of this doc
    int hash = sliceHash(id, sdoc, params,collection);// compute the hash of the id; this calls Hash.murmurhash3_x86_32(id, 0, id.length(), 0), i.e. MurmurHash3
    return hashToSlice(hash, collection);// map the hash to a slice, see the method below
  }

  protected Slice hashToSlice(int hash, DocCollection collection) {
    for (Slice slice : collection.getActiveSlices()) {// all active shards of the current collection
      Range range = slice.getRange();// each shard owns a hash range
      if (range != null && range.includes(hash)) return slice;// the hash falls inside this shard's range
    }
    // No shard's range contains the hash, so report an error. This shows that the ranges
    // are fixed according to the number of shards when the collection is created, so a
    // brand-new shard cannot simply be added to a hash-routed collection afterwards.
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "No active slice servicing hash code " + Integer.toHexString(hash) + " in " + collection);
  }

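The code above explains a lot. In particular, with hash-based routing each shard's hash range is fixed when the collection is created, so a new shard cannot simply be added afterwards; the only way to grow is to split an existing shard's range (the Collections API SPLITSHARD action).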
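The range lookup can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: the class and shard names are made up, the four ranges are hard-coded, and `String.hashCode()` stands in for the MurmurHash3 function that Solr actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HashRoutingSketch {
    /** Inclusive hash range; a simplified stand-in for Solr's DocRouter.Range. */
    static final class Range {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        boolean includes(int hash) { return hash >= min && hash <= max; }
    }

    // Hypothetical four-shard layout: each shard owns one quarter of the 32-bit hash space.
    static final Map<String, Range> SHARDS = new LinkedHashMap<>();
    static {
        SHARDS.put("shard1", new Range(0x80000000, 0xbfffffff));
        SHARDS.put("shard2", new Range(0xc0000000, 0xffffffff));
        SHARDS.put("shard3", new Range(0x00000000, 0x3fffffff));
        SHARDS.put("shard4", new Range(0x40000000, 0x7fffffff));
    }

    // Stand-in hash for illustration only; Solr hashes the id with MurmurHash3.
    static int hash(String id) { return id.hashCode(); }

    // Mirrors the idea of hashToSlice: return the shard whose range contains the hash.
    static String shardFor(int hash) {
        for (Map.Entry<String, Range> e : SHARDS.entrySet()) {
            if (e.getValue().includes(hash)) return e.getKey();
        }
        throw new IllegalStateException("No shard servicing hash " + Integer.toHexString(hash));
    }

    public static void main(String[] args) {
        System.out.println(shardFor(hash("doc-1")));
    }
}
```

Because the ranges together cover the whole int space without overlap, every id maps to exactly one shard, and a new shard would have no range left to own.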

2. getSearchSlicesSingle:

 @Override
  public Collection<Slice> getSearchSlicesSingle(String shardKey, SolrParams params, DocCollection collection) {
    if (shardKey == null) {// no shardKey given with the query: search all active shards, so a shard that is down is skipped by default
      // search across whole collection
      // TODO: this may need modification in the future when shard splitting could cause an overlap
      return collection.getActiveSlices();
    }

    // use the shardKey as an id for plain hashing
    Slice slice = getTargetSlice(shardKey, null, params, collection);// a shardKey was given: call getTargetSlice above
    return slice == null ? Collections.<Slice>emptyList() : Collections.singletonList(slice);
  }

3. isTargetSlice is straightforward, so it is not shown here.

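HashBasedRouter is still an abstract class, because it does not define how ranges are computed or how they relate to the number of shards. Its implementing class is CompositeIdRouter. Look at its partitionRange method: this is where the number of shards determines each shard's range of hash values. I have not fully worked through this method yet; interested readers are welcome to dig in.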
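As a rough intuition for what such a partitioning step does, here is a minimal sketch that divides the full 32-bit hash space into equal contiguous ranges. It is a simplification under stated assumptions, not CompositeIdRouter's code: the class name and the `partition` helper are made up, and the real partitionRange may adjust the boundaries in ways this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

class RangePartitionSketch {
    /** Inclusive hash range; a simplified stand-in for Solr's DocRouter.Range. */
    static final class Range {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        @Override public String toString() {
            return Integer.toHexString(min) + "-" + Integer.toHexString(max);
        }
    }

    // Split [Integer.MIN_VALUE, Integer.MAX_VALUE] into `partitions` contiguous ranges.
    // long arithmetic avoids overflow; the last range absorbs any remainder.
    static List<Range> partition(int partitions) {
        long start = Integer.MIN_VALUE;
        long end = Integer.MAX_VALUE;
        long step = (end - start + 1) / partitions;
        List<Range> ranges = new ArrayList<>();
        long lo = start;
        for (int i = 0; i < partitions; i++) {
            long hi = (i == partitions - 1) ? end : lo + step - 1;
            ranges.add(new Range((int) lo, (int) hi));
            lo = hi + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : partition(4)) System.out.println(r);
    }
}
```

For four partitions the sketch yields 80000000-bfffffff, c0000000-ffffffff, 0-3fffffff and 40000000-7fffffff, one quarter of the hash space per shard.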

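That covers the hash-based sharding strategy. Its drawback is that new shards cannot be added at runtime; it suits collections whose documents have no obvious partitioning rule.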

The other DocRouter implementation: ImplicitDocRouter

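This strategy routes on a designated field. When creating the collection we specify which field is the route field, and the value of that field in each document decides which shard the document is added to. Here are its methods: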

@Override
  public Slice getTargetSlice(String id, SolrInputDocument sdoc, SolrParams params, DocCollection collection) {
    String shard = null;
    if (sdoc != null) {
      String f = getRouteField(collection);// the field used for routing, specified when the collection is created
      if(f !=null) {
        Object o = sdoc.getFieldValue(f);// the value of that field in this document
        if (o != null) shard = o.toString();// the field value is the name of the target shard
        else throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "No value for field "+f +" in " + sdoc);
      }
      if(shard == null) {// no shard was determined above, so fall back to the _ROUTE_ field
        Object o = sdoc.getFieldValue(_ROUTE_);// use the _ROUTE_ field
        if (o == null) o = sdoc.getFieldValue("_shard_");//deprecated . for backcompat remove later; used when there is no _ROUTE_ field
        if (o != null) {
          shard = o.toString();
        }
      }
    }

    if (shard == null) {// nothing found in sdoc above, so look in the request parameters
      shard = params.get(_ROUTE_);
      if(shard == null) shard =params.get("_shard_"); //deprecated for back compat
    }

    if (shard != null) {

      Slice slice = collection.getSlice(shard);// look the slice up directly by name
      if (slice == null) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "No shard called =" + shard + " in " + collection);
      }
      return slice;
    }

    return null;  // no shard specified... use default.
  }

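As the code shows, the configured route field is used first; if no route field is configured, _ROUTE_ is used for routing, taken from the document and then from the request parameters.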
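The lookup order is easier to see in a stripped-down sketch. The names below are hypothetical and the types are plain maps rather than Solr classes; the literal `_route_` is assumed to be the value behind the `_ROUTE_` constant, and the deprecated `_shard_` fallback is left out for brevity.

```java
import java.util.Map;
import java.util.Set;

class ImplicitRoutingSketch {
    // Assumed value of the _ROUTE_ constant used in the Solr source above.
    static final String ROUTE = "_route_";

    /**
     * Returns the name of the target shard, or null when nothing names a shard.
     * routeField may be null, meaning the collection has no route field configured.
     */
    static String targetShard(String routeField, Map<String, Object> doc,
                              Map<String, String> params, Set<String> shardNames) {
        String shard = null;
        if (doc != null) {
            if (routeField != null) {
                // 1. The configured route field must be present in the document.
                Object v = doc.get(routeField);
                if (v == null) throw new IllegalArgumentException("No value for field " + routeField);
                shard = v.toString();
            }
            if (shard == null && doc.get(ROUTE) != null) {
                // 2. Otherwise fall back to the _route_ field of the document.
                shard = doc.get(ROUTE).toString();
            }
        }
        // 3. Finally fall back to the _route_ request parameter.
        if (shard == null) shard = params.get(ROUTE);
        if (shard == null) return null; // no shard specified
        // The value must name an existing shard; there is no hashing involved.
        if (!shardNames.contains(shard)) throw new IllegalArgumentException("No shard called " + shard);
        return shard;
    }
}
```

The point of the design is that the caller names the shard directly, which is why the set of shards can change without invalidating any precomputed ranges.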

getSearchSlicesSingle

@Override
  public Collection<Slice> getSearchSlicesSingle(String shardKey, SolrParams params, DocCollection collection) {

    if (shardKey == null) {// no shardKey given with the query: search all active shards
      return collection.getActiveSlices();
    }

    // assume the shardKey is just a slice name
    Slice slice = collection.getSlice(shardKey);// a shardKey was given: return the shard with that name
    if (slice == null) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "implicit router can't find shard " + shardKey + " in collection " + collection.getName());
    }

    return Collections.singleton(slice);
  }

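The advantage of this routing strategy is that shards can be added dynamically at runtime. Prefer it when documents have an obvious criterion to partition on.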

So how do we create collections that use these two routing strategies?

If router.name is not specified when creating a collection, the default is CompositeIdRouter. For example, after running admin/collections?action=CREATE&name=collectionName&numShards=4&replicationFactor=2&collection.configName=collectionName&maxShardsPerNode=2 you can check clusterstate.json in ZooKeeper once the collection has been created; it contains "router":{"name":"compositeId"} (Solr 4.7.2).

If router.name=implicit is specified, ImplicitDocRouter is used instead, for example: admin/collections?action=CREATE&name=hello&replicationFactor=2&collection.configName=configName&maxShardsPerNode=10&router.name=implicit&shards=name1,name2,name3,name4&router.field=nameField
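For scripting these requests, a small helper that assembles the Collections API URLs may be handy. This is only a sketch: the base URL (localhost:8983) and the class and method names are assumptions, values are not URL-encoded, and the CREATESHARD request at the end illustrates adding a shard to an implicit-router collection at runtime (the shard name name5 is made up).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CollectionsApiUrlSketch {
    // Assumed host and port; adjust to your own Solr node.
    static final String BASE = "http://localhost:8983/solr/admin/collections";

    // Joins parameters into a query string. Values are NOT URL-encoded here,
    // which is only safe for simple ASCII names like the ones in this post.
    static String url(Map<String, String> params) {
        StringBuilder sb = new StringBuilder(BASE);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    // CREATE with router.name=implicit: shards are named explicitly.
    static String createImplicit(String name, String configName, String shards, String routeField) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", name);
        p.put("collection.configName", configName);
        p.put("router.name", "implicit");
        p.put("shards", shards);
        p.put("router.field", routeField);
        return url(p);
    }

    // CREATESHARD: add a named shard to an existing implicit-router collection.
    static String createShard(String collection, String shard) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATESHARD");
        p.put("collection", collection);
        p.put("shard", shard);
        return url(p);
    }

    public static void main(String[] args) {
        System.out.println(createImplicit("hello", "configName", "name1,name2,name3,name4", "nameField"));
        System.out.println(createShard("hello", "name5"));
    }
}
```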
