HBase put flow analysis: client side

hongs_yang


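Analysis of the data write (Put) processing flow:

This traces what happens when a Put is written by creating an HTable instance and calling its put method. The analysis is split into two parts: the client and the regionserver.

Client side:

HTable.put --> doPut. When a list is put, doPut is called once for each element.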

private void doPut(Put put) throws InterruptedIOException,

RetriesExhaustedWithDetailsException {

Check whether the previous submission failed. If it did, add this put to the writeAsyncBuffer list,

then run a flush and wait for the flush to complete.

if (ap.hasError()){

writeAsyncBuffer.add(put);

backgroundFlushCommits(true);

}

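Validate the contents of the put:

1. Check that the put specifies at least one column family (cf); a put with none is invalid.

2. Check that no KeyValue in the put is larger than the value configured in hbase.client.keyvalue.maxsize; the default of -1 means the size is not limited.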

validatePut(put);

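Add the size of all the KeyValues in the current put, including the class overhead, to currentWriteBufferSize.

This field is used to check whether the puts buffered for the current table exceed the configured buffer size.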

currentWriteBufferSize += put.heapSize();

Add this put to the writeAsyncBuffer list.

writeAsyncBuffer.add(put);

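If the total size of the puts in the buffer exceeds the buffer size configured for the table, flush without waiting for the flush to complete.

During a flush, a submit can fail after the data has already been removed from writeAsyncBuffer; the failed puts are then put back into this list.

while (currentWriteBufferSize > writeBufferSize) {

backgroundFlushCommits(false);

}

}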
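The buffering behaviour of doPut can be illustrated without HBase at all. The following is a minimal sketch, not the HTable source: the class WriteBufferSketch, the use of byte arrays in place of Put, and the use of the array length in place of heapSize() are all assumptions made for illustration, and the flush here always succeeds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HTable's client-side write buffer; not HBase code.
class WriteBufferSketch {
    private final long writeBufferSize;          // flush threshold in bytes
    private long currentWriteBufferSize = 0;     // bytes currently buffered
    private final List<byte[]> writeAsyncBuffer = new ArrayList<>();
    private int flushCount = 0;                  // counts flushes, for demonstration only

    WriteBufferSketch(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    // Same shape as doPut: validate, account for the size, buffer, flush when over the limit.
    void doPut(byte[] put) {
        if (put.length == 0) {
            // Stand-in for validatePut rejecting an invalid put.
            throw new IllegalArgumentException("empty put");
        }
        currentWriteBufferSize += put.length;    // stand-in for put.heapSize()
        writeAsyncBuffer.add(put);
        while (currentWriteBufferSize > writeBufferSize) {
            flush();
        }
    }

    // Stand-in for backgroundFlushCommits; in this sketch every buffered put succeeds.
    private void flush() {
        writeAsyncBuffer.clear();
        flushCount++;
        // Recompute the buffered size from whatever is still in the buffer.
        currentWriteBufferSize = 0;
        for (byte[] remaining : writeAsyncBuffer) {
            currentWriteBufferSize += remaining.length;
        }
    }

    int bufferedPuts() { return writeAsyncBuffer.size(); }
    long bufferedBytes() { return currentWriteBufferSize; }
    int flushCount() { return flushCount; }
}
```

With a threshold of 10 bytes, two 4-byte puts stay in the buffer and the third one pushes the total past the threshold and triggers a flush.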

hbase.client.max.total.tasks,default=100

private void backgroundFlushCommits(boolean synchronous) throws

InterruptedIOException, RetriesExhaustedWithDetailsException {

try {

do {

Submit the data in the writeAsyncBuffer list by calling the regionserver's multi method over RPC.

An error during submission throws InterruptedIOException.

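1. Iterate over writeAsyncBuffer and take out each put instance. Create an RPC connection to the server hosting the meta region,

and look up in the meta table the HRegionLocation that holds the row of the put being processed.

If the region location cannot be obtained, set hasError=true.

To obtain the HRegionLocation, first check whether the region location is already in the cache.

If it is not, first get the location of the meta region,

create the RPC connection to the meta regionserver, that is, an implementation of the ClientProtos.ClientService.BlockingInterface interface (HRegionServer),

then use client.get to read the region info for the put's row from the meta region, build the HRegionLocation, and add it to the cache.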

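2. hbase.client.max.perregion.tasks sets the client's concurrency for a single region; the default is 1.

3. hbase.client.max.perserver.tasks sets the client's concurrency for a single regionserver; the default is 2.

4. hbase.client.max.total.tasks sets the maximum number of concurrent tasks across all regionservers; the default is 100.

Whether the number of tasks exceeds the total allowed is checked using tasksSent-tasksDone.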

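5. At this point the HRegionLocation of every put is known. Group the data to be put by region location into

a map<HRegionLocation,List<Row>> collection.

Whenever an Action<Row> is created, the corresponding put instance is removed from HTable.writeAsyncBuffer.

All regions on the same regionserver have HRegionLocations that compare as equal under equals.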

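6. Create an HConnectionManager.ServerErrorTracker instance. For this instance:

a. hbase.client.retries.number sets the number of retries per server; the default is 31.

b. hbase.client.pause sets the pause between retries; the default is 100, and the timeout is extended accordingly with each retry.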

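7. For each regionLocation to be submitted (each regionserver may host several regions):

a. Increment tasksSent, which adds one to the total task count.

b. Increment taskCounterPerServer for the regionserver of the regionLocation, which adds one to that server's task count.

c. Get all the regions in the regionLocation and increment taskCounterPerRegion for each of them.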

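8. For each regionLocation (each regionserver), create an RPC connection and start a thread.

MultiServerCallable.call invokes regionserver.multi to add the data.

9. Once each thread has submitted to its rs, it waits for the rs to respond. If the submission fails, it is retried until it times out or the retry limit is reached.

The timeout is created along with the HConnectionManager.ServerErrorTracker instance; see the source code for details.

The timeout is extended with each retry.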

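10. For each region that the submitted data belongs to, decrement that region's taskCounterPerRegion.

Decrement taskCounterPerServer.

Increment tasksDone to mark one task as complete, and call notify on tasksDone.

The notify tells other puts whose submit is waiting that they can stop waiting.

submit begins by checking whether it has to wait, and if so it waits on tasksDone. See the AsyncProcess.submit source code for details.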
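The grouping in step 5 can be pictured with plain collections. This is a minimal sketch under stated assumptions, not the AsyncProcess source: rows and servers are plain strings, the class GroupByLocationSketch is hypothetical, and the locate function stands in for the cache and meta lookup and is assumed to always find a server.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of grouping buffered rows by the server that hosts them; not HBase code.
class GroupByLocationSketch {
    // locate stands in for the region location lookup and is assumed never to fail here.
    // Each row that is placed in a group is removed from the buffer, as described in step 5.
    static Map<String, List<String>> group(List<String> buffer,
                                           Function<String, String> locate) {
        Map<String, List<String>> actionsByServer = new LinkedHashMap<>();
        Iterator<String> it = buffer.iterator();
        while (it.hasNext()) {
            String row = it.next();
            String server = locate.apply(row);
            actionsByServer.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
            it.remove();
        }
        return actionsByServer;
    }
}
```

Grouping by server rather than by region is what allows one RPC per regionserver in step 8: every row bound for the same server ends up in the same list.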

ap.submit(writeAsyncBuffer, true);

} while (synchronous && !writeAsyncBuffer.isEmpty());

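If the argument passed in is true, wait for the RPC calls to finish. The argument is true for flushCommits, or when the previous submission from put ended in an error.

Wait until tasksSent minus tasksDone equals 0; tasksSent is the number of tasks submitted and tasksDone is the number of tasks completed.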

if (synchronous) {

ap.waitUntilDone();

}
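The tasksSent / tasksDone accounting that waitUntilDone relies on can be sketched with an ordinary monitor. This is an illustration under stated assumptions, not the AsyncProcess source: the class TaskCounterSketch and its method names are hypothetical, only the total limit is modelled (the per-region and per-server counters are left out), and a single lock object replaces whatever synchronization the real class uses.

```java
// Hypothetical sketch of tasksSent / tasksDone accounting; not the AsyncProcess source.
class TaskCounterSketch {
    private final Object lock = new Object();
    private final int maxTotalTasks;   // plays the role of hbase.client.max.total.tasks
    private long tasksSent = 0;
    private long tasksDone = 0;

    TaskCounterSketch(int maxTotalTasks) {
        this.maxTotalTasks = maxTotalTasks;
    }

    // Blocks while the in-flight count is at the limit, then records one more sent task.
    void taskSent() {
        synchronized (lock) {
            while (tasksSent - tasksDone >= maxTotalTasks) {
                await();
            }
            tasksSent++;
        }
    }

    // Records a finished task and wakes every thread waiting on the counters.
    void taskDone() {
        synchronized (lock) {
            tasksDone++;
            lock.notifyAll();
        }
    }

    // Returns once tasksSent minus tasksDone has dropped to 0.
    void waitUntilDone() {
        synchronized (lock) {
            while (tasksSent - tasksDone > 0) {
                await();
            }
        }
    }

    long inFlight() {
        synchronized (lock) {
            return tasksSent - tasksDone;
        }
    }

    // Must be called while holding lock.
    private void await() {
        try {
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
    }
}
```

The wait calls sit inside while loops so that a woken thread rechecks the counters before continuing, which is the usual way to use wait and notifyAll.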

Part of the data failed to submit. For example, data may be submitted to two regionservers at once, with one succeeding and the other failing.

if (ap.hasError()) {

LOG.debug(tableName + ": One or more of the operations have failed -" +

" waiting for all operation in progress to finish (successfully or not)");

while (!writeAsyncBuffer.isEmpty()) {

ap.submit(writeAsyncBuffer, true);

}

Wait until tasksSent minus tasksDone equals 0; tasksSent is the number of tasks submitted and tasksDone is the number of tasks completed.

ap.waitUntilDone();

If part of the data failed to submit and clearing failed data is not enabled, add that data back to the writeAsyncBuffer list.

if (!clearBufferOnFail) {

// if clearBufferOnFailed is not set, we're supposed to keep the failed operation in the

// write buffer. This is a questionable feature kept here for backward compatibility

writeAsyncBuffer.addAll(ap.getFailedOperations());

}

RetriesExhaustedWithDetailsException e = ap.getErrors();

ap.clearErrors();

throw e;

}

} finally {

Reset currentWriteBufferSize. If some data was not submitted successfully,

recompute the size of the unsubmitted data and add it to currentWriteBufferSize.

currentWriteBufferSize = 0;

for (Row mut : writeAsyncBuffer) {

if (mut instanceof Mutation) {

currentWriteBufferSize += ((Mutation) mut).heapSize();

}

}

}

}
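The retry behaviour described for ServerErrorTracker, a base pause taken from hbase.client.pause that grows with each retry up to the limit in hbase.client.retries.number, can be sketched as follows. This is only an illustration: the class RetryPauseSketch is hypothetical and the multiplier table is made up for this sketch, not copied from HBase.

```java
// Hypothetical sketch of a retry pause that grows with the attempt number; not HBase code.
class RetryPauseSketch {
    // Illustrative multipliers chosen for this sketch, not the values HBase uses.
    private static final int[] MULTIPLIERS = {1, 2, 4, 8, 16};

    // basePauseMs plays the role of hbase.client.pause; tries is the zero-based retry index.
    // Retries beyond the end of the table reuse the last multiplier.
    static long pauseMs(long basePauseMs, int tries) {
        int index = Math.min(Math.max(tries, 0), MULTIPLIERS.length - 1);
        return basePauseMs * MULTIPLIERS[index];
    }

    // maxRetries plays the role of hbase.client.retries.number.
    static boolean canRetry(int tries, int maxRetries) {
        return tries < maxRetries;
    }
}
```

Capping the multiplier keeps the wait bounded even when the retry limit is much larger than the table.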


2014-04-14 16:19