Hlog的相关处理流程不完全分析 -

hongs_yang

浏览: 61727 次
性别:
来自: 西安

最近访客更多访客>>

jlbhdfsl

longlongkong

qq85609655

hsujamy

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

Hlog的相关处理流程不完全分析

博客分类：

hadoop
hbase
大数据

hbase 分布式源代码

Hlog的相关处理流程：

在对hbase中数据进行更新操作put/delete/append/increment操作时，记录操作日志供日志重播的相关处理。

Hlog的写入通过regionserver实例生成时生成的FSLog为的实例。

通过LogRoller线程定期去检查并删除过期的日志文件。

定期检查的时间间隔通过hbase.regionserver.logroll.period进行配置，默认为3600000ms

如果检查时间还没有达到上面的间隔时间时，线程等待的wake(唤醒)时间，hbase.server.thread.wakefrequency，

默认为10*1000ms

FSLog实例内部有一个FSLog.LogSyncer线程实例，并启动此实例。此线程主要用来把log写入到hdfs中

此线程的执行间隔通过hbase.regionserver.optionallogflushinterval配置，默认为1000ms

如果此值配置为小于0的值，表示实时写入hlog日志

在执行更新操作时如果开启有日志记录，调用appendNoSync-->append写入日志,

每一个日志中记录的seqid默认第一次时为当前rs中所有region中最大的seqid加一(FSHLog.logSeqNum)，

每append一次后，logSeqNum的值为加一。同时此值也是flsuh时hfile中的fileinfo中记录的最大的seqid值。

此方法把要记录的日志写入到FSLog.LogSyncer.pendingWrites队列中。等待LogSyncer的run方法去sync

注意：如果是对meta的log写入时，每一次写入都会执行sync操作，保证meta数据的不丢失。

publicvoidrun() {

try {

// awaiting with a timeout doesn't always

// throw exceptions on interrupt

while(!this.isInterrupted() && !closeLogSyncer.get()) {

try {

如果没有最新的数据要写入到HDFS，现在未flush的log的值小于或等于上一次提交的值，表示没有新记录

unflushedEntries在每append一次值会加一。

if (unflushedEntries.get() <= syncedTillHere) {

synchronized (closeLogSyncer) {

线程等待

closeLogSyncer.wait(this.optionalFlushInterval);

}

// Calling sync since we waited or had unflushed entries.

// Entries appended but not sync'd are taken care of here AKA

// deferred log flush

否则，执行log的写入HDFS操作。使用unflushedEntries的值当txid

sync();

} catch (IOException e) {

LOG.error("Error while syncing, requesting close of hlog ", e);

requestLogRoll();

Threads.sleep(this.optionalFlushInterval);

}

} catch (InterruptedException e) {

LOG.debug(getName() + " interrupted while waiting for sync requests");

} finally {

LOG.info(getName() + " exiting");

}

使用unflushedEntries的值当成txid

private void syncer() throws IOException {

syncer(this.unflushedEntries.get()); // sync all pending items

}

private void syncer(long txid) throws IOException {

// if the transaction that we are interested in is already

// synced, then return immediately.

检查日志是否有更新，如果没有，直接返回，减少检查过程

if (txid <= this.syncedTillHere) {

return;

}

WritertempWriter;

synchronized (this.updateLock) {

if (this.closed) return;

// Guaranteed non-null.

// Note that parallel sync can close tempWriter.

// The current method of dealing with this is to catch exceptions.

// See HBASE-4387, HBASE-5623, HBASE-7329.

tempWriter = this.writer;

}

try {

longdoneUpto;

longnow = EnvironmentEdgeManager.currentTimeMillis();

// First flush all the pending writes to HDFS. Then

// issue the sync to HDFS. If sync is successful, then update

// syncedTillHere to indicate that transactions till this

// number has been successfully synced.

IOException ioe = null;

List<Entry> pending = null;

synchronized (flushLock) {

if (txid <= this.syncedTillHere) {

return;

}

得到当前未flush的最新的txid

doneUpto = this.unflushedEntries.get();

得到需要写入到HDFS的日志记录列表

pending = logSyncer.getPendingWrites();

try {

添加到HLOG的写入OUTPUT中

logSyncer.hlogFlush(tempWriter, pending);

postAppend(pending);

} catch(IOException io) {

ioe = io;

LOG.error("syncer encountered error, will retry. txid=" + txid, ioe);

}

if (ioe != null && pending != null) {

synchronized (this.updateLock) {

synchronized (flushLock) {

// HBASE-4387, HBASE-5623, retry with updateLock held

tempWriter = this.writer;

logSyncer.hlogFlush(tempWriter, pending);

postAppend(pending);

}

// another thread might have sync'ed avoid double-sync'ing

if (txid <= this.syncedTillHere) {

return;

}

try {

if (tempWriter != null) {

tempWriter.sync();

postSync();

}

} catch(IOException ex) {

synchronized (this.updateLock) {

// HBASE-4387, HBASE-5623, retry with updateLock held

// TODO: we don't actually need to do it for concurrent close - what is the point

// of syncing new unrelated writer? Keep behavior for now.

tempWriter = this.writer;

if (tempWriter != null) {

执行HDFS的文件写入sync,表示数据持久化

tempWriter.sync();

postSync();

}

更新当前syncedTillHere的值为unflushedEntries的值，主要用来检查下次是否需要更新记录

this.syncedTillHere = Math.max(this.syncedTillHere, doneUpto);

this.metrics.finishSync(EnvironmentEdgeManager.currentTimeMillis() - now);

// TODO: preserving the old behavior for now, but this check is strange. It's not

// protected by any locks here, so for all we know rolling locks might start

// as soon as we enter the "if". Is this best-effort optimization check?

if (!this.logRollRunning) {

checkLowReplication();

try {

如果当前HLOG文件的大小超过了指定的log文件大小，

通过hbase.regionserver.hlog.blocksize配置,默认hdfs的blocksize大小 *

hbase.regionserver.logroll.multiplier的值,默认0.95

curLogSize = tempWriter.getLength();

if (curLogSize > this.logrollsize) {

调用LogRoller.logRollRequested方法

requestLogRoll();

}

} catch (IOException x) {

LOG.debug("Log roll failed and will be retried. (This is not an error)");

}

} catch (IOException e) {

LOG.fatal("Could not sync. Requesting roll of hlog", e);

requestLogRoll();

throwe;

}

LogRoller.logRollRequested方法流程：

此方法主要用来叫醒LogRoller线程本身。

public void logRollRequested() {

synchronized (rollLog) {

rollLog.set(true);

rollLog.notifyAll();

}

LogRoller.run方法：

public void run() {

while (!server.isStopped()) {

longnow = System.currentTimeMillis();

booleanperiodic = false;

如果是Hlog写入的时候显示调用了 logRollRequested方法，下面的if 不进去

if (!rollLog.get()) {

检查上次更新时间是否大过更新间隔时间，如果没有，线程等待

通过hbase.regionserver.logroll.period进行配置，默认为3600000ms

periodic = (now - this.lastrolltime) > this.rollperiod;

如果是定期检查，同时上一次roll log的时间还不到间隔的时间，线程等待

线程等待超时通过hbase.server.thread.wakefrequency，默认为10*1000ms

if (!periodic) {

synchronized (rollLog) {

try {

rollLog.wait(this.threadWakeFrequency);

} catch (InterruptedException e) {

// Fall through

}

continue;

}

// Time for periodic roll

if (LOG.isDebugEnabled()) {

LOG.debug("Hlog roll period " + this.rollperiod + "ms elapsed");

}

} elseif (LOG.isDebugEnabled()) {

LOG.debug("HLog roll requested");

}

rollLock.lock(); // FindBugs UL_UNRELEASED_LOCK_EXCEPTION_PATH

try {

更新最后一次roll log的时间为当前时间

this.lastrolltime = now;

// This is array of actual region names.

调用FSHLog.rollWriter方法，检查文件个数是否达到指定的值，

如果是:把所有region已经flush的hlog文件移动到oldWALs目录下。

并取出未移动同时已经关闭writer的hlog file的最小的一个seqid的hlog path,

把所有未flush的region中seqid小于此值的region返回回来。

byte [][] regionsToFlush = getWAL().rollWriter(rollLog.get());

if (regionsToFlush != null) {

发起flush请求，强制所有的region进行flush操作。

for (byte [] r: regionsToFlush) scheduleFlush(r);

}

} catch (FailedLogCloseException e) {

server.abort("Failed log close in log roller", e);

} catch (java.net.ConnectException e) {

server.abort("Failed log close in log roller", e);

} catch (IOException ex) {

// Abort if we get here. We probably won't recover an IOE. HBASE-1132

server.abort("IOE in log roller",

RemoteExceptionHandler.checkIOException(ex));

} catch (Exception ex) {

LOG.error("Log rolling failed", ex);

server.abort("Log rolling failed", ex);

} finally {

try {

rollLog.set(false);

} finally {

rollLock.unlock();

}

LOG.info("LogRoller exiting.");

}

FSHLog.rollWriter方法流程分析：

把所有的region中已经flush完成的最小的seqid与old的hlog文件进行比对，

如果old hlog的seqid小于指定的flushid把这些hlog移动到oldWALs目录下去。

如果关闭的hlog文件大于hbase.regionserver.maxlogs配置的的值，默认为32个，

返回所有的已经关闭的hlog中所有region未进行flush的seqid,

publicbyte [][] rollWriter(booleanforce)

throws FailedLogCloseException, IOException {

synchronized (rollWriterLock) {

// Return if nothing to flush.

FSHLog中的numEntries的值每次append时，都会加一

if (!force && this.writer != null && this.numEntries.get() <= 0) {

returnnull;

}

byte [][] regionsToFlush = null;

if (closed) {

LOG.debug("HLog closed. Skipping rolling of writer");

returnnull;

}

try {

....................此处部分代码没有显示

filenum在第一次是值为-1，第一次进行入是在FSHLog实例生成时通过此方法得到Writer实例

longcurrentFilenum = this.filenum;

Path oldPath = null;

如果是第二次指定此方法，也就是LogRoller显示调用时

if (currentFilenum > 0) {

//computeFilename will take care of meta hlog filename

得到上一个hlog文件的路径，也就是准备关闭的文件路径

oldPath = computeFilename(currentFilenum);

}

生成一个新的HLOG文件名称，并创建文件路径

this.filenum = System.currentTimeMillis();

Path newPath = computeFilename();

while (fs.exists(newPath)) {

this.filenum++;

newPath = computeFilename();

}

....................此处部分代码没有显示

生成一个新的Writer实例，默认实现是ProtobufLogWriter类,通过hbase.regionserver.hlog.writer.impl配置

FSHLog.WriternextWriter = this.createWriterInstance(fs, newPath, conf);

// Can we get at the dfsclientoutputstream?

FSDataOutputStream nextHdfsOut = null;

if (nextWriterinstanceof ProtobufLogWriter) {

得到writer对应的HDFS路径的OutputStream

nextHdfsOut = ((ProtobufLogWriter)nextWriter).getStream();

// perform the costly sync before we get the lock to roll writers.

try {

nextWriter.sync();

} catch (IOException e) {

// optimization failed, no need to abort here.

LOG.warn("pre-sync failed", e);

}

Path oldFile = null;

intoldNumEntries = 0;

synchronized (updateLock) {

// Clean up current writer.

oldNumEntries = this.numEntries.get();

关闭上一个HLOG在HDFS的文件流，并把writer设置为null

同时把关闭的hlog文件的的seqid与路径写入到outputfiles的map列表中

oldFile = cleanupCurrentWriter(currentFilenum);

把当前新的hlog文件的writer与outputstream设置给FSHLOG实例

this.writer = nextWriter;

this.hdfs_out = nextHdfsOut;

this.numEntries.set(0);

}

....................此处部分代码没有显示

// Can we delete any of the old log files?

如果outputfiles列表中有值，检查log的个数是否达到指定的个数，

if (getNumRolledLogFiles() > 0) {

检查log目录下的文件个数是否达到指定的个数，hbase.regionserver.maxlogs配置的的值，默认为32个

如果达到指定的个数，找到所有region中已经flush完成的最小的一个seqid,

并从outputfiles列表中取出比此seqid小的hlog文件，把这些文件移动到oldWALs路径下。

CleanOldLogs();

取出outputfiles列表中的最小的一个seqid,

检查如果未flush的region的seqid是否小于此seqid，如果是返回这些region

regionsToFlush = getRegionsToForceFlush();

}

} finally {

this.logRollRunning = false;

closeBarrier.endOp();

}

returnregionsToFlush;

}

0
顶

0
踩

分享到：

memstore的flush流程分析 | hbase put 流程分析regionserver端

2014-04-15 17:59
浏览 3005
评论(0)
分类:开源软件
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Hlog的相关处理流程不完全分析

Hlog的相关处理流程：

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Hlog的相关处理流程不完全分析

Hlog的相关处理流程：

评论

发表评论

相关推荐

关于Hbase的cache配置

hadoop ha配置

hadoop-mapreduce中reducetask运行分析

hadoop-mapreduce中maptask运行分析

hbase hfilev2文件

Hbase MemStoreLAB

spark shuffle部分分析

Task的执行过程分析

Spark中的Scheduler

RDD的依赖关系

从wordcount分析spark提交job

UserScan的处理流程分析

Major compaction时的scan操作

minor compaction时的scan操作分析

compact处理流程分析

region split流程分析

memstore的flush流程分析

hbase put 流程分析regionserver端

hbase put 流程分析client端

日志重播分析

最近访客更多访客>>