Before I begin, a quick rant about my company's ridiculous environment: no internet access on our machines, and USB drives are banned, so any notes I take basically have to be rewritten at home in the evening. Enough complaining.
First, a UML diagram of the RPC services HBase brings up in the HMaster constructor: http://blackproof.iteye.com/blog/2029170
When HMaster starts it runs its run() method. In outline:
HBase startup
HMaster handles the backup-master case
Then the HMaster startup sequence proper:
1. ZK tracker classes
2. Create the thread pools and start the Jetty service on port 60010
3. Region server startup
4. HLog processing
5. HMaster assigns the ROOT and META tables
6. Preparation: collect offline servers (with their regions) and the regions already assigned in ZK
7. Assign regions
8. Check daughter regions
9. Start the balancer thread
Detailed analysis:
First becomeActiveMaster is called. It checks the config key "hbase.master.backup" to see whether this node is a backup master; if so, it simply blocks until it detects that the cluster's active master has died (by default it checks every 3 minutes):
private boolean becomeActiveMaster(MonitoredTask startupStatus)
    throws InterruptedException {
  // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
  // if we come back to life.
  this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
      this);
  this.zooKeeper.registerListener(activeMasterManager);
  stallIfBackupMaster(this.conf, this.activeMasterManager);

  // The ClusterStatusTracker is setup before the other
  // ZKBasedSystemTrackers because it's needed by the activeMasterManager
  // to check if the cluster should be shutdown.
  this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
  this.clusterStatusTracker.start();
  return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
}
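The backup-master stall is just a polling loop. Here is a minimal sketch of what stallIfBackupMaster does — simplified and partly from memory, so treat the exact names and defaults as assumptions rather than the literal source:

  // Illustrative sketch of the backup-master stall (simplified, not verbatim source).
  private static void stallIfBackupMaster(final Configuration c,
      final ActiveMasterManager amm) throws InterruptedException {
    // Only masters started with "hbase.master.backup" = true stall here.
    if (!c.getBoolean("hbase.master.backup", false)) return;
    // Park until some other master has written the active-master znode.
    // The poll interval is the ZK session timeout, 180000 ms by default --
    // which is where the "checks every 3 minutes" above comes from.
    while (!amm.isActiveMaster()) {
      Thread.sleep(c.getInt("zookeeper.session.timeout", 180000));
    }
    // The caller then falls through to blockUntilBecomingActiveMaster(),
    // which returns only once this backup wins the active-master race.
  }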
1) ZK tracker classes
First all the ZK-based tracker classes are initialized, via initializeZKBasedSystemTrackers.
The zk listeners set up are:
1. createCatalogTracker (rootRegionTracker, metaRegionTracker) tracks the ROOT and META regions
2. AssignmentManager manages region assignment
3. RegionServerTracker monitors region servers (through the ServerManager) and handles dead ones
4. DrainingServerTracker maintains the draining region server list
void initializeZKBasedSystemTrackers() throws IOException,
    InterruptedException, KeeperException {
  this.catalogTracker = createCatalogTracker(this.zooKeeper, this.conf, this);
  this.catalogTracker.start();

  this.balancer = LoadBalancerFactory.getLoadBalancer(conf);
  this.loadBalancerTracker = new LoadBalancerTracker(zooKeeper, this);
  this.loadBalancerTracker.start();
  this.assignmentManager = new AssignmentManager(this, serverManager,
      this.catalogTracker, this.balancer, this.executorService, this.metricsMaster,
      this.tableLockManager);
  zooKeeper.registerListenerFirst(assignmentManager);

  this.regionServerTracker = new RegionServerTracker(zooKeeper, this,
      this.serverManager);
  this.regionServerTracker.start();

  this.drainingServerTracker = new DrainingServerTracker(zooKeeper, this,
      this.serverManager);
  this.drainingServerTracker.start();

  // Set the cluster as up. If new RSs, they'll be waiting on this before
  // going ahead with their startup.
  boolean wasUp = this.clusterStatusTracker.isClusterUp();
  if (!wasUp) this.clusterStatusTracker.setClusterUp();

  LOG.info("Server active/primary master=" + this.serverName +
      ", sessionid=0x" +
      Long.toHexString(this.zooKeeper.getRecoverableZooKeeper().getSessionId()) +
      ", setting cluster-up flag (Was=" + wasUp + ")");

  // create the snapshot manager
  this.snapshotManager = new SnapshotManager(this, this.metricsMaster);
}
2) Create the thread pools and start the Jetty service on port 60010
1. Start the executor service with the following pools:
MASTER_META_SERVER_OPERATIONS
MASTER_SERVER_OPERATIONS
MASTER_CLOSE_REGION
MASTER_OPEN_REGION
MASTER_TABLE_OPERATIONS
2. The LogCleaner thread, which cleans the .oldlogs directory
3. The Jetty info server on port 60010
void startServiceThreads() throws IOException {
  // Start the executor service pools
  this.executorService.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
      conf.getInt("hbase.master.executor.openregion.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
      conf.getInt("hbase.master.executor.closeregion.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
      conf.getInt("hbase.master.executor.serverops.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_META_SERVER_OPERATIONS,
      conf.getInt("hbase.master.executor.serverops.threads", 5));
  this.executorService.startExecutorService(ExecutorType.M_LOG_REPLAY_OPS,
      conf.getInt("hbase.master.executor.logreplayops.threads", 10));

  // We depend on there being only one instance of this executor running
  // at a time. To do concurrency, would need fencing of enable/disable of
  // tables.
  this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);

  // Start log cleaner thread
  String n = Thread.currentThread().getName();
  int cleanerInterval = conf.getInt("hbase.master.cleaner.interval", 60 * 1000);
  this.logCleaner =
      new LogCleaner(cleanerInterval,
          this, conf, getMasterFileSystem().getFileSystem(),
          getMasterFileSystem().getOldLogDir());
  Threads.setDaemonThreadRunning(logCleaner.getThread(), n + ".oldLogCleaner");

  // start the hfile archive cleaner thread
  Path archiveDir = HFileArchiveUtil.getArchivePath(conf);
  this.hfileCleaner = new HFileCleaner(cleanerInterval, this, conf, getMasterFileSystem()
      .getFileSystem(), archiveDir);
  Threads.setDaemonThreadRunning(hfileCleaner.getThread(), n + ".archivedHFileCleaner");

  // Start the health checker
  if (this.healthCheckChore != null) {
    Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), n + ".healthChecker");
  }

  // Start allowing requests to happen.
  this.rpcServer.openServer();
  this.rpcServerOpen = true;
  if (LOG.isTraceEnabled()) {
    LOG.trace("Started service threads");
  }
}
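One thing this excerpt does not show is the port-60010 web UI itself being brought up. Roughly, HMaster starts it like the sketch below — paraphrased from memory, so take the exact constants and call site as assumptions:

  // Sketch of bringing up the Jetty-backed master UI (paraphrased, not verbatim).
  int port = this.conf.getInt("hbase.master.info.port", 60010);
  if (port >= 0) {
    String bindAddress = this.conf.get("hbase.master.info.bindAddress", "0.0.0.0");
    // InfoServer wraps an embedded Jetty instance serving the master status pages.
    this.infoServer = new InfoServer("master", bindAddress, port, false, this.conf);
    this.infoServer.addServlet("status", "/master-status", MasterStatusServlet.class);
    this.infoServer.setAttribute("master", this);
    this.infoServer.start();
  }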
3) Region server startup
serverManager calls waitForRegionServers,
which waits for region servers to check in; HMaster startup continues once all of the following conditions hold:
a. at least 4.5 s have elapsed ("hbase.master.wait.on.regionservers.timeout")
b. the number of region servers that have checked in is >= 1 ("hbase.master.wait.on.regionservers.mintostart")
c. no region server has died or newly checked in for 1.5 s ("hbase.master.wait.on.regionservers.interval")
public void waitForRegionServers(MonitoredTask status)
    throws InterruptedException {
  final long interval = this.master.getConfiguration().
      getLong(WAIT_ON_REGIONSERVERS_INTERVAL, 1500);
  final long timeout = this.master.getConfiguration().
      getLong(WAIT_ON_REGIONSERVERS_TIMEOUT, 4500);
  int minToStart = this.master.getConfiguration().
      getInt(WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
  if (minToStart < 1) {
    LOG.warn(String.format(
        "The value of '%s' (%d) can not be less than 1, ignoring.",
        WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
    minToStart = 1;
  }
  int maxToStart = this.master.getConfiguration().
      getInt(WAIT_ON_REGIONSERVERS_MAXTOSTART, Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    LOG.warn(String.format(
        "The value of '%s' (%d) is set less than '%s' (%d), ignoring.",
        WAIT_ON_REGIONSERVERS_MAXTOSTART, maxToStart,
        WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
    maxToStart = Integer.MAX_VALUE;
  }

  long now = System.currentTimeMillis();
  final long startTime = now;
  long slept = 0;
  long lastLogTime = 0;
  long lastCountChange = startTime;
  int count = countOfRegionServers();
  int oldCount = 0;
  while (
      !this.master.isStopped() &&
      count < maxToStart &&
      (lastCountChange + interval > now || timeout > slept || count < minToStart)
      ) {
    // Log some info at every interval time or if there is a change
    if (oldCount != count || lastLogTime + interval < now) {
      lastLogTime = now;
      String msg =
          "Waiting for region servers count to settle; currently" +
          " checked in " + count + ", slept for " + slept + " ms," +
          " expecting minimum of " + minToStart + ", maximum of " + maxToStart +
          ", timeout of " + timeout + " ms, interval of " + interval + " ms.";
      LOG.info(msg);
      status.setStatus(msg);
    }

    // We sleep for some time
    final long sleepTime = 50;
    Thread.sleep(sleepTime);
    now = System.currentTimeMillis();
    slept = now - startTime;

    oldCount = count;
    count = countOfRegionServers();
    if (count != oldCount) {
      lastCountChange = now;
    }
  }

  LOG.info("Finished waiting for region servers count to settle;" +
      " checked in " + count + ", slept for " + slept + " ms," +
      " expecting minimum of " + minToStart + ", maximum of " + maxToStart + "," +
      " master is " + (this.master.isStopped() ? "stopped." : "running.")
  );
}
4) HLog processing (the code shown is the processing-flow portion)
The goal is to find HLogs that no longer have a live owning server and split them so their edits can be recovered by the regions' new region servers.
masterFileSystem calls splitLogAfterStartup:
1. Take the split lock
2. Cross-check all HLogs against the online region servers; any HLog without a live owner goes into splitLog
3. Create an HLogSplitter, wait out HDFS safe mode, and call HLogSplitter.splitLog
4. The HLogSplitter's reader loads HLog entries into the in-memory EntryBuffers; once a log is fully read it is moved away — corrupt logs to the .corrupt directory, processed logs to .oldlogs
5. The HLogSplitter's OutputSink starts multiple writer threads that drain EntryBuffers into files under each region's recovered.edits directory
6. Release the lock
this.splitLogLock.lock();
try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    // If FS is in safe mode, just wait till out of it.
    FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
    splitter.splitLog();
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying splitting because of:", e);
    // An HLogSplitter instance can only be used once. Get new instance.
    splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
        oldLogDir, this.fs);
    splitter.splitLog();
  }
  splitTime = splitter.getTime();
  splitLogSize = splitter.getSize();
} finally {
  this.splitLogLock.unlock();
}
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
  List<Path> processedLogs = new ArrayList<Path>();
  List<Path> corruptedLogs = new ArrayList<Path>();
  List<Path> splits = null;

  boolean skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", true);

  countTotalBytes(logfiles);
  splitSize = 0;

  outputSink.startWriterThreads(entryBuffers);

  try {
    int i = 0;
    for (FileStatus log : logfiles) {
      Path logPath = log.getPath();
      long logLength = log.getLen();
      splitSize += logLength;
      logAndReport("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
          + ": " + logPath + ", length=" + logLength);
      Reader in;
      try {
        in = getReader(fs, log, conf, skipErrors);
        if (in != null) {
          parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
          try {
            in.close();
          } catch (IOException e) {
            LOG.warn("Close log reader threw exception -- continuing", e);
          }
        }
        processedLogs.add(logPath);
      } catch (CorruptedLogFileException e) {
        LOG.info("Got while parsing hlog " + logPath +
            ". Marking as corrupted", e);
        corruptedLogs.add(logPath);
        continue;
      }
    }
    status.setStatus("Log splits complete. Checking for orphaned logs.");

    if (fs.listStatus(srcDir).length > processedLogs.size()
        + corruptedLogs.size()) {
      throw new OrphanHLogAfterSplitException(
          "Discovered orphan hlog after split. Maybe the "
          + "HRegionServer was not dead when we started");
    }
  } finally {
    status.setStatus("Finishing writing output logs and closing down.");
    splits = outputSink.finishWritingAndClose();
  }
  status.setStatus("Archiving logs after completed split");
  archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
  return splits;
}
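Structurally, the split is a single-producer, multi-consumer pipeline: one reader thread parses HLog entries into per-region buffers while the OutputSink's writer threads drain them into recovered.edits files. The toy sketch below shows just that shape; the class and its buffers are deliberate simplifications, not the real EntryBuffers/OutputSink implementation:

  // Toy sketch of the reader/writer pipeline used during log splitting.
  import java.util.concurrent.*;

  class SplitPipelineSketch {
    // region name -> bounded queue of serialized WAL entries (simplification)
    final ConcurrentMap<String, BlockingQueue<byte[]>> buffers =
        new ConcurrentHashMap<>();

    // Called by the single reader thread (parseHLog's role above).
    void append(String region, byte[] entry) throws InterruptedException {
      buffers.computeIfAbsent(region, r -> new LinkedBlockingQueue<>(1024))
          .put(entry); // blocks when full, throttling the reader
    }

    // Called once, like outputSink.startWriterThreads(entryBuffers).
    void startWriters(int n, ExecutorService pool) {
      for (int i = 0; i < n; i++) {
        pool.submit(() -> {
          // The real code picks the fullest buffer; we simply round-robin.
          while (!Thread.currentThread().isInterrupted()) {
            for (BlockingQueue<byte[]> q : buffers.values()) {
              byte[] entry = q.poll();
              if (entry != null) {
                // write to <region>/recovered.edits/<seqid> -- elided
              }
            }
          }
        });
      }
    }
  }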
5) HMaster assigns the ROOT and META tables
For each of ROOT and META, the master checks whether the region is in transition in ZK and whether its current location can be verified; depending on the combination it reassigns the region or simply records it as online:
int assignRootAndMeta(MonitoredTask status)
    throws InterruptedException, IOException, KeeperException {
  int assigned = 0;
  long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);

  // Work on ROOT region. Is it in zk in transition?
  status.setStatus("Assigning ROOT region");
  boolean rit = this.assignmentManager.
      processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
  ServerName currentRootServer = null;
  boolean rootRegionLocation = catalogTracker.verifyRootRegionLocation(timeout);
  if (!rit && !rootRegionLocation) {
    currentRootServer = this.catalogTracker.getRootLocation();
    splitLogAndExpireIfOnline(currentRootServer);
    this.assignmentManager.assignRoot();
    waitForRootAssignment();
    assigned++;
  } else if (rit && !rootRegionLocation) {
    waitForRootAssignment();
    assigned++;
  } else {
    // Region already assigned. We didn't assign it. Add to in-memory state.
    this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
        this.catalogTracker.getRootLocation());
  }
  // Enable the ROOT table if on process fail over the RS containing ROOT
  // was active.
  enableCatalogTables(Bytes.toString(HConstants.ROOT_TABLE_NAME));
  LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
      ", location=" + catalogTracker.getRootLocation());

  // Work on meta region
  status.setStatus("Assigning META region");
  rit = this.assignmentManager.
      processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
  boolean metaRegionLocation = this.catalogTracker.verifyMetaRegionLocation(timeout);
  if (!rit && !metaRegionLocation) {
    ServerName currentMetaServer =
        this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
    if (currentMetaServer != null
        && !currentMetaServer.equals(currentRootServer)) {
      splitLogAndExpireIfOnline(currentMetaServer);
    }
    assignmentManager.assignMeta();
    enableSSHandWaitForMeta();
    assigned++;
  } else if (rit && !metaRegionLocation) {
    enableSSHandWaitForMeta();
    assigned++;
  } else {
    // Region already assigned. We didn't assign it. Add to in-memory state.
    this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
        this.catalogTracker.getMetaLocation());
  }
  enableCatalogTables(Bytes.toString(HConstants.META_TABLE_NAME));
  LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
      ", location=" + catalogTracker.getMetaLocation());
  status.setStatus("META and ROOT assigned.");
  return assigned;
}
6) Preparation: collect offline servers (with their regions) and the regions already assigned in ZK
HMaster calls the AssignmentManager's joinCluster method, which calls rebuildUserRegions
to gather the offline region servers together with their regions, plus the regions already registered in ZK (covering the case where the HMaster itself had crashed):
Map<ServerName, List<Pair<HRegionInfo, Result>>> rebuildUserRegions() throws IOException,
    KeeperException {
  // Region assignment from META
  List<Result> results = MetaReader.fullScan(this.catalogTracker);
  // Get any new but slow to checkin region server that joined the cluster
  Set<ServerName> onlineServers = serverManager.getOnlineServers().keySet();
  // Map of offline servers and their regions to be returned
  Map<ServerName, List<Pair<HRegionInfo, Result>>> offlineServers =
      new TreeMap<ServerName, List<Pair<HRegionInfo, Result>>>();
  // Iterate regions in META
  for (Result result : results) {
    boolean disabled = false;
    boolean disablingOrEnabling = false;
    Pair<HRegionInfo, ServerName> region = MetaReader.parseCatalogResult(result);
    if (region == null) continue;
    HRegionInfo regionInfo = region.getFirst();
    ServerName regionLocation = region.getSecond();
    if (regionInfo == null) continue;
    String tableName = regionInfo.getTableNameAsString();
    if (regionLocation == null) {
      // regionLocation could be null if createTable didn't finish properly.
      // When createTable is in progress, HMaster restarts.
      // Some regions have been added to .META., but have not been assigned.
      // When this happens, the region's table must be in ENABLING state.
      // It can't be in ENABLED state as that is set when all regions are
      // assigned.
      // It can't be in DISABLING state, because DISABLING state transitions
      // from ENABLED state when application calls disableTable.
      // It can't be in DISABLED state, because DISABLED states transitions
      // from DISABLING state.
      if (false == checkIfRegionsBelongsToEnabling(regionInfo)) {
        LOG.warn("Region " + regionInfo.getEncodedName() +
            " has null regionLocation." + " But its table " + tableName +
            " isn't in ENABLING state.");
      }
      addTheTablesInPartialState(this.disablingTables, this.enablingTables, regionInfo,
          tableName);
    } else if (!onlineServers.contains(regionLocation)) {
      // Region is located on a server that isn't online
      List<Pair<HRegionInfo, Result>> offlineRegions =
          offlineServers.get(regionLocation);
      if (offlineRegions == null) {
        offlineRegions = new ArrayList<Pair<HRegionInfo, Result>>(1);
        offlineServers.put(regionLocation, offlineRegions);
      }
      offlineRegions.add(new Pair<HRegionInfo, Result>(regionInfo, result));
      disabled = checkIfRegionBelongsToDisabled(regionInfo);
      disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
          this.enablingTables, regionInfo, tableName);
      // need to enable the table if not disabled or disabling or enabling
      // this will be used in rolling restarts
      enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
          disablingOrEnabling, tableName);
    } else {
      // If region is in offline and split state check the ZKNode
      if (regionInfo.isOffline() && regionInfo.isSplit()) {
        String node = ZKAssign.getNodeName(this.watcher, regionInfo
            .getEncodedName());
        Stat stat = new Stat();
        byte[] data = ZKUtil.getDataNoWatch(this.watcher, node, stat);
        // If znode does not exist dont consider this region
        if (data == null) {
          LOG.debug("Region " + regionInfo.getRegionNameAsString() + " split is completed. "
              + "Hence need not add to regions list");
          continue;
        }
      }
      // Region is being served and on an active server
      // add only if region not in disabled and enabling table
      if (false == checkIfRegionBelongsToDisabled(regionInfo)
          && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
        synchronized (this.regions) {
          regions.put(regionInfo, regionLocation);
          addToServers(regionLocation, regionInfo);
        }
      }
      disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
          this.enablingTables, regionInfo, tableName);
      disabled = checkIfRegionBelongsToDisabled(regionInfo);
      // need to enable the table if not disabled or disabling or enabling
      // this will be used in rolling restarts
      enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
          disablingOrEnabling, tableName);
    }
  }
  return offlineServers;
}
7) Assign regions
The AssignmentManager calls processDeadServersAndRegionsInTransition.
Assignment splits into two cases: if the set of regions needing assignment is non-empty, we are in the master-failover case (A); otherwise we are in the normal case (B).
A: processDeadServersAndRecoverLostRegions();
B: cleanoutUnassigned();
   assignAllUserRegions();
Branch A: processDeadServersAndRecoverLostRegions
iterates over every offline region server and its regions:
1.1 Check whether the region still has a znode in ZK; if not, it has already been handled by a live region server
1.2 If the znode exists, decide whether the region still needs to be assigned:
1.2.1 Regions of a disabled table need no handling
1.2.2 A split parent region requires handling its daughters: if both daughter regions are missing, they are registered in the META table
1.3 For regions that do need assignment, call createOrForceNodeOffline and set a watch on the region
private void processDeadServersAndRecoverLostRegions(
    Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
    List<String> nodes) throws IOException, KeeperException {
  if (null != deadServers) {
    Set<ServerName> actualDeadServers = this.serverManager.getDeadServers();
    for (Map.Entry<ServerName, List<Pair<HRegionInfo, Result>>> deadServer :
        deadServers.entrySet()) {
      // skip regions of dead servers because SSH will process regions during rs expiration.
      // see HBASE-5916
      if (actualDeadServers.contains(deadServer.getKey())) {
        for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
          nodes.remove(deadRegion.getFirst().getEncodedName());
        }
        continue;
      }
      List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
      for (Pair<HRegionInfo, Result> region : regions) {
        HRegionInfo regionInfo = region.getFirst();
        Result result = region.getSecond();
        // If region was in transition (was in zk) force it offline for
        // reassign
        try {
          RegionTransitionData data = ZKAssign.getData(watcher,
              regionInfo.getEncodedName());

          // If zk node of this region has been updated by a live server,
          // we consider that this region is being handled.
          // So we should skip it and process it in
          // processRegionsInTransition.
          if (data != null && data.getOrigin() != null &&
              serverManager.isServerOnline(data.getOrigin())) {
            LOG.info("The region " + regionInfo.getEncodedName()
                + " is being handled on " + data.getOrigin());
            continue;
          }
          // Process with existing RS shutdown code
          boolean assign = ServerShutdownHandler.processDeadRegion(
              regionInfo, result, this, this.catalogTracker);
          if (assign) {
            ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
                master.getServerName());
            if (!nodes.contains(regionInfo.getEncodedName())) {
              nodes.add(regionInfo.getEncodedName());
            }
          }
        } catch (KeeperException.NoNodeException nne) {
          // This is fine
        }
      }
    }
  }
  // ... remainder of the method elided: the collected znodes are then
  // handed to the RIT workflow via processRegionsInTransition (see below).
}
Afterwards, the regions that need assignment enter the RIT (region-in-transition) workflow via processRegionsInTransition.
For the RIT flow itself, here is a good analysis: http://blog.csdn.net/shenxiaoming77/article/details/18360199
void processRegionsInTransition(final RegionTransitionData data,
    final HRegionInfo regionInfo,
    final Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
    int expectedVersion)
    throws KeeperException {
  String encodedRegionName = regionInfo.getEncodedName();
  LOG.info("Processing region " + regionInfo.getRegionNameAsString() +
      " in state " + data.getEventType());
  synchronized (regionsInTransition) {
    RegionState regionState = regionsInTransition.get(encodedRegionName);
    if (regionState != null ||
        failoverProcessedRegions.containsKey(encodedRegionName)) {
      // Just return
      return;
    }
    switch (data.getEventType()) {
      case M_ZK_REGION_CLOSING:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
          // If was on dead server, its closed now. Force to OFFLINE and this
          // will get it reassigned if appropriate
          forceOffline(regionInfo, data);
        } else {
          // Just insert region into RIT.
          // If this never updates the timeout will trigger new assignment
          regionsInTransition.put(encodedRegionName, new RegionState(
              regionInfo, RegionState.State.CLOSING,
              data.getStamp(), data.getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_CLOSED:
      case RS_ZK_REGION_FAILED_OPEN:
        // Region is closed, insert into RIT and handle it
        addToRITandCallClose(regionInfo, RegionState.State.CLOSED, data);
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case M_ZK_REGION_OFFLINE:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null ||
            !serverManager.isServerOnline(data.getOrigin()))) {
          // Region is offline, insert into RIT and handle it like a closed
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else if (data.getOrigin() != null &&
            !serverManager.isServerOnline(data.getOrigin())) {
          // to handle cases where offline node is created but sendRegionOpen
          // RPC is not yet sent
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else {
          regionsInTransition.put(encodedRegionName, new RegionState(
              regionInfo, RegionState.State.PENDING_OPEN, data.getStamp(), data
                  .getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENING:
        // TODO: Could check if it was on deadServers. If it was, then we could
        // do what happens in TimeoutMonitor when it sees this condition.
        // Just insert region into RIT
        // If this never updates the timeout will trigger new assignment
        if (regionInfo.isMetaTable()) {
          regionsInTransition.put(encodedRegionName, new RegionState(
              regionInfo, RegionState.State.OPENING, data.getStamp(), data
                  .getOrigin()));
          // If ROOT or .META. table is waiting for timeout monitor to assign
          // it may take lot of time when the assignment.timeout.period is
          // the default value which may be very long. We will not be able
          // to serve any request during this time.
          // So we will assign the ROOT and .META. region immediately.
          processOpeningState(regionInfo);
          break;
        }
        regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
            RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENED:
        // Region is opened, insert into RIT and handle it
        regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.OPEN,
            data.getStamp(), data.getOrigin()));
        ServerName sn = data.getOrigin() == null ? null : data.getOrigin();
        // sn could be null if this server is no longer online. If
        // that is the case, just let this RIT timeout; it'll be assigned
        // to new server then.
        if (sn == null) {
          LOG.warn("Region in transition " + regionInfo.getEncodedName() +
              " references a null server; letting RIT timeout so will be " +
              "assigned elsewhere");
        } else if (!serverManager.isServerOnline(sn)
            && (isOnDeadServer(regionInfo, deadServers)
            || regionInfo.isMetaRegion() || regionInfo.isRootRegion())) {
          forceOffline(regionInfo, data);
        } else {
          new OpenedRegionHandler(master, this, regionInfo, sn, expectedVersion)
              .process();
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;
    }
  }
}
Branch B:
1. Call cleanoutUnassigned to delete all the znodes under the unassigned node and re-set the watches.
2. Call assignAllUserRegions, which
   a. fetches all regions, and
   b. assigns them: when hbase.master.startup.retainassign is true, regions go back to the locations recorded in the META table; otherwise each region goes to a randomly chosen online region server. A minimal sketch of this decision follows.
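Here is that retain-vs-random choice in sketch form, with invented helper names purely to show the branching (the real logic lives in AssignmentManager and the load balancer):

  // Illustrative only -- names invented; not the real AssignmentManager code.
  void assignAllUserRegionsSketch(Configuration conf,
      Map<HRegionInfo, ServerName> regionsFromMeta,
      List<ServerName> onlineServers) {
    boolean retain = conf.getBoolean("hbase.master.startup.retainassign", true);
    Random rnd = new Random();
    for (Map.Entry<HRegionInfo, ServerName> e : regionsFromMeta.entrySet()) {
      ServerName last = e.getValue();
      ServerName target;
      if (retain && last != null && onlineServers.contains(last)) {
        target = last; // put the region back where .META. last saw it
      } else {
        // last owner gone (or retainment off): pick a random online RS
        target = onlineServers.get(rnd.nextInt(onlineServers.size()));
      }
      assign(e.getKey(), target); // hypothetical single-region assign call
    }
  }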
8) Check daughter regions
fixupDaughters scans .META. for offline split parents and fixes up any missing daughter regions:
void fixupDaughters(final MonitoredTask status) throws IOException {
  final Map<HRegionInfo, Result> offlineSplitParents =
      new HashMap<HRegionInfo, Result>();
  // This visitor collects offline split parents in the .META. table
  MetaReader.Visitor visitor = new MetaReader.Visitor() {
    @Override
    public boolean visit(Result r) throws IOException {
      if (r == null || r.isEmpty()) return true;
      HRegionInfo info =
          MetaReader.parseHRegionInfoFromCatalogResult(
              r, HConstants.REGIONINFO_QUALIFIER);
      if (info == null) return true; // Keep scanning
      if (info.isOffline() && info.isSplit()) {
        offlineSplitParents.put(info, r);
      }
      // Returning true means "keep scanning"
      return true;
    }
  };
  // Run full scan of .META. catalog table passing in our custom visitor
  MetaReader.fullScan(this.catalogTracker, visitor);
  // Now work on our list of found parents. See if any we can clean up.
  int fixups = 0;
  for (Map.Entry<HRegionInfo, Result> e : offlineSplitParents.entrySet()) {
    fixups += ServerShutdownHandler.fixupDaughters(
        e.getValue(), assignmentManager, catalogTracker);
  }
  if (fixups != 0) {
    LOG.info("Scanned the catalog and fixed up " + fixups +
        " missing daughter region(s)");
  }
}
9) Start the balancer thread
HMaster's balance policy is that each region server should hold at least floor(regions/servers) regions and at most floor(regions/servers)+1.
It first finds the region servers above the max and collects the regions to move, plus the region servers below the min;
it then hands the collected regions to the under-min servers until they reach the max or the regions run out. (A sketch of the min/max computation follows the code below.)
@Override
public boolean balance() {
  // if master not initialized, don't run balancer.
  if (!this.initialized) {
    LOG.debug("Master has not been initialized, don't run balancer.");
    return false;
  }
  // If balance not true, don't run balancer.
  if (!this.balanceSwitch) return false;
  // Do this call outside of synchronized block.
  int maximumBalanceTime = getBalancerCutoffTime();
  long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
  boolean balancerRan;
  synchronized (this.balancer) {
    // Only allow one balance run at at time.
    if (this.assignmentManager.isRegionsInTransition()) {
      LOG.debug("Not running balancer because " +
          this.assignmentManager.getRegionsInTransition().size() +
          " region(s) in transition: " +
          org.apache.commons.lang.StringUtils.
              abbreviate(this.assignmentManager.getRegionsInTransition().toString(), 256));
      return false;
    }
    if (this.serverManager.areDeadServersInProgress()) {
      LOG.debug("Not running balancer because processing dead regionserver(s): " +
          this.serverManager.getDeadServers());
      return false;
    }
    if (this.cpHost != null) {
      try {
        if (this.cpHost.preBalance()) {
          LOG.debug("Coprocessor bypassing balancer request");
          return false;
        }
      } catch (IOException ioe) {
        LOG.error("Error invoking master coprocessor preBalance()", ioe);
        return false;
      }
    }
    Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
        this.assignmentManager.getAssignmentsByTable();

    List<RegionPlan> plans = new ArrayList<RegionPlan>();
    for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
      List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
      if (partialPlans != null) plans.addAll(partialPlans);
    }
    int rpCount = 0;  // number of RegionPlans balanced so far
    long totalRegPlanExecTime = 0;
    balancerRan = plans != null;
    if (plans != null && !plans.isEmpty()) {
      for (RegionPlan plan : plans) {
        LOG.info("balance " + plan);
        long balStartTime = System.currentTimeMillis();
        this.assignmentManager.balance(plan);
        totalRegPlanExecTime += System.currentTimeMillis() - balStartTime;
        rpCount++;
        if (rpCount < plans.size() &&
            // if performing next balance exceeds cutoff time, exit the loop
            (System.currentTimeMillis() + (totalRegPlanExecTime / rpCount)) > cutoffTime) {
          LOG.debug("No more balancing till next balance run; maximumBalanceTime=" +
              maximumBalanceTime);
          break;
        }
      }
    }
    if (this.cpHost != null) {
      try {
        this.cpHost.postBalance();
      } catch (IOException ioe) {
        // balancing already succeeded so don't change the result
        LOG.error("Error invoking master coprocessor postBalance()", ioe);
      }
    }
  }
  return balancerRan;
}
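As promised above, the per-server bounds come down to a floor/ceiling computation. A minimal sketch of what the default balancer computes before moving regions from over-max servers to under-min ones (paraphrased, not verbatim DefaultLoadBalancer source):

  // With R regions and S servers, every server should end up holding
  // either floor(R/S) or floor(R/S)+1 regions.
  int min = numRegions / numServers;                        // floor(R/S)
  int max = (numRegions % numServers) == 0 ? min : min + 1; // ceil(R/S)
  // Servers above max get trimmed; the freed regions are assigned to
  // servers below min until everyone sits within [min, max].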
Finally done — thanks to everyone who read this far. Since I rewrote it from memory at home it's a bit rough; apologies.