
HMaster Startup Source Code Analysis


 

   Before diving in, a quick rant about my company's environment: the work machines have no internet access and USB drives are blocked, so any notes I take basically have to be rewritten at home in the evening. Anyway, enough complaining.

 

   First, a UML diagram of the RPC services HBase brings up in its constructor: http://blackproof.iteye.com/blog/2029170

   

  HMaster startup calls the run() method. In outline:

  HBase startup

The HMaster first handles the backup-master case, then enters its startup sequence:
  1. ZK tracker classes
  2. Create the thread pools and start the Jetty service on port 60010
  3. RegionServer startup
  4. HLog processing
  5. Assign the ROOT and META regions
  6. Preparation: collect regions of offline servers and regions assigned in ZK
  7. Assign regions
  8. Check daughter regions
  9. Start the balancer thread
 

 

 

Detailed analysis:

         becomeActiveMaster is called first. It checks the "hbase.master.backup" setting to see whether this node is a backup master; if so, the master simply blocks until it detects that the cluster's active master has died (by default the check runs roughly every 3 minutes).

  private boolean becomeActiveMaster(MonitoredTask startupStatus)
  throws InterruptedException {
    // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
    // if we come back to life.
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
        this);
    this.zooKeeper.registerListener(activeMasterManager);
    stallIfBackupMaster(this.conf, this.activeMasterManager);

    // The ClusterStatusTracker is setup before the other
    // ZKBasedSystemTrackers because it's needed by the activeMasterManager
    // to check if the cluster should be shutdown.
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
  }
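The backup-master stall happens inside stallIfBackupMaster. Below is a minimal sketch of that behavior as described above; it is not the HBase source, and activeMasterPresent is a hypothetical stand-in for ActiveMasterManager's check of the active-master znode:

import java.util.function.BooleanSupplier;

import org.apache.hadoop.conf.Configuration;

// Hedged sketch of the backup-master stall, not the actual HBase source.
// activeMasterPresent is a hypothetical stand-in for the ZK check that
// ActiveMasterManager performs against the active-master znode.
public class BackupMasterStallSketch {
  static void stallIfBackupMaster(Configuration conf, BooleanSupplier activeMasterPresent)
      throws InterruptedException {
    if (!conf.getBoolean("hbase.master.backup", false)) {
      return;  // not a backup master: carry on with normal startup
    }
    // Backup master: block while an active master exists, re-checking on a
    // zookeeper.session.timeout cadence (180 s by default, hence "~3 minutes").
    long interval = conf.getLong("zookeeper.session.timeout", 180 * 1000);
    while (activeMasterPresent.getAsBoolean()) {
      Thread.sleep(interval);
    }
  }
}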

 

1) ZK tracker classes

All of the ZK-based trackers are initialized first, by calling initializeZKBasedSystemTrackers.

The ZK listeners it sets up:

1. createCatalogTracker (rootRegionTracker, metaRegionTracker): trackers for the ROOT and META regions

2. AssignmentManager: manages region assignment

3. RegionServerTracker: tracks region servers (via ServerManager) and handles servers that have died

4. DrainingServerTracker: maintains the draining region server list

void initializeZKBasedSystemTrackers() throws IOException,
      InterruptedException, KeeperException {
    this.catalogTracker = createCatalogTracker(this.zooKeeper, this.conf, this);
    this.catalogTracker.start();

    this.balancer = LoadBalancerFactory.getLoadBalancer(conf);
    this.loadBalancerTracker = new LoadBalancerTracker(zooKeeper, this);
    this.loadBalancerTracker.start();
    this.assignmentManager = new AssignmentManager(this, serverManager,
      this.catalogTracker, this.balancer, this.executorService, this.metricsMaster,
      this.tableLockManager);
    zooKeeper.registerListenerFirst(assignmentManager);

    this.regionServerTracker = new RegionServerTracker(zooKeeper, this,
        this.serverManager);
    this.regionServerTracker.start();

    this.drainingServerTracker = new DrainingServerTracker(zooKeeper, this,
      this.serverManager);
    this.drainingServerTracker.start();

    // Set the cluster as up.  If new RSs, they'll be waiting on this before
    // going ahead with their startup.
    boolean wasUp = this.clusterStatusTracker.isClusterUp();
    if (!wasUp) this.clusterStatusTracker.setClusterUp();

    LOG.info("Server active/primary master=" + this.serverName +
        ", sessionid=0x" +
        Long.toHexString(this.zooKeeper.getRecoverableZooKeeper().getSessionId()) +
        ", setting cluster-up flag (Was=" + wasUp + ")");

    // create the snapshot manager
    this.snapshotManager = new SnapshotManager(this, this.metricsMaster);
  }

 

 2) Create the thread pools and start the Jetty service on port 60010

   1. Start the executor service pools with the following executor types:

        MASTER_META_SERVER_OPERATIONS

        MASTER_SERVER_OPERATIONS

        MASTER_CLOSE_REGION

        MASTER_OPEN_REGION

        MASTER_TABLE_OPERATIONS

     2. The LogCleaner thread, which cleans up the .oldlogs directory

      3. The Jetty info server on port 60010

 

 void startServiceThreads() throws IOException{
   // Start the executor service pools
   this.executorService.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
      conf.getInt("hbase.master.executor.openregion.threads", 5));
   this.executorService.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
      conf.getInt("hbase.master.executor.closeregion.threads", 5));
   this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
      conf.getInt("hbase.master.executor.serverops.threads", 5));
   this.executorService.startExecutorService(ExecutorType.MASTER_META_SERVER_OPERATIONS,
      conf.getInt("hbase.master.executor.serverops.threads", 5));
   this.executorService.startExecutorService(ExecutorType.M_LOG_REPLAY_OPS,
      conf.getInt("hbase.master.executor.logreplayops.threads", 10));

   // We depend on there being only one instance of this executor running
   // at a time.  To do concurrency, would need fencing of enable/disable of
   // tables.
   this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);

   // Start log cleaner thread
   String n = Thread.currentThread().getName();
   int cleanerInterval = conf.getInt("hbase.master.cleaner.interval", 60 * 1000);
   this.logCleaner =
      new LogCleaner(cleanerInterval,
         this, conf, getMasterFileSystem().getFileSystem(),
         getMasterFileSystem().getOldLogDir());
         Threads.setDaemonThreadRunning(logCleaner.getThread(), n + ".oldLogCleaner");

   //start the hfile archive cleaner thread
    Path archiveDir = HFileArchiveUtil.getArchivePath(conf);
    this.hfileCleaner = new HFileCleaner(cleanerInterval, this, conf, getMasterFileSystem()
        .getFileSystem(), archiveDir);
    Threads.setDaemonThreadRunning(hfileCleaner.getThread(), n + ".archivedHFileCleaner");

    // Start the health checker
    if (this.healthCheckChore != null) {
      Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), n + ".healthChecker");
    }

    // Start allowing requests to happen.
    this.rpcServer.openServer();
    this.rpcServerOpen = true;
    if (LOG.isTraceEnabled()) {
      LOG.trace("Started service threads");
    }
  }
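Note that startServiceThreads only opens the RPC server at the end; the 60010 Jetty UI itself is started elsewhere during HMaster startup. Below is a minimal sketch of that step, assuming the 0.94-era InfoServer API (the constructor arguments and property names here are assumptions, not verbatim HMaster code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.util.InfoServer;

// Hedged sketch: bring up the master web UI (not verbatim HMaster code; the
// InfoServer constructor arguments are assumed from the 0.94-era API).
public class MasterInfoServerSketch {
  static InfoServer startInfoServer(Configuration conf) throws IOException {
    int port = conf.getInt("hbase.master.info.port", 60010);
    if (port < 0) {
      return null;                          // UI disabled by configuration
    }
    String bindAddress = conf.get("hbase.master.info.bindAddress", "0.0.0.0");
    // InfoServer is HBase's thin wrapper around an embedded Jetty server
    InfoServer infoServer = new InfoServer("master", bindAddress, port, false, conf);
    infoServer.start();
    return infoServer;
  }
}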
 

 

 3) RegionServer startup

   ServerManager's waitForRegionServers method is called.

   The master waits for region servers to check in, and continues startup only once all of the following hold:

       a. At least 4.5 s have elapsed ("hbase.master.wait.on.regionservers.timeout")

       b. At least 1 region server has checked in ("hbase.master.wait.on.regionservers.mintostart")

       c. No region server has checked in or died within the last 1.5 s ("hbase.master.wait.on.regionservers.interval")

 

public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
    final long interval = this.master.getConfiguration().
      getLong(WAIT_ON_REGIONSERVERS_INTERVAL, 1500);
    final long timeout = this.master.getConfiguration().
      getLong(WAIT_ON_REGIONSERVERS_TIMEOUT, 4500);
    int minToStart = this.master.getConfiguration().
      getInt(WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
    if (minToStart < 1) {
      LOG.warn(String.format(
        "The value of '%s' (%d) can not be less than 1, ignoring.",
        WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
      minToStart = 1;
    }
    int maxToStart = this.master.getConfiguration().
      getInt(WAIT_ON_REGIONSERVERS_MAXTOSTART, Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
        LOG.warn(String.format(
            "The value of '%s' (%d) is set less than '%s' (%d), ignoring.",
            WAIT_ON_REGIONSERVERS_MAXTOSTART, maxToStart,
            WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
        maxToStart = Integer.MAX_VALUE;
    }

    long now =  System.currentTimeMillis();
    final long startTime = now;
    long slept = 0;
    long lastLogTime = 0;
    long lastCountChange = startTime;
    int count = countOfRegionServers();
    int oldCount = 0;
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){

      // Log some info at every interval time or if there is a change
      if (oldCount != count || lastLogTime+interval < now){
        lastLogTime = now;
        String msg =
          "Waiting for region servers count to settle; currently"+
            " checked in " + count + ", slept for " + slept + " ms," +
            " expecting minimum of " + minToStart + ", maximum of "+ maxToStart+
            ", timeout of "+timeout+" ms, interval of "+interval+" ms.";
        LOG.info(msg);
        status.setStatus(msg);
      }

      // We sleep for some time
      final long sleepTime = 50;
      Thread.sleep(sleepTime);
      now =  System.currentTimeMillis();
      slept = now - startTime;

      oldCount = count;
      count = countOfRegionServers();
      if (count != oldCount) {
        lastCountChange = now;
      }
    }

    LOG.info("Finished waiting for region servers count to settle;" +
      " checked in " + count + ", slept for " + slept + " ms," +
      " expecting minimum of " + minToStart + ", maximum of "+ maxToStart+","+
      " master is "+ (this.master.isStopped() ? "stopped.": "running.")
    );
  }
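To make conditions a/b/c concrete, here is the loop's exit logic restated as a standalone predicate, plus a tiny example with the default values (my paraphrase of the code above, not HBase source):

// Self-contained restatement of the wait loop's exit condition (not HBase
// code; just the predicate from waitForRegionServers above, for clarity).
public class WaitForRegionServersCondition {
  static boolean keepWaiting(boolean masterStopped, int count, int minToStart,
      int maxToStart, long sleptMs, long timeoutMs,
      long msSinceLastCountChange, long intervalMs) {
    return !masterStopped
        && count < maxToStart
        && (msSinceLastCountChange < intervalMs  // servers still checking in/dying
            || sleptMs < timeoutMs               // minimum wait (4.5 s) not reached
            || count < minToStart);              // not enough servers yet
  }

  public static void main(String[] args) {
    // With the defaults (timeout=4500, interval=1500, minToStart=1):
    // one server in, 5 s slept, count stable for 2 s -> stop waiting.
    System.out.println(keepWaiting(false, 1, 1, Integer.MAX_VALUE,
        5000, 4500, 2000, 1500));                // prints false
  }
}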
 
4) HLog processing (the code shown covers only part of the flow)

The goal is to find HLogs whose owning region server is no longer alive and split them, so that their regions' edits can be recovered and the regions taken over by other region servers.

MasterFileSystem calls splitLogAfterStartup:

1. Take the split lock.
2. Match every HLog directory against the online region servers; directories with no live owner go into splitLog (see the sketch after this list).
3. Create an HLogSplitter, wait for HDFS to leave safe mode, and call HLogSplitter.splitLog.
4. The HLogSplitter reader pulls HLog entries into the in-memory EntryBuffers; once a log is fully read it is moved aside: corrupted logs go to the .corrupt directory, processed ones to .oldlogs.
5. The HLogSplitter OutputSink spawns several writer threads that drain entryBuffers and write the edits into files under each region's recovered.edits directory.
6. Release the lock.
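Step 2's collision check is not included in the snippets below, so here is a minimal sketch of the idea. This is not verbatim MasterFileSystem code; parseServerNameFromLogDir is a hypothetical helper standing in for the real parsing of the ".logs/<host>,<port>,<startcode>" directory names:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch of step 2: find HLog directories whose owning server is not
// online.  Not verbatim HBase source.
public class FindDeadServerLogsSketch {
  static List<Path> findLogDirsToSplit(FileSystem fs, Path logsDir,
      Set<String> onlineServerNames) throws IOException {
    List<Path> toSplit = new ArrayList<Path>();
    for (FileStatus folder : fs.listStatus(logsDir)) {
      String owner = parseServerNameFromLogDir(folder.getPath().getName());
      if (!onlineServerNames.contains(owner)) {
        toSplit.add(folder.getPath());  // no live owner -> this HLog must be split
      }
    }
    return toSplit;
  }

  // Hypothetical helper: strip the "-splitting" suffix, if present, from the
  // "<host>,<port>,<startcode>" directory name.
  static String parseServerNameFromLogDir(String dirName) {
    String suffix = "-splitting";
    return dirName.endsWith(suffix)
        ? dirName.substring(0, dirName.length() - suffix.length()) : dirName;
  }
}

The lock/split driver (steps 1 and 3 above) then looks like this: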

 

this.splitLogLock.lock();
try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
    conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    // If FS is in safe mode, just wait till out of it.
    FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
    splitter.splitLog();
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying splitting because of:", e);
    // An HLogSplitter instance can only be used once.  Get new instance.
    splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
      oldLogDir, this.fs);
    splitter.splitLog();
  }
  splitTime = splitter.getTime();
  splitLogSize = splitter.getSize();
} finally {
  this.splitLogLock.unlock();
}
   
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
    List<Path> processedLogs = new ArrayList<Path>();
    List<Path> corruptedLogs = new ArrayList<Path>();
    List<Path> splits = null;

    boolean skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", true);

    countTotalBytes(logfiles);
    splitSize = 0;

    outputSink.startWriterThreads(entryBuffers);

    try {
      int i = 0;
      for (FileStatus log : logfiles) {
       Path logPath = log.getPath();
        long logLength = log.getLen();
        splitSize += logLength;
        logAndReport("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
            + ": " + logPath + ", length=" + logLength);
        Reader in;
        try {
          in = getReader(fs, log, conf, skipErrors);
          if (in != null) {
            parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
            try {
              in.close();
            } catch (IOException e) {
              LOG.warn("Close log reader threw exception -- continuing",
                  e);
            }
          }
          processedLogs.add(logPath);
        } catch (CorruptedLogFileException e) {
          LOG.info("Got while parsing hlog " + logPath +
              ". Marking as corrupted", e);
          corruptedLogs.add(logPath);
          continue;
        }
      }
      status.setStatus("Log splits complete. Checking for orphaned logs.");
      
      if (fs.listStatus(srcDir).length > processedLogs.size()
          + corruptedLogs.size()) {
        throw new OrphanHLogAfterSplitException(
            "Discovered orphan hlog after split. Maybe the "
            + "HRegionServer was not dead when we started");
      }
    } finally {
      status.setStatus("Finishing writing output logs and closing down.");
      splits = outputSink.finishWritingAndClose();
    }
    status.setStatus("Archiving logs after completed split");
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
    return splits;
  }
 

 

 5) HMaster assigns the ROOT and META regions

assignRootAndMeta checks whether -ROOT- and .META. are in transition (RIT) or already being served at a verified location; if neither, it splits the logs of the region's previous server (expiring that server if needed) and reassigns the region, otherwise it simply records the existing location in memory.

int assignRootAndMeta(MonitoredTask status)
  throws InterruptedException, IOException, KeeperException {
    int assigned = 0;
    long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);

    // Work on ROOT region.  Is it in zk in transition?
    status.setStatus("Assigning ROOT region");
    boolean rit = this.assignmentManager.
      processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
    ServerName currentRootServer = null;
    boolean rootRegionLocation = catalogTracker.verifyRootRegionLocation(timeout);
    if (!rit && !rootRegionLocation) {
      currentRootServer = this.catalogTracker.getRootLocation();
      splitLogAndExpireIfOnline(currentRootServer);
      this.assignmentManager.assignRoot();
      waitForRootAssignment();
      assigned++;
    } else if (rit && !rootRegionLocation) {
      waitForRootAssignment();
      assigned++;
    } else {
      // Region already assigned. We didn't assign it. Add to in-memory state.
      this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
          this.catalogTracker.getRootLocation());
    }
    // Enable the ROOT table if on process fail over the RS containing ROOT
    // was active.
    enableCatalogTables(Bytes.toString(HConstants.ROOT_TABLE_NAME));
    LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
      ", location=" + catalogTracker.getRootLocation());

    // Work on meta region
    status.setStatus("Assigning META region");
    rit = this.assignmentManager.
      processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
    boolean metaRegionLocation = this.catalogTracker.verifyMetaRegionLocation(timeout);
    if (!rit && !metaRegionLocation) {
      ServerName currentMetaServer =
        this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
      if (currentMetaServer != null
          && !currentMetaServer.equals(currentRootServer)) {
        splitLogAndExpireIfOnline(currentMetaServer);
      }
      assignmentManager.assignMeta();
      enableSSHandWaitForMeta();
      assigned++;
    } else if (rit && !metaRegionLocation) {
      enableSSHandWaitForMeta();
      assigned++;
    } else {
      // Region already assigned.  We didnt' assign it.  Add to in-memory state.
      this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
        this.catalogTracker.getMetaLocation());
    }
    enableCatalogTables(Bytes.toString(HConstants.META_TABLE_NAME));
    LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
      ", location=" + catalogTracker.getMetaLocation());
    status.setStatus("META and ROOT assigned.");
    return assigned;
  }
 

 

6) Preparation: collect offline servers (with their regions) and regions assigned in ZK

HMaster calls AssignmentManager's joinCluster method, which in turn calls rebuildUserRegions.

This collects the region servers that are not online, together with their regions, as well as the regions already registered in ZK (the case where only the HMaster went down).

 

Map<ServerName, List<Pair<HRegionInfo, Result>>> rebuildUserRegions() throws IOException,
      KeeperException {
    // Region assignment from META
    List<Result> results = MetaReader.fullScan(this.catalogTracker);
    // Get any new but slow to checkin region server that joined the cluster
    Set<ServerName> onlineServers = serverManager.getOnlineServers().keySet();    
    // Map of offline servers and their regions to be returned
    Map<ServerName, List<Pair<HRegionInfo,Result>>> offlineServers =
      new TreeMap<ServerName, List<Pair<HRegionInfo, Result>>>();
    // Iterate regions in META
    for (Result result : results) {
      boolean disabled = false;
      boolean disablingOrEnabling = false;
      Pair<HRegionInfo, ServerName> region = MetaReader.parseCatalogResult(result);
      if (region == null) continue;
      HRegionInfo regionInfo = region.getFirst();
      ServerName regionLocation = region.getSecond();
      if (regionInfo == null) continue;
      String tableName = regionInfo.getTableNameAsString();
      if (regionLocation == null) {
        // regionLocation could be null if createTable didn't finish properly.
        // When createTable is in progress, HMaster restarts.
        // Some regions have been added to .META., but have not been assigned.
        // When this happens, the region's table must be in ENABLING state.
        // It can't be in ENABLED state as that is set when all regions are
        // assigned.
        // It can't be in DISABLING state, because DISABLING state transitions
        // from ENABLED state when application calls disableTable.
        // It can't be in DISABLED state, because DISABLED states transitions
        // from DISABLING state.
        if (false == checkIfRegionsBelongsToEnabling(regionInfo)) {
          LOG.warn("Region " + regionInfo.getEncodedName() +
            " has null regionLocation." + " But its table " + tableName +
            " isn't in ENABLING state.");
        }
        addTheTablesInPartialState(this.disablingTables, this.enablingTables, regionInfo,
            tableName);
      } else if (!onlineServers.contains(regionLocation)) {
        // Region is located on a server that isn't online
        List<Pair<HRegionInfo, Result>> offlineRegions =
          offlineServers.get(regionLocation);
        if (offlineRegions == null) {
          offlineRegions = new ArrayList<Pair<HRegionInfo,Result>>(1);
          offlineServers.put(regionLocation, offlineRegions);
        }
        offlineRegions.add(new Pair<HRegionInfo,Result>(regionInfo, result));
        disabled = checkIfRegionBelongsToDisabled(regionInfo);
        disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
            this.enablingTables, regionInfo, tableName);
        // need to enable the table if not disabled or disabling or enabling
        // this will be used in rolling restarts
        enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
            disablingOrEnabling, tableName);
      } else {
        // If region is in offline and split state check the ZKNode
        if (regionInfo.isOffline() && regionInfo.isSplit()) {
          String node = ZKAssign.getNodeName(this.watcher, regionInfo
              .getEncodedName());
          Stat stat = new Stat();
          byte[] data = ZKUtil.getDataNoWatch(this.watcher, node, stat);
          // If znode does not exist dont consider this region
          if (data == null) {
            LOG.debug("Region "+ regionInfo.getRegionNameAsString() + " split is completed. " 
                + "Hence need not add to regions list");
            continue;
          }
        }
        // Region is being served and on an active server
        // add only if region not in disabled and enabling table
        if (false == checkIfRegionBelongsToDisabled(regionInfo)
            && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
          synchronized (this.regions) {
            regions.put(regionInfo, regionLocation);
            addToServers(regionLocation, regionInfo);
          }
        }
        disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
            this.enablingTables, regionInfo, tableName);
        disabled = checkIfRegionBelongsToDisabled(regionInfo);
        // need to enable the table if not disabled or disabling or enabling
        // this will be used in rolling restarts
        enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
            disablingOrEnabling, tableName);
      }
    }
    return offlineServers;
  }
 

 

7) Assign regions

AssignmentManager calls processDeadServersAndRegionsInTransition.

Assignment splits into two cases: if there are still regions to deal with (dead servers or regions in transition), we are in the master-failover case A; otherwise the normal startup case B applies.

   A: processDeadServersAndRecoverLostRegions();

   B: cleanoutUnassigned();

      assignAllUserRegions();

 

 Branch A: processDeadServersAndRecoverLostRegions

Iterate over every offline region server and its regions:

1.1 Check whether the region still has a znode in ZK; if not, the region has already been handled by another online region server.

1.2 If it does, decide whether the region still needs to be assigned:

  1.2.1 If the region belongs to a disabled table, nothing needs to be done.

  1.2.2 If the region is a split parent, its daughter regions must be handled; if its two daughters are missing, they are registered back into the META table.

1.3 If the region does need assignment, createOrForceNodeOffline is called to force its znode to OFFLINE and set a watch on it.

private void processDeadServersAndRecoverLostRegions(
      Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
      List<String> nodes) throws IOException, KeeperException {
    if (null != deadServers) {
      Set<ServerName> actualDeadServers = this.serverManager.getDeadServers();
      for (Map.Entry<ServerName, List<Pair<HRegionInfo, Result>>> deadServer : 
        deadServers.entrySet()) {
        // skip regions of dead servers because SSH will process regions during rs expiration.
        // see HBASE-5916
        if (actualDeadServers.contains(deadServer.getKey())) {
          for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
            nodes.remove(deadRegion.getFirst().getEncodedName());
          }
          continue;
        }
        List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
        for (Pair<HRegionInfo, Result> region : regions) {
          HRegionInfo regionInfo = region.getFirst();
          Result result = region.getSecond();
          // If region was in transition (was in zk) force it offline for
          // reassign
          try {
            RegionTransitionData data = ZKAssign.getData(watcher,
                regionInfo.getEncodedName());

            // If zk node of this region has been updated by a live server,
            // we consider that this region is being handled.
            // So we should skip it and process it in
            // processRegionsInTransition.
            if (data != null && data.getOrigin() != null && 
                serverManager.isServerOnline(data.getOrigin())) {
              LOG.info("The region " + regionInfo.getEncodedName()
                  + "is being handled on " + data.getOrigin());
              continue;
            }
            // Process with existing RS shutdown code
            boolean assign = ServerShutdownHandler.processDeadRegion(
                regionInfo, result, this, this.catalogTracker);
            if (assign) {
              ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
                  master.getServerName());
              if (!nodes.contains(regionInfo.getEncodedName())) {
                nodes.add(regionInfo.getEncodedName());
              }
            }
          } catch (KeeperException.NoNodeException nne) {
            // This is fine
          }
        }
      }
    }

 Afterwards, each region that still needs assignment enters the RIT (region-in-transition) workflow via processRegionsInTransition.

For a deeper walkthrough of the RIT workflow, see this analysis: http://blog.csdn.net/shenxiaoming77/article/details/18360199

   

void processRegionsInTransition(final RegionTransitionData data,
      final HRegionInfo regionInfo,
      final Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
      int expectedVersion)
  throws KeeperException {
    String encodedRegionName = regionInfo.getEncodedName();
    LOG.info("Processing region " + regionInfo.getRegionNameAsString() +
      " in state " + data.getEventType());
    synchronized (regionsInTransition) {
      RegionState regionState = regionsInTransition.get(encodedRegionName);
      if (regionState != null ||
          failoverProcessedRegions.containsKey(encodedRegionName)) {
        // Just return
        return;
      }
      switch (data.getEventType()) {
      case M_ZK_REGION_CLOSING:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
          // If was on dead server, its closed now. Force to OFFLINE and this
          // will get it reassigned if appropriate
          forceOffline(regionInfo, data);
        } else {
          // Just insert region into RIT.
          // If this never updates the timeout will trigger new assignment
          regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.CLOSING,
            data.getStamp(), data.getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_CLOSED:
      case RS_ZK_REGION_FAILED_OPEN:
        // Region is closed, insert into RIT and handle it
        addToRITandCallClose(regionInfo, RegionState.State.CLOSED, data);
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case M_ZK_REGION_OFFLINE:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null ||
              !serverManager.isServerOnline(data.getOrigin()))) {
          // Region is offline, insert into RIT and handle it like a closed
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else if (data.getOrigin() != null &&
            !serverManager.isServerOnline(data.getOrigin())) {
          // to handle cases where offline node is created but sendRegionOpen
          // RPC is not yet sent
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else {
          regionsInTransition.put(encodedRegionName, new RegionState(
              regionInfo, RegionState.State.PENDING_OPEN, data.getStamp(), data
                  .getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENING:
        // TODO: Could check if it was on deadServers.  If it was, then we could
        // do what happens in TimeoutMonitor when it sees this condition.

        // Just insert region into RIT
        // If this never updates the timeout will trigger new assignment
        if (regionInfo.isMetaTable()) {
          regionsInTransition.put(encodedRegionName, new RegionState(
              regionInfo, RegionState.State.OPENING, data.getStamp(), data
                  .getOrigin()));
          // If ROOT or .META. table is waiting for timeout monitor to assign
          // it may take lot of time when the assignment.timeout.period is
          // the default value which may be very long.  We will not be able
          // to serve any request during this time.
          // So we will assign the ROOT and .META. region immediately.
          processOpeningState(regionInfo);
          break;
        }
        regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
            RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENED:
        // Region is opened, insert into RIT and handle it
        regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.OPEN,
            data.getStamp(), data.getOrigin()));
        ServerName sn = data.getOrigin() == null? null: data.getOrigin();
        // sn could be null if this server is no longer online.  If
        // that is the case, just let this RIT timeout; it'll be assigned
        // to new server then.
        if (sn == null) {
          LOG.warn("Region in transition " + regionInfo.getEncodedName() +
            " references a null server; letting RIT timeout so will be " +
            "assigned elsewhere");
        } else if (!serverManager.isServerOnline(sn)
            && (isOnDeadServer(regionInfo, deadServers)
                || regionInfo.isMetaRegion() || regionInfo.isRootRegion())) {
          forceOffline(regionInfo, data);
        } else {
          new OpenedRegionHandler(master, this, regionInfo, sn, expectedVersion)
              .process();
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;
      }
    }
  }

Branch B: 1. Call cleanoutUnassigned, which clears all the unassigned znodes in ZK and re-establishes the watches.

             2. Call assignAllUserRegions:

                   1. Fetch all regions.

                   2. Assign them: when hbase.master.startup.retainassign is true, regions are assigned according to the locations recorded in the META table; otherwise they are placed on randomly chosen online region servers (see the sketch below).
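As a rough illustration of that placement choice (my sketch, not the actual AssignmentManager/LoadBalancer source; the string-keyed maps are simplified stand-ins for HRegionInfo and ServerName):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hedged sketch of branch B's placement choice (not verbatim HBase source):
// with hbase.master.startup.retainassign=true the plan keeps each region on
// the server recorded in .META. (if that server is online); otherwise regions
// are scattered across the online region servers.
public class StartupAssignmentSketch {
  static Map<String, String> plan(boolean retainAssign,
      Map<String, String> metaLocations,   // region -> last known server from .META.
      List<String> onlineServers) {
    Map<String, String> assignment = new HashMap<String, String>();
    Random rnd = new Random();
    for (Map.Entry<String, String> e : metaLocations.entrySet()) {
      String lastServer = e.getValue();
      if (retainAssign && lastServer != null && onlineServers.contains(lastServer)) {
        assignment.put(e.getKey(), lastServer);  // keep the old location
      } else {
        assignment.put(e.getKey(),
            onlineServers.get(rnd.nextInt(onlineServers.size())));
      }
    }
    return assignment;
  }
}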

8) Check daughter regions

fixupDaughters runs a full scan of .META. looking for offline split parents and restores any daughter regions that went missing during the split.

void fixupDaughters(final MonitoredTask status) throws IOException {
    final Map<HRegionInfo, Result> offlineSplitParents =
      new HashMap<HRegionInfo, Result>();
    // This visitor collects offline split parents in the .META. table
    MetaReader.Visitor visitor = new MetaReader.Visitor() {
      @Override
      public boolean visit(Result r) throws IOException {
        if (r == null || r.isEmpty()) return true;
        HRegionInfo info =
          MetaReader.parseHRegionInfoFromCatalogResult(
            r, HConstants.REGIONINFO_QUALIFIER);
        if (info == null) return true; // Keep scanning
        if (info.isOffline() && info.isSplit()) {
          offlineSplitParents.put(info, r);
        }
        // Returning true means "keep scanning"
        return true;
      }
    };
    // Run full scan of .META. catalog table passing in our custom visitor
    MetaReader.fullScan(this.catalogTracker, visitor);
    // Now work on our list of found parents. See if any we can clean up.
    int fixups = 0;
    for (Map.Entry<HRegionInfo, Result> e : offlineSplitParents.entrySet()) {
      fixups += ServerShutdownHandler.fixupDaughters(
          e.getValue(), assignmentManager, catalogTracker);
    }
    if (fixups != 0) {
      LOG.info("Scanned the catalog and fixed up " + fixups +
        " missing daughter region(s)");
    }
  }

 

9) Start the balancer thread

    The HMaster balance policy aims for each region server to hold at least regions/servers regions and at most regions/servers + 1.

    It first finds the region servers above that maximum and collects the regions that need to move, along with the region servers below the minimum.

    It then hands the regions to move to the under-loaded servers until they reach the maximum or there is nothing left to move.
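A quick worked example of those bounds (plain arithmetic, not the LoadBalancer source):

// Worked example of the policy above: with 23 regions over 5 servers, each
// server should end up holding between floor(23/5)=4 and ceil(23/5)=5 regions.
public class BalanceBoundsExample {
  public static void main(String[] args) {
    int regions = 23, servers = 5;
    int min = regions / servers;                          // 4
    int max = (regions % servers == 0) ? min : min + 1;   // 5
    System.out.println("min=" + min + ", max=" + max);
    // Servers holding more than max give up regions; servers below min
    // receive them, until everyone is within [min, max].
  }
}

The HMaster-side driver that runs the balancer is shown below: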

    

 @Override
  public boolean balance() {
    // if master not initialized, don't run balancer.
    if (!this.initialized) {
      LOG.debug("Master has not been initialized, don't run balancer.");
      return false;
    }
    // If balance not true, don't run balancer.
    if (!this.balanceSwitch) return false;
    // Do this call outside of synchronized block.
    int maximumBalanceTime = getBalancerCutoffTime();
    long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
    boolean balancerRan;
    synchronized (this.balancer) {
      // Only allow one balance run at at time.
      if (this.assignmentManager.isRegionsInTransition()) {
        LOG.debug("Not running balancer because " +
          this.assignmentManager.getRegionsInTransition().size() +
          " region(s) in transition: " +
          org.apache.commons.lang.StringUtils.
            abbreviate(this.assignmentManager.getRegionsInTransition().toString(), 256));
        return false;
      }
      if (this.serverManager.areDeadServersInProgress()) {
        LOG.debug("Not running balancer because processing dead regionserver(s): " +
          this.serverManager.getDeadServers());
        return false;
      }

      if (this.cpHost != null) {
        try {
          if (this.cpHost.preBalance()) {
            LOG.debug("Coprocessor bypassing balancer request");
            return false;
          }
        } catch (IOException ioe) {
          LOG.error("Error invoking master coprocessor preBalance()", ioe);
          return false;
        }
      }

      Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
        this.assignmentManager.getAssignmentsByTable();

      List<RegionPlan> plans = new ArrayList<RegionPlan>();
      for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
        List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
        if (partialPlans != null) plans.addAll(partialPlans);
      }
      int rpCount = 0;  // number of RegionPlans balanced so far
      long totalRegPlanExecTime = 0;
      balancerRan = plans != null;
      if (plans != null && !plans.isEmpty()) {
        for (RegionPlan plan: plans) {
          LOG.info("balance " + plan);
          long balStartTime = System.currentTimeMillis();
          this.assignmentManager.balance(plan);
          totalRegPlanExecTime += System.currentTimeMillis()-balStartTime;
          rpCount++;
          if (rpCount < plans.size() &&
              // if performing next balance exceeds cutoff time, exit the loop
              (System.currentTimeMillis() + (totalRegPlanExecTime / rpCount)) > cutoffTime) {
            LOG.debug("No more balancing till next balance run; maximumBalanceTime=" +
              maximumBalanceTime);
            break;
          }
        }
      }
      if (this.cpHost != null) {
        try {
          this.cpHost.postBalance();
        } catch (IOException ioe) {
          // balancing already succeeded so don't change the result
          LOG.error("Error invoking master coprocessor postBalance()", ioe);
        }
      }
    }
    return balancerRan;
  }

 

 Finally done. Thanks to anyone who read this far; since it was rewritten from memory at home it is a bit rough, my apologies.