
HBase region balancer

 

why

  As time goes on, factors such as poor table design or crashed nodes leave some regionservers holding too many regions while others hold far too few. A balancer is therefore needed to even out the load (region count) per regionserver so that cluster resources are not left underused.

 

how to 

  HBase has a balancer that can be run manually or by a periodic checker. For the latter, the period is set by the property below:

  <property>
    <name>hbase.balancer.period</name>
    <value>300000</value>
    <description>Period at which the region balancer runs in the Master.
    </description>
  </property>
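
As a quick sanity check on the value above, the following minimal sketch parses a property fragment like this one and converts the period from milliseconds to minutes. It is only an illustration: the XML is copied inline, the wrapping <configuration> element is added so the fragment parses on its own, and the helper name `balancer_period_minutes` is hypothetical, not part of HBase.

```python
import xml.etree.ElementTree as ET

# Inline copy of the property shown above, wrapped in a <configuration>
# element so that it is a complete XML document.
PROPERTY_XML = """
<configuration>
  <property>
    <name>hbase.balancer.period</name>
    <value>300000</value>
  </property>
</configuration>
"""

def balancer_period_minutes(xml_text, default_ms=300000):
    """Return hbase.balancer.period in minutes (hypothetical helper).

    default_ms mirrors the value shown in the snippet above and is used
    only when the property is missing from the given XML.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hbase.balancer.period":
            return int(prop.findtext("value")) / 60000
    return default_ms / 60000

print(balancer_period_minutes(PROPERTY_XML))  # 5.0
```

So with the value shown, the periodic checker would run about every five minutes.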

 

Looking at the balance flow in HMaster, you will see roughly the following process:

1. Generate region plans, that is, decide which regions to move and where to move them.

2. The master notifies the regionserver, which closes the region.

3. The master assigns the closed region(s) through the normal assignment procedure.

 

  1. Generate region plans in the master



  A slop setting, "hbase.regions.slop", controls how strictly balance is enforced: the smaller the value, the tighter the resulting balance.

 The ceiling and floor are derived from it as average load * (1 +/- slop).
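
The slop check can be illustrated with a small sketch. This is not HBase's code: the function names are hypothetical, the slop value 0.2 is just an example, and rounding the floor down and the ceiling up is an assumption made here for illustration.

```python
import math

def slop_bounds(region_counts, slop):
    """Return (floor, ceiling) derived from average load * (1 -/+ slop).

    region_counts holds the number of regions on each regionserver.
    Rounding down for the floor and up for the ceiling is an assumption.
    """
    average = sum(region_counts) / len(region_counts)
    floor = math.floor(average * (1 - slop))
    ceiling = math.ceil(average * (1 + slop))
    return floor, ceiling

def needs_balance(region_counts, slop):
    """True when any server falls outside the slop-derived bounds."""
    floor, ceiling = slop_bounds(region_counts, slop)
    return max(region_counts) > ceiling or min(region_counts) < floor

loads = [24, 20, 16]                 # regions per regionserver, average 20
print(slop_bounds(loads, 0.2))       # (16, 24)
print(needs_balance(loads, 0.2))     # False: every server is within bounds
print(needs_balance(loads, 0.1))     # True: a smaller slop tolerates less skew
```

The same cluster passes the check at slop 0.2 and fails it at 0.1, which is the sense in which a smaller slop gives a tighter balance.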

 In the end, though, the calculation that decides which regionservers are underloaded or overloaded is crude (both values use integer division):

   use region count / number of regionservers as the floor, and

   use (region count - 1 + number of servers) / number of servers as the ceiling
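
A worked example of these two formulas, followed by a deliberately simplified plan generator, is sketched below. The function names and the greedy "give it to the emptiest server" rule are assumptions made for illustration; the real balancer's selection logic is more involved.

```python
def crude_bounds(region_count, server_count):
    """Integer floor and ceiling of regions per server, as described above."""
    floor = region_count // server_count
    ceiling = (region_count - 1 + server_count) // server_count
    return floor, ceiling

def generate_plans(assignments):
    """Return (region, source, destination) moves for a server -> regions map.

    Simplified on purpose: regions beyond the ceiling are shed and handed to
    whichever server currently holds the fewest regions.
    """
    counts = {server: len(regions) for server, regions in assignments.items()}
    _, ceiling = crude_bounds(sum(counts.values()), len(counts))
    plans = []
    for source, regions in assignments.items():
        for region in regions[ceiling:]:      # everything past the ceiling
            destination = min(counts, key=counts.get)
            plans.append((region, source, destination))
            counts[source] -= 1
            counts[destination] += 1
    return plans

cluster = {
    "rs1": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "rs2": ["r7", "r8"],
    "rs3": ["r9", "r10"],
}
print(crude_bounds(10, 3))   # (3, 4)
print(crude_bounds(9, 3))    # (3, 3)
for plan in generate_plans(cluster):
    print(plan)              # ('r5', 'rs1', 'rs2') then ('r6', 'rs1', 'rs3')
```

With 10 regions on 3 servers the floor is 3 and the ceiling is 4, so rs1 sheds two regions and the cluster ends up at 4, 3 and 3 regions per server.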

 

 

  2. Close the region on the regionserver

  Once all region plans are ready, a loop iterates over the plans and sends a close request to the regionserver for each one. After the regionserver has closed the region and updated the ZooKeeper state, the master receives a ZooKeeper event in AssignmentManager#nodeDataChanged(), which triggers the normal region assignment process.

 

  3. Assign the closed region to a new regionserver

  Once step 2 has finished, how does the master know which regionserver to assign the region to?

  In step 1, before sending a close request to the regionserver, the master has already recorded the region plan (which contains the source and destination servers), so when the ZooKeeper event arrives the plan is still held in the master's memory.
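
Steps 2 and 3 can be modelled with a toy sketch like the one below. It is not the real AssignmentManager: the class `MasterSketch`, its methods, and the in-process event queue standing in for ZooKeeper are all hypothetical. The point it shows is that the plan is stored before the close request goes out, so the destination can be looked up when the "closed" event comes back.

```python
from collections import deque

class MasterSketch:
    """Toy model of the master side; not the real AssignmentManager."""

    def __init__(self):
        self.region_plans = {}     # region -> (source, destination), in memory
        self.zk_events = deque()   # stands in for ZooKeeper notifications
        self.assignments = {}      # region -> server currently hosting it

    def balance(self, plans):
        for region, source, destination in plans:
            self.region_plans[region] = (source, destination)  # remember first
            self.send_close(source, region)                    # then ask to close

    def send_close(self, server, region):
        # A real regionserver closes the region and updates ZooKeeper itself;
        # here the state change is simply queued as an event.
        self.assignments.pop(region, None)
        self.zk_events.append(("CLOSED", region))

    def process_events(self):
        while self.zk_events:
            state, region = self.zk_events.popleft()
            if state == "CLOSED":
                # The stored plan tells the master where the region should go.
                _, destination = self.region_plans.pop(region)
                self.assignments[region] = destination

master = MasterSketch()
master.assignments = {"r5": "rs1", "r6": "rs1"}
master.balance([("r5", "rs1", "rs2"), ("r6", "rs1", "rs3")])
master.process_events()
print(master.assignments)  # {'r5': 'rs2', 'r6': 'rs3'}
```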

 

 FAQs:

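Regionserver load is unbalanced; some colleagues have already made improvements by spreading the regions of the same table across different regionservers as much as possible.

--Decrease the slop factor to 0.1, or adjust the crude min and max calculation.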

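Even distribution of hot regions: consider using the number of requests each region has recently served as the basis for balancing, so that the regions on each regionserver serve roughly the same number of requests.

--I think this is a design fault. In general, an even, consistent distribution algorithm will not lead to this situation.

  On the other hand, uneven region assignment will cause inconsistent resource usage.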

 

? Ignore the ZooKeeper event and let the regionserver notify the master directly so that it assigns the region

--Is this an improved feature in 0.96+?

 

? Pin some regions to a regionserver, that is, prevent new regions from being assigned to specified regionservers

--See [3].

ref:

[1] 5. HBase advanced topics: compact/split/balance and other maintenance principles -- the flow of creating table

[2] Improving HBase's balance strategy

[3] Support to drain RS nodes through ZK: suppress assignment of new regions
