HBase Configuration Tuning (Reposted)

 
We have been building our product to support real-time queries on HBase (version 0.94.0 with hadoop-1.0.0). To improve the performance of read and write operations, I tuned the Hadoop/HBase configuration.

Below I summarize the exceptions we hit during CRUD operations, together with the configuration changes that resolved each of them.

  • LeaseExpiredException/UnknownScannerException/ScannerTimeoutException:
This mostly occurs when the client is slow between calls to the server: by the time the scan's next() batch call arrives, the server can no longer find the scanner's lease, so it throws the exception. To fix this, increase the lease timeout, and raise the RPC timeout along with it.
More information can be found here.
Exception: org.apache.hadoop.hbase.regionserver.LeaseException: lease '*************' does not exist
Changes in conf/hbase-site.xml:
<property>
   <name>hbase.regionserver.lease.period</name>
   <value>1200000</value>
</property>

<property>
   <name>hbase.rpc.timeout</name>
   <value>1200000</value>
</property>
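
As a rough illustration, the lease problem can be modeled in a few lines of Python. This is a simplified sketch, not HBase code. It assumes the gap between two next() calls is just the number of cached rows times a hypothetical per-row client processing time, and it ignores network and server time. The 60-second figure is assumed to be the 0.94 default of hbase.regionserver.lease.period.

```python
# Simplified model (assumption): the gap between two scanner next() calls is
# dominated by the client processing every row of the previous cached batch.

ASSUMED_DEFAULT_LEASE_MS = 60_000  # assumed 0.94 default of hbase.regionserver.lease.period
TUNED_LEASE_MS = 1_200_000         # value used in hbase-site.xml above (20 minutes)


def lease_expires(caching: int, client_ms_per_row: float, lease_period_ms: int) -> bool:
    """Return True if the next next() call would arrive after the lease expired.

    client_ms_per_row is a hypothetical measurement of how long the client
    spends on each row before asking the server for the next batch.
    """
    gap_ms = caching * client_ms_per_row
    return gap_ms > lease_period_ms


if __name__ == "__main__":
    # 100 rows per batch, 1 second of client work per row -> 100 s between calls.
    print(lease_expires(100, 1000, ASSUMED_DEFAULT_LEASE_MS))  # True
    print(lease_expires(100, 1000, TUNED_LEASE_MS))            # False
```

The model also shows why scanner caching and the lease period have to be tuned together: more rows per batch means a longer gap between calls.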


  • Zookeeper Session TimeOut:
A session timeout occurs when the ZooKeeper server does not hear any heartbeat from the client within the timeout interval. Keep the value low so that dead servers are detected quickly, and configure JVM garbage collection accordingly so that GC pauses stay shorter than the timeout. By default it is 60 seconds; I set it to 20 seconds. More information can be found here:
Changes in conf/hbase-site.xml:
<property>
    <name>zookeeper.session.timeout</name>
    <value>20000</value>
</property>
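
One detail is worth checking when changing this value. The ZooKeeper server does not accept arbitrary session timeouts. With minSessionTimeout and maxSessionTimeout left at their defaults, it clamps the requested value to the range [2 × tickTime, 20 × tickTime]. The sketch below models that clamping. The tickTime of 2000 ms is only an example value (the one in ZooKeeper's sample zoo.cfg), and your ensemble may use a different one.

```python
# Sketch of how a ZooKeeper server bounds a requested session timeout, assuming
# minSessionTimeout/maxSessionTimeout are left at their defaults
# (2 x tickTime and 20 x tickTime).


def negotiated_session_timeout(requested_ms: int, tick_time_ms: int) -> int:
    """Return the session timeout the server would grant, in milliseconds."""
    min_ms = 2 * tick_time_ms
    max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


if __name__ == "__main__":
    tick = 2000  # assumption: tickTime from ZooKeeper's sample zoo.cfg
    print(negotiated_session_timeout(20_000, tick))  # 20000: inside [4000, 40000]
    print(negotiated_session_timeout(60_000, tick))  # 40000: capped at 20 x tickTime
```

A requested timeout above 20 × tickTime is therefore capped, so the effective value can be lower than what zookeeper.session.timeout asks for.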


  • RegionServer Handler Count:
This is the number of RPC listener threads that answer incoming requests to a region server. Set it according to the memory allocated to the region servers; otherwise out-of-memory errors can occur. The default of 10 is deliberately low, to prevent users from killing their region servers when they use large write buffers with a high number of concurrent clients. If you are using coprocessors, it should be increased. I set it to 50. More information can be found here:
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.regionserver.handler.count</name>
    <value>50</value>
</property>
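
The following back-of-the-envelope Python estimate shows why the handler count and memory are linked. It assumes the worst case, where every handler holds one full client write buffer at the same time. It also uses 2MB as the client write buffer, which is assumed to be the default of hbase.client.write.buffer. Both are simplifying assumptions, not a precise model of region server memory.

```python
MIB = 1024 * 1024
ASSUMED_WRITE_BUFFER_BYTES = 2 * MIB  # assumed default of hbase.client.write.buffer


def worst_case_handler_memory_mib(handler_count: int,
                                  write_buffer_bytes: int = ASSUMED_WRITE_BUFFER_BYTES) -> float:
    """Upper-bound estimate: every handler is busy with one full write buffer."""
    return handler_count * write_buffer_bytes / MIB


if __name__ == "__main__":
    print(worst_case_handler_memory_mib(10))  # 20.0 (default handler count)
    print(worst_case_handler_memory_mib(50))  # 100.0 (value used above)
```

With larger client write buffers the estimate grows proportionally, which is the reason the default handler count is kept low.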


  • Zookeeper MaxClient Connection:
This limits the number of concurrent connections that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. Set it high to avoid ZooKeeper connection-loss issues. The default is 300; I set it to 1000.
Exception:
org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException:KeeperErrorCode = ConnectionLoss for /hbase
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>1000</value>
</property>
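
maxClientCnxns is enforced per client IP address on each ZooKeeper server, so a host running many HBase client processes can hit it quickly. The sketch below checks a hypothetical deployment. processes_on_host and conns_per_process are illustrative inputs you would have to measure yourself; they are not HBase settings.

```python
def exceeds_max_client_cnxns(processes_on_host: int,
                             conns_per_process: int,
                             max_client_cnxns: int) -> bool:
    """True if one host could open more ZooKeeper connections than the limit.

    Worst case assumed: every connection from the host lands on the same
    ensemble member, since the limit is per client IP and per server.
    """
    return processes_on_host * conns_per_process > max_client_cnxns


if __name__ == "__main__":
    # Hypothetical host: 400 client processes, one ZooKeeper connection each.
    print(exceeds_max_client_cnxns(400, 1, 300))   # True: default limit too low
    print(exceeds_max_client_cnxns(400, 1, 1000))  # False: fits the tuned limit
```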


  • Scanner Caching Size:
This tells the scanner how many rows to fetch from the server at a time. A higher caching value makes the scanner faster but uses more memory, so configure it based on the allocated memory. It also should not be set so high that a single next() call takes longer than the lease timeout. The default is 1, which gives very poor results; we set it to 100.
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
</property>
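
The effect of scanner caching on round trips is easy to estimate. Scanning N rows with caching C needs about ceil(N / C) next() RPCs. The sketch ignores the final empty call that ends a scan and any other batching limits, so treat the numbers as approximate.

```python
import math


def scan_rpc_count(total_rows: int, caching: int) -> int:
    """Approximate number of next() round trips needed to scan total_rows."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


if __name__ == "__main__":
    rows = 1_000_000
    print(scan_rpc_count(rows, 1))    # 1000000 round trips with the default of 1
    print(scan_rpc_count(rows, 100))  # 10000 round trips with caching = 100
```

The same value can also be set per scan on the Java client with Scan.setCaching(int), which overrides the site-wide default for that scan.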


  • Maximum HStore file Size:
If any one of a column family's HStoreFiles grows beyond this value, the hosting HRegion is split in two. In older versions the default was 256MB; from 0.94.0 onward it is 10GB. It is better to keep it high so that no split occurs in the middle of CRUD operations. If performance is poor with large regions, split them manually as needed. More information can be found here:
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
</property>
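
Byte values such as 10737418240 are easy to mistype. A small helper like the one below converts human-readable sizes with binary units into the byte counts these properties expect. It is a hypothetical convenience, not part of HBase.

```python
# Binary units, as used by the byte values in hbase-site.xml above.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_to_bytes(size: str) -> int:
    """Convert strings like '10GB' or '128MB' (binary units) to bytes."""
    text = size.strip().upper()
    # Check two-letter suffixes first so 'MB' is not mistaken for 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * UNITS[unit])
    return int(text)  # plain number: already bytes


if __name__ == "__main__":
    print(size_to_bytes("10GB"))   # 10737418240, the 0.94 default max filesize
    print(size_to_bytes("256MB"))  # 268435456, the older default
```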


  • Major Compaction:
There are two types of compactions: minor and major. Minor compactions usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minor compactions do not drop deletes or expired cells; only major compactions do. After a major compaction runs, there is a single StoreFile per Store, which usually helps performance. However, a major compaction rewrites all of the Store's data, so it is better to run major compactions manually. By default they run once a day (86400000 ms). I set this property to 0 to disable automatic major compactions, and I run them manually from the HBase shell with major_compact 'tableName'.
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
</property>
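
With periodic major compactions disabled, something else has to trigger them. One option is to generate a file of major_compact commands and pass it to the HBase shell as a script. The sketch below assumes you run it from a scheduler of your choice, and the table names are placeholders.

```python
from typing import Sequence


def major_compact_script(tables: Sequence[str]) -> str:
    """Build HBase shell input that major-compacts each table, then exits."""
    lines = [f"major_compact '{table}'" for table in tables]
    lines.append("exit")  # leave the shell instead of staying interactive
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder table names; replace them with your own tables.
    print(major_compact_script(["tableName", "otherTable"]), end="")
```

Save the output to a file (for example major_compact.rb) and run hbase shell major_compact.rb at a time when the cluster is lightly loaded.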


  • Memstore File Size:
All writes and updates go to the memstore. When the memstore reaches its configured flush size, it is flushed to disk. Mutate operations are blocked when the memstore grows beyond blockMultiplier × flushSize, or when the number of StoreFiles in a Store exceeds blockingStoreFiles.
To stop update operations from being blocked, we increased blockMultiplier (default 2) to 4 and blockingStoreFiles (the number of StoreFiles in a Store, default 7) to 30. The flush size is set to 134217728 bytes (128MB).
More information can be found here:
Changes in conf/hbase-site.xml:
<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
</property>
<property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
</property>
<property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>30</value>
</property>
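
The two blocking conditions for memstore writes can be written as a small predicate. This is a simplified model of the rules as stated in this post, treating each limit as hit once the value goes above it; it is not HBase's actual implementation. The 300MB memstore and the StoreFile counts in the example are made-up inputs.

```python
MIB = 1024 * 1024
FLUSH_SIZE = 134217728  # hbase.hregion.memstore.flush.size (128MB)


def updates_blocked(memstore_bytes: int, store_files: int,
                    flush_size: int, block_multiplier: int,
                    blocking_store_files: int) -> bool:
    """True if either blocking condition (memstore size or StoreFile count) is met."""
    memstore_limit = flush_size * block_multiplier
    return memstore_bytes > memstore_limit or store_files > blocking_store_files


if __name__ == "__main__":
    memstore = 300 * MIB  # hypothetical region memstore size
    # Defaults: limit is 128MB x 2 = 256MB, so 300MB blocks updates.
    print(updates_blocked(memstore, 5, FLUSH_SIZE, 2, 7))   # True
    # Tuned: limit is 128MB x 4 = 512MB and up to 30 StoreFiles are allowed.
    print(updates_blocked(memstore, 5, FLUSH_SIZE, 4, 30))  # False
    # StoreFile condition alone: 8 StoreFiles exceed a limit of 7.
    print(updates_blocked(memstore, 8, FLUSH_SIZE, 4, 7))   # True
```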

  • HeapSize of HBase:
The maximum amount of heap HBase can use is 1000MB by default, which is very low. We set it to 4000MB, and to 8000MB for the region servers.
Exception: java.lang.OutOfMemoryError
Changes in conf/hbase-env.sh:
export HBASE_HEAPSIZE=4000
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms8000m -Xmx8000m"

  • GC Configuration: 
A bigger heap leads to longer GC pauses. We configured an 8GB heap, so if a collection takes 10 seconds per GB, the server pauses for 80 seconds. That is far longer than the 20-second ZooKeeper session timeout, so the session expires. During a long GC pause the server cannot send heartbeats, and the rest of the cluster assumes it is dead. More information can be found here.
Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
To avoid long GC pauses, we configured the GC as follows:
Changes in conf/hbase-env.sh: 
export HBASE_OPTS="$HBASE_OPTS -server -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:NewSize=64m -XX:MaxNewSize=64m -XX:+CMSIncrementalMode -Djava.net.preferIPv4Stack=true"
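
The GC pause arithmetic can be checked mechanically. The sketch below uses the post's own rough figure of 10 seconds of pause per GB of heap, which is an illustrative rate rather than a measured constant. It compares the result with the 20-second ZooKeeper session timeout configured earlier.

```python
def gc_pause_seconds(heap_gb: float, seconds_per_gb: float) -> float:
    """Rough stop-the-world pause estimate: heap size times pause rate."""
    return heap_gb * seconds_per_gb


def session_would_expire(heap_gb: float, seconds_per_gb: float,
                         session_timeout_ms: int) -> bool:
    """True if the estimated pause outlasts the ZooKeeper session timeout."""
    return gc_pause_seconds(heap_gb, seconds_per_gb) * 1000 > session_timeout_ms


if __name__ == "__main__":
    print(gc_pause_seconds(8, 10))              # 80 (seconds, for an 8GB heap)
    print(session_would_expire(8, 10, 20_000))  # True: 80 s > 20 s
    print(session_would_expire(8, 1, 20_000))   # False: 8 s < 20 s
```

The inputs show the intent of the settings above: the concurrent (CMS) collector options aim to shorten the effective pause so that it stays under the session timeout.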

  • Datanode Xcievers:
This is an upper bound on the number of files a DataNode will serve at any one time. If it is not configured properly, we can get exceptions about missing blocks because the xceiver limit has been exceeded. Note that the property name really is spelled "xcievers". More information can be found here:
Exception: Could not obtain block blk_***_**** from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
Changes in conf/hdfs-site.xml:
<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
</property>