The following gives you a list to run through when you encounter problems with your cluster setup.
1. Basic setup checklist
This section provides a checklist of things you should confirm for your cluster, before going into a deeper analysis in case of problems or performance issues.
File handles.
HBase is a database, so it uses a lot of files at the same time. The default ulimit -n of 1024 on most Unix or other Unix-like systems is insufficient. Any significant amount of loading will lead to I/O errors stating the obvious: java.io.IOException: Too many open files. You may also notice errors such as the following:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
The ulimit -n for the DataNode processes and the HBase processes should be set high. To verify the current ulimit setting you can also run the following:
$ cat /proc/<PID of JVM>/limits
You should see that the limit on the number of files is set reasonably high—it is safest to just bump this up to 32000, or even more. “File handles and process limits” on page 49 has the full details on how to configure this value.
TODO: I found that the value shown in /proc can differ from the one set in the OS; in my case it was always 4096, i.e., four times the OS's default of 1024.
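A minimal sketch of how the limit is typically raised on Linux, assuming the DataNode and HBase processes run as a user named hadoop (the user name, the process patterns, and the value 32000 are assumptions to adapt to your install):

$ sudo sh -c 'echo "hadoop  -  nofile  32000" >> /etc/security/limits.conf'    # picked up by new login sessions
$ cat /proc/$(pgrep -f HRegionServer | head -n 1)/limits | grep "open files"   # what the region server JVM actually got
$ cat /proc/$(pgrep -f DataNode | head -n 1)/limits | grep "open files"        # same check for the DataNode

Restart the DataNode and HBase daemons after raising the limit; the /proc check only reflects the limits that were in effect when the JVM started.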
DataNode connections (dfs.datanode.max.xcievers)
The DataNodes should be configured with a large number of transceivers—at least 4,096, but potentially more. There’s no particular harm in setting it up to as high as 16,000 or so. See “Datanode handlers” on page 51 for more information.
Not having this configuration in place makes for strange-looking failures. Eventually, you will see a complaint in the datanode logs about the xcievers limit being exceeded, but before you get to that point, one manifestation is a complaint about missing blocks. For example:
10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
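A quick way to confirm the setting and spot the telltale DataNode message; the config path and the exact log wording below are assumptions that vary between Hadoop versions and installs:

$ grep -A 1 "dfs.datanode.max.xcievers" $HADOOP_HOME/conf/hdfs-site.xml   # is the limit configured at all?
$ grep -iE "xciever|xceiver" /var/log/hadoop/*datanode*.log*              # e.g. "xceiverCount 4097 exceeds the limit of concurrent xcievers 4096"

If the property is missing, add it inside the <configuration> element of hdfs-site.xml on every DataNode and restart them.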
Compression. Compression should almost always be on, unless you are storing precompressed data. “Compression” on page 424 discusses the details. Make sure that you have verified the installation so that all region servers can load the required compression libraries; a quick check using the bundled CompressionTest tool is sketched after the following example. If not, you will see errors like this:
hbase(main):007:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'LZO' }

ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: \
  No server address listed in .META. for region \
  testtable2,,1309713043529.8ec02f811f75d2178ad098dc40b4efcf.
In the logfiles of the servers, you will see the root cause for this problem (abbreviated and line-wrapped to fit the available width):
2011-07-03 19:10:43,725 INFO org.apache.hadoop.hbase.regionserver.HRegion: \
  Setting up tabledescriptor config now ...
2011-07-03 19:10:43,725 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: \
  Instantiated testtable,,1309713043529.8ec02f811f75d2178ad098dc40b4efcf.
2011-07-03 19:10:43,839 ERROR org.apache.hadoop.hbase.regionserver.handler. \
  OpenRegionHandler: Failed open of region=testtable,,1309713043529. \
  8ec02f811f75d2178ad098dc40b4efcf.
java.io.IOException: java.lang.RuntimeException: \
  java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression
    at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs ...
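One way to verify codec availability on each region server is the bundled CompressionTest tool (its class shows up in the stack trace above); the invocation and the temporary path below are a sketch and may differ slightly between HBase versions:

$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test lzo
$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test gz

A working codec setup completes without an exception; a broken one fails with the same ClassNotFoundException (or an UnsatisfiedLinkError for missing native libraries) shown above.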
Garbage collection/memory tuning. We discussed the common Java garbage collector settings in “Garbage Collection Tuning” on page 419. If enough memory is available, you should increase the region server heap up to at least 4 GB, preferably more like 8 GB. The recommended garbage collection settings ought to work for any heap size.
Also, if you are colocating the region server and MapReduce task tracker, be mindful of resource contention on the shared system. Edit the mapred-site.xml file to reduce the number of slots for nodes running with ZooKeeper, so you can allocate a good share of memory to the region server. Do the math on memory allocation, accounting for memory allocated to the task tracker and region server, as well as memory allocated for each child task (from mapred-site.xml and hadoop-env.sh) to make sure you are leaving enough memory for the region server but you’re not oversubscribing the system. Refer to the discussion in “Requirements” on page 34. You might want to consider separating MapReduce and HBase functionality if you are otherwise strapped for resources.
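A back-of-the-envelope version of that math, with purely illustrative numbers (the heap sizes and slot counts are assumptions, not recommendations):

$ RS_HEAP=8192; DN_HEAP=1024; TT_HEAP=1024       # from hbase-env.sh and hadoop-env.sh, in MB
$ MAP_SLOTS=4; REDUCE_SLOTS=2; CHILD_HEAP=1024   # from mapred-site.xml (mapred.child.java.opts=-Xmx1024m)
$ OS_RESERVE=2048                                # leave room for the OS and page cache
$ echo $(( RS_HEAP + DN_HEAP + TT_HEAP + (MAP_SLOTS + REDUCE_SLOTS) * CHILD_HEAP + OS_RESERVE ))
18432
$ free -m                                        # compare the planned allocation against the "total" column

In this example the node needs roughly 18 GB, so it fits a 24 GB machine with headroom but would oversubscribe a 16 GB one.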
Lastly, HBase is also CPU-intensive. So even if you have enough memory, check your CPU utilization to determine if slots need to be reduced, using a simple Unix command such as top, or the monitoring described in Chapter 10.
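For a quick spot check from the command line (the process pattern is an assumption; adjust it to how your region server is started):

$ top -b -n 1 -p "$(pgrep -f HRegionServer | head -n 1)"   # one-shot snapshot of the region server's CPU and memory share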
2. Stability issues
In rare cases, a region server may shut itself down, or its process may be terminated unexpectedly. You can check the following:
• Double-check that the JVM version is not 1.6.0u18 (which is known to have detrimental effects on running HBase processes).
• Check the last lines of the region server logs—they probably have a message containing the word "aborting" (or "abort"), hopefully with a reason; a quick way to run both checks is sketched below.
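Assuming a packaged-style log location (adjust the path to wherever your region server logs live):

$ java -version 2>&1 | head -n 1                                     # make sure it is not 1.6.0u18
$ grep -iE "abort" /var/log/hbase/*regionserver*.log* | tail -n 20   # the abort message and, with luck, its reason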
An abort like this is often caused by the server losing its ZooKeeper session. If that is the case, you can look into the following:
2.1 ZooKeeper problems. It is vital to ensure that ZooKeeper can perform its tasks as the coordination service for HBase. It is also important for the HBase processes to be able to communicate with ZooKeeper on a regular basis. Here is a checklist you can use to ensure that you do not run into commonly known problems with ZooKeeper:
Check that the region server and ZooKeeper machines do not swap
If machines start swapping, certain resources start to time out and the region servers will lose their ZooKeeper session, causing them to abort themselves. You can use Ganglia, for example, to graph the machines’ swap usage, or execute
$ vmstat 20
on the server(s) while running load against the cluster (e.g., a MapReduce job): make sure the "si" and "so" columns stay at 0. These columns show the amount of data swapped in or out. Also execute
$ free -m
to make sure that no swap space is used (the swap column should state 0). Also consider tuning the kernel’s swappiness value (/proc/sys/vm/swappiness) down to 5 or 10. Roughly speaking, this value acts as a threshold: the lower it is, the more reluctant the kernel is to swap while free memory remains. Lowering it should help if the total memory allocation adds up to less than the box’s available memory, yet swap is happening anyway.
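For the swappiness change specifically, a minimal sketch (the value 10 is an example; the sysctl edit must also go into /etc/sysctl.conf to survive a reboot):

$ cat /proc/sys/vm/swappiness                                # the common default is 60
60
$ sudo sysctl -w vm.swappiness=10                            # takes effect immediately for the running kernel
$ echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf   # persist it for the next boot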
Check network issues
If the network is flaky, region servers will lose their connections to ZooKeeper and abort.
Check ZooKeeper machine deployment
ZooKeeper should never be codeployed with task trackers or data nodes. It is permissible to deploy ZooKeeper with the name node, secondary name node, and job tracker on small clusters (e.g., fewer than 40 nodes).
It is preferable to deploy just one ZooKeeper peer shared with the name node/job tracker than to deploy three that are colocated with other processes: the other processes will stress the machine and ZooKeeper will start timing out.
Check pauses related to garbage collection
Check the region server’s logfiles for a message containing "slept"; for example, you might see something like "We slept 65000ms instead of 10000ms". If you see this, it is probably due to either garbage collection pauses or heavy swapping. If they are garbage collection pauses, refer to the tuning options mentioned in “Basic setup checklist” on page 471.
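To find these events quickly, assuming a typical log location and that you enabled GC logging as described in the tuning section (both paths below are assumptions):

$ grep -i "slept" /var/log/hbase/*regionserver*.log*     # the sleeper/chore threads complaining about oversleeping
$ grep -E "real=[0-9]{2,}" /var/log/hbase/gc-hbase.log   # GC events whose wall-clock time reached 10 seconds or more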
Monitor slow disks
HBase does not degrade well when reading or writing a block on a data node with a slow disk. This problem can affect the entire cluster if the block holds data from the META region, causing compactions to slow and back up. Again, use monitoring to carefully keep these vital metrics under control.
A slow disk is hard to find unless you use disk-checking tools such as hdparm.
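A couple of commands that help uncover a slow drive (device names are examples; iostat comes with the sysstat package):

$ sudo hdparm -t /dev/sda   # raw sequential read throughput; compare drives on the same node against each other
$ iostat -dx 5              # under load, a device with much higher await/%util than its peers is the likely culprit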
ref: hbase-guide