bupt04406

HDFS and HBase: NIO-related notes

 

A few things about how HDFS uses NIO are worth keeping in mind:

(1) Off-heap (direct) memory is used

Control direct memory buffer consumption by HBaseClient

https://issues.apache.org/jira/browse/HBASE-4956

 

standard hbase client, asynchbase client, netty and direct memory buffers  

https://groups.google.com/forum/?fromgroups=#!topic/asynchbase/xFvHuniLI1c

 

I thought I'd take a moment to explain what I discovered trying to track down serious problems with the regular (non-async) hbase client and Java's nio implementation.


We were having issues running out of direct memory and here's a stack trace which says it all:

        java.nio.Buffer.<init>(Buffer.java:172)
        java.nio.ByteBuffer.<init>(ByteBuffer.java:259)
        java.nio.ByteBuffer.<init>(ByteBuffer.java:267)
        java.nio.MappedByteBuffer.<init>(MappedByteBuffer.java:64)
        java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
        java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:155)
        sun.nio.ch.IOUtil.write(IOUtil.java:37)
        sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        java.io.DataOutputStream.flush(DataOutputStream.java:106)
        org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:518)
        org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        $Proxy11.getProtocolVersion(<Unknown Source>:Unknown line)

Here you can see that an HBaseClient request is flushing a stream which has a socket channel at the other end of it. HBase has decided not to use direct memory for its byte buffers which I thought was smart since they are difficult to manage. Unfortunately, behind the scenes the JDK is noticing the lack of direct memory buffer in the socket channel write call, and it is allocating a direct memory buffer on your behalf! The size of that direct memory buffer depends on the amount of data you want to write at that time, so if you are writing 1M of data, the JDK will allocate 1M of direct memory.

The same is done on the reading side as well. If you perform channel I/O with a non-direct memory buffer, the JDK will allocate a direct memory buffer for you. In the reading case it allocates a size equal to the amount of room remaining in the non-direct buffer you passed in to the read call. WTF!? That can be a very large value.

To make matters worse, the JDK caches these direct memory buffers in thread local storage and it caches not one, but three of these arbitrarily sized buffers. (Look in sun.nio.ch.Util.getTemporaryDirectBuffer and let me know if I have interpreted the code incorrectly.) So if you have a large number of threads talking to hbase you can find yourself overflowing with direct memory buffers that you have not allocated and didn't even know about.
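One way to keep those hidden allocations small is to never hand the channel more than a bounded slice of a heap buffer at a time. The sketch below illustrates that idea only; it is not code from HBase or Hadoop. The class name `ChunkedChannelWriter` and the 64 KB cap are assumptions made for illustration, and the loop assumes a blocking channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, named for this example only.
final class ChunkedChannelWriter {
    // Assumed cap; not a JDK or Hadoop constant.
    static final int MAX_CHUNK = 64 * 1024;

    /**
     * Writes all remaining bytes of src, exposing at most MAX_CHUNK bytes
     * to the channel per write() call. Assumes a blocking channel: a
     * non-blocking channel that returns 0 would make this loop spin.
     */
    static long writeFully(WritableByteChannel ch, ByteBuffer src) throws IOException {
        final int originalLimit = src.limit();
        long total = 0;
        try {
            while (src.position() < originalLimit) {
                int chunk = Math.min(MAX_CHUNK, originalLimit - src.position());
                // Shrink the visible window so remaining() never exceeds MAX_CHUNK.
                src.limit(src.position() + chunk);
                total += ch.write(src);
            }
        } finally {
            // Restore the caller's limit even if a write fails.
            src.limit(originalLimit);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ByteBuffer payload = ByteBuffer.wrap(new byte[1024 * 1024]);
        long written = writeFully(Channels.newChannel(sink), payload);
        System.out.println("bytes written: " + written);
    }
}
```

With a socket channel, each `write` call then sees at most `MAX_CHUNK` remaining bytes, so a temporary direct buffer sized to the request stays within that bound instead of tracking the full payload.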

This issue is what caused us to check out the asynchbase client, which happily didn't have any of these problems. The reason is that asynchbase uses netty and netty knows the proper way of using direct memory buffers for I/O. The correct way is to use direct memory buffers in manageable sizes, 16k to 64k or something like that, for the purpose of invoking a read or write system call. Netty has algorithms for calculating the best size given a particular socket connection, based on the amount of data it seems to be able to read at once, etc. Netty reads the data from the OS using direct memory and copies that data into Java byte buffers.
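The read pattern described above (one reusable direct buffer of modest size, with the data copied onto the heap after each read) can be sketched roughly as follows. This is not Netty's implementation: the fixed 64 KB size stands in for Netty's adaptive sizing, the class name `DirectBufferReader` is hypothetical, and a blocking channel is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, named for this example only.
final class DirectBufferReader {
    // Assumed fixed size; Netty adapts this per connection.
    static final int IO_BUFFER_SIZE = 64 * 1024;

    /**
     * Reads the channel to end-of-stream through one reusable direct
     * buffer and returns the bytes as a heap array. Assumes a blocking
     * channel: a non-blocking one returning 0 would make this loop spin.
     */
    static byte[] readAll(ReadableByteChannel ch) throws IOException {
        ByteBuffer io = ByteBuffer.allocateDirect(IO_BUFFER_SIZE); // allocated once, reused
        byte[] copy = new byte[IO_BUFFER_SIZE];
        ByteArrayOutputStream heap = new ByteArrayOutputStream();
        while (ch.read(io) != -1) {
            io.flip();                  // switch from filling to draining
            int n = io.remaining();
            io.get(copy, 0, n);         // copy from direct memory to the heap
            heap.write(copy, 0, n);
            io.clear();                 // ready for the next read
        }
        return heap.toByteArray();
    }
}
```

Because the buffer passed to `read` is already direct, the channel has no reason to allocate a temporary one, and the direct memory footprint per reader stays at `IO_BUFFER_SIZE`.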

Now you might be wondering why you don't just pass a regular Java byte array into the read/write calls, to avoid the copy from direct memory to java heap memory, and here's the story about that. Let's assume you're doing a file or socket read. There are two cases:

  • If the amount being read is < 8k, it uses a native char array on the C stack for the read system call, and then copies the result into your Java buffer.
  • If the amount being read is > 8k, the JDK calls malloc, does the read system call with that buffer, copies the result into your Java buffer, and then calls free.

The reason for this is that the compacting Java garbage collector might move your Java buffer while you're blocked in the read system call, and clearly that will not do. But if you are not aware of the malloc/free being called every time you perform a read larger than 8k, you might be surprised by the performance of that. Direct memory buffers were created to avoid the malloc/free every time you read. You still need to copy, but you don't need to malloc/free every time.

People get into trouble with direct memory buffers because you cannot free them when you know you are done. Instead you need to wait for the garbage collector to run and THEN for the finalizers to be executed. You can never tell when the GC will run and/or your finalizers will be run, so it's really a crapshoot. That's why the JDK caches these buffers (that it shouldn't be creating in the first place). And the larger your heap size, the less frequent the GCs. And actually, I saw some code in the JDK which called System.gc() manually when a direct memory buffer allocation failed, which is an absolute no-no. That might work with small heap sizes, but with large heap sizes a full System.gc() can take 15 or 20 seconds. We were trying to use the G1 collector, which allows for very large heap sizes without long GC delays, but those delays were occurring because some piece of code was manually running GC. When I disabled System.gc() with a command line option, we ran out of direct memory instead.
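Since these buffers are allocated behind the application's back, it can help to watch the JVM's own accounting of direct memory. The following is a minimal sketch that assumes Java 7 or later, where `java.lang.management.BufferPoolMXBean` is available (the JDK discussed in the post may predate it). The class name `DirectMemoryProbe` is hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, named for this example only.
final class DirectMemoryProbe {
    /**
     * Returns the bytes currently held by the JVM's "direct" buffer pool,
     * or -1 if no such pool is reported by this JVM.
     */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("direct bytes before: " + directBytesUsed());
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB of direct memory
        System.out.println("direct bytes after:  " + directBytesUsed());
        System.out.println("buffer capacity:     " + buf.capacity());
    }
}
```

Logging this value per process makes growth in direct memory visible before an allocation fails. On HotSpot the ceiling itself is set with `-XX:MaxDirectMemorySize`, and explicit `System.gc()` calls are disabled with `-XX:+DisableExplicitGC`.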

 

http://hllvm.group.iteye.com/group/topic/27945

http://www.ibm.com/developerworks/cn/java/j-nativememory-linux/

http://www.kdgregory.com/index.php?page=java.byteBuffer

 

 

https://issues.apache.org/jira/browse/HADOOP-8069

In the Server implementation, we write with maximum 8KB write() calls, to avoid a heap malloc inside the JDK's SocketOutputStream implementation (with less than 8K it uses a stack buffer instead).

 

(2) A fairly large number of file descriptors (fds) are used

 

http://www.zeroc.com/forums/bug-reports/4221-possible-file-handle-leaks.html

http://code.alibabatech.com/blog/experience_766/danga_memcached_nio_leak.html

https://issues.apache.org/jira/browse/HADOOP-4346

 

According to https://issues.apache.org/jira/browse/HADOOP-2346, a selector takes up 3 fds: 2 for a pipe (used for {{wakeup()}}, I guess) and 1 for epoll().

 

 

$ jps
30255 DataNode
14118 Jps
$ lsof -p 30255 | wc -l
35163
$ lsof -p 30255 | grep TCP | wc -l
8117

$ lsof -p 30255 | grep pipe | wc -l
16994
 
$ lsof -p 30255 | grep eventpoll | wc -l
8114

8117 + 8114 + 16994 = 33225

$ jstack 30255 | grep org.apache.hadoop.hdfs.server.datanode.DataXceiver.run | wc -l
8115

In the test environment, the DataNode had a large number of pipe and eventpoll fds.
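The lsof counts can be checked against the "3 fds per selector" rule with a little arithmetic. The sketch below assumes each eventpoll entry in the lsof output corresponds to one open selector, which the source does not state explicitly. The class and method names are hypothetical.

```java
// Hypothetical helper, named for this example only.
final class SelectorFdTally {
    static final int PIPE_FDS_PER_SELECTOR = 2;   // both ends of the wakeup pipe
    static final int EPOLL_FDS_PER_SELECTOR = 1;  // the epoll instance itself

    /** Pipe fds expected if every selector holds one wakeup pipe. */
    static long expectedPipeFds(long selectors) {
        return selectors * PIPE_FDS_PER_SELECTOR;
    }

    /** Total fds attributable to the selectors (pipe ends plus epoll). */
    static long selectorFds(long selectors) {
        return selectors * (PIPE_FDS_PER_SELECTOR + EPOLL_FDS_PER_SELECTOR);
    }

    public static void main(String[] args) {
        long eventpoll = 8114;  // from: lsof | grep eventpoll | wc -l
        long pipe = 16994;      // from: lsof | grep pipe | wc -l
        System.out.println("expected pipe fds: " + expectedPipeFds(eventpoll)
                + ", observed: " + pipe);
        System.out.println("fds attributable to selectors: " + selectorFds(eventpoll));
    }
}
```

Under that assumption, 8114 selectors account for 16228 pipe fds, close to the 16994 observed, so the bulk of the pipe and eventpoll entries is consistent with roughly one selector per DataXceiver thread.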

 

http://search-hadoop.com/m/JIkTGc7w1S1/+%2522Too+many+open+files%2522+error%252C+which+gets+resolved+after+some+time&subj=+Too+many+open+files+error+which+gets+resolved+after+some+time

For writes, there is an extra thread waiting on i/o. So it would be 3 
fds more. To simplify earlier equation, on the client side :

for writes :  max fds (for io bound load) = 7 * #write_streams
for reads  :  max fds (for io bound load) = 4 * #read_streams

The main socket is cleared as soon as you close the stream.
The rest of fds stay for 10 sec (they get reused if you open more streams meanwhile).
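The two upper bounds quoted above can be turned into a quick capacity check against a process's open-file limit. This is a sketch of the quoted arithmetic only; the class name `ClientFdEstimate` and the example stream counts are made up for illustration.

```java
// Hypothetical helper, named for this example only.
final class ClientFdEstimate {
    static final int FDS_PER_WRITE_STREAM = 7; // multiplier quoted in the thread
    static final int FDS_PER_READ_STREAM = 4;  // multiplier quoted in the thread

    /** Upper bound on client-side fds for an I/O-bound load. */
    static long maxClientFds(long writeStreams, long readStreams) {
        return FDS_PER_WRITE_STREAM * writeStreams
                + FDS_PER_READ_STREAM * readStreams;
    }

    public static void main(String[] args) {
        // Example stream counts are illustrative, not measured values.
        long writeStreams = 100;
        long readStreams = 50;
        System.out.println("max fds: " + maxClientFds(writeStreams, readStreams));
    }
}
```

Comparing the result with the limit reported by `ulimit -n` gives a rough idea of how many concurrent streams a client can open before hitting "Too many open files".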

 

We found that there were a lot of HFiles. After deleting some unused files:

$ lsof -p 30255 | grep pipe | wc -l
982
$ lsof -p 30255 | wc -l
3141

$ jstack 30255 | grep  org.apache.hadoop.hdfs.server.datanode.DataXceiver.run | wc -l
139