
Setting up Disks for Hadoop

 

Here are some recommendations for setting up disks in a Hadoop cluster. What we have here is anecdotal; hard evidence is very welcome, and everyone should expect some trial and error.

Key Points

The usual goals for a Hadoop cluster are massive amounts of data and high I/O bandwidth. Your MapReduce jobs may be I/O bound or CPU/memory bound. If you know which matters more (effectively, how many CPU cycles and how many MB of RAM each Map or Reduce task uses), you can make better decisions.

Hardware

You don't need RAID disk controllers for Hadoop Data Nodes, because HDFS copies data across multiple machines instead. This increases the likelihood that there is a free task slot near the data and, if the servers are on different PSUs and switches, eliminates some more points of failure in the data center.

While the Hadoop Name Node and Secondary Name Node can write to a list of drive locations, they will stop functioning if they cannot write to ALL the locations. In this case a mirrored RAID is a good idea for higher availability.

Having lots of disks per server gives you more raw IO bandwidth than having one or two big disks. If you have enough that different tasks can be using different disks for input and output, disk seeking is minimized, which is one of the big disk performance killers. That said: more disks have a higher power budget; if you are power limited, you may want fewer but larger disks.

Configuring Hadoop

Pass a list of disks to the dfs.data.dir parameter, and Hadoop will use all of the disks that are available. When one goes offline it is taken out of consideration. Hadoop does not check for the disk coming back; it assumes the disk is "gone".
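
As a rough illustration of the list format, the sketch below builds a dfs.data.dir property with one directory per disk. The mount points and the hadoop/dfs/data subdirectory are hypothetical names chosen for the example, not values from any real cluster; only the property name comes from the text above.

```python
import xml.etree.ElementTree as ET


def data_dir_property(mount_points, subdir="hadoop/dfs/data"):
    """Build a <property> element with one data directory per disk.

    mount_points: mount points, ideally one per physical disk.
    subdir: hypothetical directory created under each mount point.
    """
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    # The value is a comma-delimited list of directories.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical mount points, for illustration only.
example = data_dir_property(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"])
print(ET.tostring(example, encoding="unicode"))
```

The printed element can be pasted into hdfs-site.xml next to the other properties.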

How to limit a Data Node's disk usage?

Set dfs.datanode.du.reserved in $HADOOP_HOME/conf/hdfs-site.xml to limit how much of each disk the Data Node may use.


  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- cluster variant -->
    <value>182400</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>
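
Because the value is given in bytes per volume, it is easy to get the magnitude wrong: 182400 bytes is only about 178 KiB. A minimal sketch of the conversion from a more convenient unit follows; the 10 GiB figure is an arbitrary example, not a recommendation.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def reserved_bytes(gib):
    """Convert a per-volume reserve in GiB to the byte count Hadoop expects."""
    if gib < 0:
        raise ValueError("reserve must not be negative")
    return int(gib * GIB)


# Arbitrary example: reserve 10 GiB per volume for non-DFS use.
print(reserved_bytes(10))
```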

Logging

The environment variable HADOOP_LOG_DIR sets the directory Hadoop logs to.
The log4j.properties file in your Hadoop configuration directory controls logging in more detail.
Don't log to the root partition: a machine that does not boot because the logs have filled the disk is inconvenient.
Have a plan to clean up log output, otherwise jobs that log too much to the console will fill up the log directories.
Get your developers to use the commons-logging APIs in their MapReduce code, so that you can turn logging up or down without recompiling the code. They can run in debug mode on their test machines while you run at WARN level in production.
Some JVMs (JRockit) seem to log more. Tune your Log4j settings for your JVM, and only capture the output you really want.
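
One possible shape for the log cleanup plan is a small script run from cron. The sketch below deletes plain files older than a given age from a single directory; the function name, the age threshold, and the dry-run default are all assumptions made for illustration, and it does not recurse or compress.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days, now=None, dry_run=True):
    """Return (and optionally delete) files in log_dir older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in Path(log_dir).iterdir():
        # Only plain files are considered; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            removed.append(path)
            if not dry_run:
                path.unlink()
    return sorted(removed)
```

Running it with dry_run=True first shows what would be deleted, which is a cheap safeguard before pointing it at the directory named by HADOOP_LOG_DIR.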

Do not keep stuff under /tmp

Hadoop defaults to keeping things under /tmp so that you can play with Hadoop without filling up your disk. This is dangerous in a production cluster, as any automated cleanup cron job will eventually delete stuff in /tmp, at which point your Hadoop cluster is in trouble.
You will need a cron job to clean up /tmp eventually. Plan for it.
Configure Hadoop to store its data in stable locations, preferably off the root disk.
Java stores the information that jps uses under /tmp/hsperfdata_${user}, so after a cleanup jps won't work. Have your script leave those directories alone, or get used to using ps -ef | grep java to find Java processes instead.
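
A cleanup script that spares the jps data could look roughly like the sketch below. It removes top-level entries older than a given age and skips anything whose name starts with hsperfdata_; the function name and the age threshold are illustrative assumptions, and a real script would need more care around files that are still in use.

```python
import shutil
import time
from pathlib import Path


def clean_tmp(tmp_dir, max_age_days, keep_prefixes=("hsperfdata_",), now=None):
    """Delete stale top-level entries in tmp_dir, skipping protected prefixes.

    Returns the sorted names of the entries that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in Path(tmp_dir).iterdir():
        if entry.name.startswith(tuple(keep_prefixes)):
            continue  # leave the JVM performance data used by jps alone
        if entry.lstat().st_mtime >= cutoff:
            continue  # modified recently enough to keep
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```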

Underlying File System Options

If you mount the disks with noatime, file access times aren't written back, which speeds up reads. There is also relatime, which stores some access time information but is not as slow as the classic atime behavior. Remember that any access time information kept by Hadoop is independent of the atime attribute of individual block files, so Hadoop does not care what your settings are here. If you are mounting disks purely for Hadoop, use noatime.
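
To check that the data disks really are mounted this way, one option is to scan the mount table. The sketch below parses text in the /proc/mounts layout (device, mount point, filesystem type, options, ...) and reports mount points under a given prefix that lack noatime; the /mnt/disk prefix in the usage line is a hypothetical naming scheme.

```python
def mounts_missing_noatime(mounts_text, mount_prefix):
    """Return mount points under mount_prefix whose options lack noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point.startswith(mount_prefix) and "noatime" not in options:
            missing.append(mount_point)
    return missing
```

On Linux it can be called as mounts_missing_noatime(open("/proc/mounts").read(), "/mnt/disk").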

Formatting and tuning options are important. Using tunefs (tune2fs for ext2/ext3 on Linux) to set the reserved space to zero percent can save you over 25 GigaBytes on a 1 TeraByte disk. Also, because the underlying file system is going to hold a relatively small number of large files, you can get more space by lowering the number of inodes at format time.
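
The amount of space recovered depends on the reserve percentage the disk was formatted with. A small worked calculation, assuming a 5% reserve (a common mke2fs default; check your own file systems) and a disk of exactly 10^12 bytes:

```python
def reclaimable_bytes(disk_bytes, reserved_percent):
    """Bytes freed by lowering the reserved-block percentage to zero."""
    return disk_bytes * reserved_percent // 100


ONE_TB = 10 ** 12  # decimal terabyte, as disk vendors count it

# Assumed 5% reserve; other percentages scale linearly.
print(reclaimable_bytes(ONE_TB, 5) / 10 ** 9, "GB")
```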

Ext3

Yahoo! has publicly stated they use ext3. Regardless of the merits of the filesystem, that means that HDFS-on-ext3 has been publicly tested at a bigger scale than any other underlying filesystem that we know of.

XFS

From Bryan on the core-user list on 19 May 2009:

We use XFS for our data drives, and we've had somewhat mixed results. One of the biggest pros is that XFS has more free space than ext3, even with the reserved space settings turned all the way to 0. Another is that you can format a 1TB drive as XFS in about 0 seconds, versus minutes for ext3. This makes it really fast to kickstart our worker nodes. We have seen some weird stuff happen though when machines run out of memory, apparently because the XFS driver does something odd with kernel memory. When this happens, we end up having to do some fscking before we can get that node back online. As far as outright performance, I actually *did* do some tests of xfs vs ext3 performance on our cluster. If you just look at a single machine's local disk speed, you can write and read noticeably faster when using XFS instead of ext3. However, the reality is that this extra disk performance won't have much of an effect on your overall job completion performance, since you will find yourself network bottlenecked well in advance of even ext3's performance. The long and short of it is that we use XFS to speed up our new machine deployment, and that's it.

Ext4

The Ext4 Linux filesystem uses delayed allocation of data, which makes it handle unplanned server shutdowns and power outages less well than classic ext3. Consider turning off delayed allocation (the nodelalloc mount option) in /etc/fstab unless you trust your UPS.
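
Mount options such as these live in the fourth field of each /etc/fstab entry. The sketch below adds options to a single fstab line without duplicating ones already present; the device and mount point in the usage example are hypothetical, and the function only rewrites text, it does not touch the real file.

```python
def add_mount_options(fstab_line, extra_options):
    """Return fstab_line with extra_options appended to its options field."""
    fields = fstab_line.split()
    if len(fields) < 4 or fields[0].startswith("#"):
        return fstab_line  # comment, blank, or malformed line: leave as-is
    options = fields[3].split(",")
    for opt in extra_options:
        if opt not in options:
            options.append(opt)
    fields[3] = ",".join(options)
    # Note: columns are re-joined with single spaces, so alignment is lost.
    return " ".join(fields)


# Hypothetical ext4 data disk entry.
print(add_mount_options("/dev/sdb1 /mnt/disk1 ext4 defaults 0 0",
                        ["noatime", "nodelalloc"]))
```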