
Hadoop 2.x: HDFS federation

 
 Note: this article focuses on hadoop-2.5.1; it may differ a little from hadoop-0.23.x.

Agenda:

 I: HDFS federation overview

 II: implementation

 III: other distributed namespace approaches

 IV: advantages vs shortcomings

 -------------------

I: HDFS federation overview

  In hadoop-1.x and earlier (except 0.23.x), a single namenode has several limitations:

  1. Single point of failure (SPOF, addressed by HA).

  2. Too many files/blocks make the metadata outgrow the namenode's memory.

  3. A single master (namenode) undertakes many tasks: metadata management, resource assignment, datanode heartbeats, file locations, etc., which reduces the namenode's throughput.

  4. Only one namespace, with no real separation/isolation between tenants; e.g. development apps and production apps interfere with each other.

   ....
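To make limitation 2 concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a commonly quoted rule of thumb of about 150 bytes of namenode heap per namespace object (file, directory, or block); that figure and the function names are assumptions for illustration, not values taken from hadoop-2.5.1.

```python
# Rough namenode heap estimate.
# BYTES_PER_OBJECT is a rule-of-thumb assumption, not a measured value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files: int, blocks_per_file: float = 1.0) -> int:
    """Estimate metadata heap: one object per file plus one object per block."""
    num_blocks = int(num_files * blocks_per_file)
    return (num_files + num_blocks) * BYTES_PER_OBJECT


def per_namenode_heap_bytes(num_files: int, namenodes: int) -> int:
    """Estimate per-namenode heap if files were spread evenly over several namespaces."""
    files_each = -(-num_files // namenodes)  # ceiling division
    return estimate_namenode_heap_bytes(files_each)


if __name__ == "__main__":
    files = 100_000_000
    print(f"one namenode:        {estimate_namenode_heap_bytes(files) / 1e9:.1f} GB")
    print(f"each of 4 namenodes: {per_namenode_heap_bytes(files, 4) / 1e9:.1f} GB")
```

Under these assumptions the metadata for a fixed number of files shrinks per namenode as namespaces are added, which is the horizontal scaling that federation aims for. Real usage depends on path lengths, replication, and JVM overhead.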

  Multiple namespaces address all of the issues above (except item 1). You can also think of multiple namespaces as 'multi-hierarchy name management'. This approach brings several kinds of scalability, and HDFS federation is its implementation in Hadoop:

feature    | addresses    | scaling    | summary
federation | scalability  | horizontal | unites multiple namespaces from multiple clusters
ha         | availability | horizontal | adds standby namenodes

  Combining federation with HA gives this architecture:



                 Figure: HDFS federation + HA architecture

 

 II: implementation

  Much like mounting disks on Linux, this scheme is easy to achieve: a Client Mount Table gives each client a global view of all federated namenodes. It is simple, and because the table lives in each client's configuration there is no single point of failure for client access. Only a few properties need to change, like this:

core-site.xml

 

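<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- pull in the client mount table defined in cmt.xml -->
    <xi:include href="cmt.xml"/>
    <property>
        <name>fs.defaultFS</name>
        <value>viewfs://nsX</value>
        <description> </description>
    </property>
</configuration>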
   Note: the URI scheme here is 'viewfs' instead of 'hdfs'.

And cmt.xml:

<configuration>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./share</name>
        <value>hdfs://ns1/real_share</value>
    </property>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./user</name>
        <value>hdfs://ns2/real_user</value>
    </property>
</configuration>

 

  This means that on the client, the path '/share' maps to the real directory '/real_share' in namespace 'ns1', and '/user' maps in the same way to '/real_user' in 'ns2'. The real directories must be created first:

hdfs dfs -mkdir hdfs://ns1/real_share

 

hdfs dfs -mkdir hdfs://ns2/real_user
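The mapping performed by the mount table can be sketched as a small Python model. This is a toy illustration, not the ViewFs implementation: the `MOUNT_TABLE` dict and the `resolve` function are hypothetical names, and the entries simply mirror the two links in cmt.xml.

```python
from typing import Dict

# Toy mount table mirroring the cmt.xml links: client path prefix -> target URI.
MOUNT_TABLE: Dict[str, str] = {
    "/share": "hdfs://ns1/real_share",
    "/user": "hdfs://ns2/real_user",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Translate a client-side path into the backing namespace URI.

    A mount point matches only on whole path components, so '/users'
    does not match the '/user' mount point.
    """
    best = None
    for mount in table:
        if path == mount or path.startswith(mount + "/"):
            # Prefer the longest matching mount point (a simplification).
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    return table[best] + path[len(best):]


if __name__ == "__main__":
    print(resolve("/user/alice/data.txt"))
    print(resolve("/share"))
```

With this model, a client path such as /user/alice/data.txt is rewritten to a path under hdfs://ns2/real_user, while a path outside every mount point is rejected, which is why each link target has to exist before clients use it.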

  hdfs-site.xml differs a little from the HA-only setup; see "Hadoop 2.0 NameNode HA和Federation实践" (Hadoop 2.0 NameNode HA and Federation in Practice).

 

 III: other distributed namespace approaches

  1. Dynamic subtree partitioning (Ceph)

  2. Hash-based partitioning (Lustre)
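As a generic illustration of the hash-based idea (not Lustre or Ceph code), the Python sketch below picks a metadata server by hashing the full path. The function name `hash_partition` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def hash_partition(path: str, num_servers: int) -> int:
    """Pick a metadata server index for `path` by hashing the full path."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_servers


if __name__ == "__main__":
    for p in ("/user/alice/a.txt", "/user/alice/b.txt", "/share/lib.jar"):
        print(p, "-> server", hash_partition(p, 4))
```

Unlike the static mount table in section II, where a whole subtree belongs to one namespace, hashing can place files from the same directory on different servers. That spreads load more evenly but gives up directory locality.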

 

 IV: advantages vs shortcomings

  Besides the advantages mentioned at the start, sharing across multiple datacenters is an important reason to use federation. The shortcomings are also clear:

  a. Data may end up stored unevenly across the different namespaces.

 

ref:

hadoop 2.x-HDFS HA --Part I: abstraction

hortonworks hdfs federation

HDFS scalability with multiple namenodes

Hadoop 2.0 NameNode HA和Federation实践 (Hadoop 2.0 NameNode HA and Federation in Practice)

Scaling HDFS Namenode using Multiple Namespace (Namenodes) and Block Pools

 
