Administering an HDFS High Availability Cluster


Using the haadmin command

Now that your HA NameNodes are configured and started, you will have access to some additional commands to administer your HA HDFS cluster. Specifically, you should familiarize yourself with the subcommands of the hdfs haadmin command.

This page describes high-level uses of some important subcommands. For specific usage information of each subcommand, you should run hdfs haadmin -help <command>.
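
For example, to see the detailed usage of the failover subcommand:

    $ hdfs haadmin -help failover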

failover - initiate a failover between two NameNodes

This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of the methods succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.
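
For example, assuming the NameNode IDs nn1 and nn2 used in the configuration examples later on this page, a graceful failover from nn1 (currently Active) to nn2 (currently Standby) might look like this:

    $ hdfs haadmin -failover nn1 nn2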

getServiceState

getServiceState - determine whether the given NameNode is Active or Standby

Connect to the provided NameNode to determine its current state, printing either "standby" or "active" to STDOUT as appropriate. This subcommand might be used by cron jobs or monitoring scripts which need to behave differently based on whether the NameNode is currently Active or Standby.
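
For example, a monitoring script might branch on the printed state. This is a minimal sketch; nn1 is the NameNode ID used in the configuration examples later on this page, and hdfs is assumed to be on the PATH:

    #!/bin/bash
    # Query the current HA state of the NameNode whose ID is nn1 (assumed ID).
    state=$(hdfs haadmin -getServiceState nn1)
    if [ "$state" = "active" ]; then
      echo "nn1 is currently the Active NameNode"
    else
      echo "nn1 is currently: $state"
    fi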

checkHealth

checkHealth - check the health of the given NameNode

Connect to the provided NameNode to check its health. The NameNode is capable of performing some diagnostics on itself, including checking if internal services are running as expected. This command will return 0 if the NameNode is healthy, non-zero otherwise. One might use this command for monitoring purposes.

  Note:

The checkHealth command is not yet implemented, and at present will always return success, unless the given NameNode is completely down.
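
For example, a cron job or monitoring check might rely only on the exit code (nn1 is an assumed NameNode ID):

    $ hdfs haadmin -checkHealth nn1 || echo "NameNode nn1 reported unhealthy"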

Using the dfsadmin command when HA is enabled

When you use the dfsadmin command with HA enabled, you should use the -fs option to specify a particular NameNode using the RPC address, or service RPC address, of the NameNode. Not all operations are permitted on a standby NameNode. If the specific NameNode is left unspecified, only the operations to set quotas (-setQuota, -clrQuota, -setSpaceQuota, -clrSpaceQuota), report basic file system information (-report), and check upgrade progress (-upgradeProgress) will fail over and perform the requested operation on the active NameNode. The "refresh" options (-refreshNodes, -refreshServiceAcl, -refreshUserToGroupsMappings, and -refreshSuperUserGroupsConfiguration) must be run on both the active and standby NameNodes.
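
For example, using the NameNode RPC addresses from the configuration examples later on this page (machine1.example.com and machine2.example.com are placeholders), the refresh options would be run against each NameNode explicitly, while -report can be left to fail over on its own:

    $ hdfs dfsadmin -fs hdfs://machine1.example.com:8020 -refreshNodes
    $ hdfs dfsadmin -fs hdfs://machine2.example.com:8020 -refreshNodes
    $ hdfs dfsadmin -report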

Disabling HDFS High Availability

If you need to unconfigure HA and revert to using a single NameNode, either permanently or for upgrade or testing purposes, proceed as follows.
  Important:

If you have been using NFS shared storage in CDH 4, you must unconfigure it before upgrading to CDH 5. Only Quorum-based storage is supported in CDH 5. If you are already using Quorum-based storage, you do not need to unconfigure it in order to upgrade.

Step 1: Shut Down the Cluster

  1. Shut down Hadoop services across your entire cluster. Do this from Cloudera Manager; or, if you are not using Cloudera Manager, run the following command on every host in your cluster:
    $ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
  2. Check each host to make sure that there are no processes running as the hdfs, yarn, mapred, or httpfs users. From root, run:
    # ps -aef | grep java
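    A slightly narrower check (a sketch; adjust the user list to your installation) is to look only for Java processes owned by those users:
    # ps -ef | grep java | grep -E 'hdfs|yarn|mapred|httpfs'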

Step 2: Unconfigure HA

  1. Disable the software configuration.
    • If you are using Quorum-based storage and want to unconfigure it, unconfigure the HA properties described under Configuring Software for HDFS HA.

      If you intend to redeploy HDFS HA later, comment out the HA properties rather than deleting them.

    • If you were using NFS shared storage in CDH 4, you must unconfigure the properties described below before upgrading to CDH 5.
  2. Move the NameNode metadata directories on the standby NameNode. The location of these directories is configured by dfs.namenode.name.dir and/or dfs.namenode.edits.dir. Move them to a backup location.
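    For example, if dfs.namenode.name.dir on the standby NameNode points to /data/1/dfs/nn (a hypothetical path; check the actual value in your hdfs-site.xml), the move might look like this:
    $ sudo mv /data/1/dfs/nn /data/1/dfs/nn.backup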

Step 3: Restart the Cluster

for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done

Properties to unconfigure to disable an HDFS HA configuration using NFS shared storage

  Important:

HDFS HA with NFS shared storage is not supported in CDH 5. Comment out or delete these properties before attempting to upgrade your cluster to CDH 5. (If you intend to configure HA with Quorum-based storage under CDH 5, you should comment them out rather than deleting them, as they are also used in that configuration.)

Unconfigure the following properties.
  • In your core-site.xml file:

    fs.defaultFS (formerly fs.default.name)

    Optionally, you may have configured the default path for Hadoop clients to use the HA-enabled logical URI. For example, if you used mycluster as the NameService ID as shown below, this will be the value of the authority portion of all of your HDFS paths.

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
  • In your hdfs-site.xml configuration file:

    dfs.nameservices

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
      Note:

    If you are also using HDFS Federation, this configuration setting will include the list of other nameservices, HA or otherwise, as a comma-separated list.

    dfs.ha.namenodes.[nameservice ID]

    A list of comma-separated NameNode IDs used by DataNodes to determine all the NameNodes in the cluster. For example, if you used mycluster as the NameService ID, and you used nn1 and nn2 as the individual IDs of the NameNodes, you would have configured this as follows:

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>

    dfs.namenode.rpc-address.[nameservice ID]

    For both of the previously-configured NameNode IDs, the full address and RPC port of the NameNode process. For example:

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>machine1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>machine2.example.com:8020</value>
    </property>
      Note:

    You may have similarly configured the servicerpc-address setting.

    dfs.namenode.http-address.[nameservice ID]

    The addresses for both NameNodes' HTTP servers to listen on. For example:

    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>machine1.example.com:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>machine2.example.com:50070</value>
    </property>
      Note:

    If you have Hadoop's Kerberos security features enabled, and you use HSFTP, you will have set the https-address similarly for each NameNode.

    dfs.namenode.shared.edits.dir

    The path to the remote shared edits directory which the Standby NameNode uses to stay up-to-date with all the file system changes the Active NameNode makes. You should have configured only one of these directories, mounted read/write on both NameNode machines. The value of this setting should be the absolute path to this directory on the NameNode machines. For example:

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
    </property>

    dfs.client.failover.proxy.provider.[nameservice ID]

    The name of the Java class which the DFS Client uses to determine which NameNode is the current Active, and therefore which NameNode is currently serving client requests. The only implementation which shipped with Hadoop is the ConfiguredFailoverProxyProvider. For example:

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    dfs.ha.fencing.methods - a list of scripts or Java classes which will be used to fence the Active NameNode during a failover.
      Note:

    If you implemented your own custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.

    • The sshfence fencing method

      sshfence - SSH to the Active NameNode and kill the process

      For example:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/exampleuser/.ssh/id_rsa</value>
      </property>
      Optionally, you may have configured a non-standard username or port to perform the SSH, as shown below, and also a timeout, in milliseconds, for the SSH:
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence([[username][:port]])</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
        <description>
          SSH connection timeout, in milliseconds, to use with the builtin
          sshfence fencer.
        </description>
      </property>
    • The shell fencing method

      shell - run an arbitrary shell command to fence the Active NameNode

      The shell fencing method runs an arbitrary shell command, which you may have configured as shown below:
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
      </property>
Automatic failover: If you configured automatic failover, you set two additional configuration parameters.
  • In your hdfs-site.xml:
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
  • In your core-site.xml file, add:

    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
    </property>

Other properties: There are several other configuration parameters which you may have set to control the behavior of automatic failover, though they were not necessary for most installations. See the configuration section of the Hadoop documentation for details.

Redeploying HDFS High Availability

If you need to redeploy HA using Quorum-based storage after temporarily disabling it, proceed as follows:

  1. Shut down the cluster as described in Step 1 of the previous section.
  2. Uncomment the properties you commented out in Step 2 of the previous section.
  3. Deploy HDFS HA, following the instructions under Deploying HDFS High Availability.
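
After the cluster is back up, you can confirm that HA is active again by checking the state of each NameNode, for example (nn1 and nn2 are the NameNode IDs used in the examples above):

    $ hdfs haadmin -getServiceState nn1
    $ hdfs haadmin -getServiceState nn2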

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_hdfs_ha_admin.html

 

 
