Reposted from:
http://wiki.apache.org/hadoop/NameNodeFailover
I. Add an NFS directory to dfs.name.dir:
<property>
  <name>dfs.name.dir</name>
  <value>/export/hadoop/namedir,/remote/export/hadoop/namedir</value>
</property>
For how to mount NFS, see:
http://server.zdnet.com.cn/server/2007/0831/482007.shtml
http://www.cnblogs.com/mchina/archive/2013/01/03/2840040.html
** On the server, run `chmod -R 777` on the exported directory **
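As a rough sketch of the NFS setup (the hostnames, export path, and interface are illustrative assumptions, not values from the original post), the server-side export and the client-side mount might look like:

```shell
# --- On the NFS server (paths and hostname are placeholders) ---
# Add the export to /etc/exports, e.g.:
#   /export/hadoop/namedir  namenode-host(rw,sync,no_root_squash)
exportfs -ra                          # re-export everything in /etc/exports

# Open up permissions on the exported directory, as the post advises
chmod -R 777 /export/hadoop/namedir

# --- On the NameNode host ---
# Mount the export at the second path listed in dfs.name.dir
mkdir -p /remote/export/hadoop/namedir
mount -t nfs nfs-server:/export/hadoop/namedir /remote/export/hadoop/namedir
```

Note that `chmod 777` is what the post recommends for simplicity; in practice you would normally restrict the directory to the user running the NameNode.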
II. When the NameNode fails:
1. Back up the data on the NFS mount.
2. Pick a machine on the same network and change its IP address to the NameNode's IP.
3. Install Hadoop on this machine and copy over the previous configuration.
4. Do not format this node; mount the NFS directory at the same location on this machine.
5. Start the NameNode.
For recovering via the SecondaryNameNode instead, see:
http://blog.csdn.net/jokes000/article/details/7703512
Original text:
Introduction
As of 0.20, Hadoop does not support automatic recovery in the case of a NameNode failure. This is a well known and recognized single point of failure in Hadoop.
Experience at Yahoo! shows that NameNodes are more likely to fail due to misconfiguration, network issues, and bad behavior amongst clients than actual hardware problems. Across fifteen grids over a three-year period, only three NameNode failures were related to hardware problems.
Configuring Hadoop for Failover
There are some preliminary steps that must be in place prior to performing a NameNode recovery. The most important is the dfs.name.dir property. This setting configures the NameNode such that it can write to more than one directory. A typical configuration might look something like this:
<property>
  <name>dfs.name.dir</name>
  <value>/export/hadoop/namedir,/remote/export/hadoop/namedir</value>
</property>
The first directory is a local directory and the second directory is a NFS mounted directory. The NameNode will write to both locations, keeping the HDFS metadata in sync. This allows for storage of the metadata off-machine so that one will have something to recover. During startup, the NameNode will pick the most recent version of these two directories to use and then sync both of them to use the same data.
After we have configured the NameNode to write to two or more directories, we have a working backup of the metadata. In the more common failure scenarios, this backup can be used to bring the dead NameNode back from the grave.
When a Failure Occurs
Now the recovery steps:
1. Just to be safe, make a copy of the data on the remote NFS mount for safekeeping.
2. Pick a target machine on the same network.
3. Change the IP address of that machine to match the NameNode's IP address. Using an interface alias to provide this address movement works as well. If this is not an option, be prepared to restart the entire grid to avoid hitting https://issues.apache.org/jira/browse/HADOOP-3988 .
4. Install Hadoop similarly to how you did on the NameNode.
5. Do not format this node!
6. Mount the remote NFS directory in the same location.
7. Start up the NameNode.
The NameNode should start replaying the edits file, updating the image, block reports should come in, etc.
At this point, your NameNode should be up.
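The recovery steps above can be sketched as a shell session. All hostnames, interface names, IP addresses, and paths here are illustrative assumptions, not values from the original article:

```shell
# 1. Safety copy of the metadata on the NFS mount
cp -a /remote/export/hadoop/namedir /remote/export/hadoop/namedir.bak

# 2-3. On the replacement machine, take over the old NameNode's IP,
#      here via an interface alias (address and interface are assumptions)
ifconfig eth0:0 10.0.0.10 netmask 255.255.255.0 up

# 4. Install Hadoop and copy the old NameNode's configuration files
#    (e.g. restore conf/ from a backup). Do NOT run `hadoop namenode -format`.

# 5. Mount the remote NFS directory at the same path as before
mkdir -p /remote/export/hadoop/namedir
mount -t nfs nfs-server:/export/hadoop/namedir /remote/export/hadoop/namedir

# 6. Start only the NameNode daemon (0.20-era start script)
bin/hadoop-daemon.sh start namenode
```

Once the daemon is up, it replays the edits log from the NFS copy, and DataNodes begin sending block reports as described above.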