Hardware resources
In order to deploy an HA cluster, you should prepare the following:
- NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
- JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This allows the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but to actually increase the number of failures the system can tolerate, you should run an odd number of JNs (3, 5, 7, and so on). When running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures, rounded down, and continue to function normally. An even count therefore adds no tolerance: 4 JNs still tolerate only one failure, because a majority of 4 is 3.
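The quorum arithmetic in the note above can be checked with a short sketch. This is only an illustration of the (N - 1) / 2 rule; the function names `majority` and `tolerated_failures` are made up for this example and are not part of Hadoop.

```python
def majority(n: int) -> int:
    """Smallest number of JournalNodes that forms a majority of n."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """JournalNode failures that still leave a write majority: floor((n - 1) / 2)."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return (n - 1) // 2


# Compare odd and even JournalNode counts side by side.
for n in range(3, 8):
    print(f"{n} JournalNodes: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 JournalNodes (or from 5 to 6) raises the majority size without raising the number of tolerated failures, which is why odd counts are recommended.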
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, so it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also lets anyone converting a non-HA HDFS cluster to HA reuse the hardware previously dedicated to the Secondary NameNode.
In short: a non-HA cluster still needs a Secondary NameNode, but an HA cluster does not, and running one in an HA cluster is actually an error.