
could only be replicated to 0 nodes, instead of 1

While setting up a Hadoop environment on three Linux machines, an error occurred. Part of the log on the master host reads as follows:
2015-03-28 17:13:12,147 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
2015-03-28 17:13:12,147 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/home/hadoop/tmp/mapred/system/jobtracker.info" - Aborting...
2015-03-28 17:13:12,147 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://master:10000/home/hadoop/tmp/mapred/system/jobtracker.info failed!
2015-03-28 17:13:12,147 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
2015-03-28 17:13:12,151 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


At first glance this looks like a dfs.replication misconfiguration, but it is not. Seeing
Error Recovery for null bad datanode[0] nodes == null
I suspected stale data left over from an earlier startup, so I cleared the data under hadoop.tmp.dir and restarted Hadoop. http://master:50030 and http://master:50070 then displayed their pages normally, so the problem appeared to be solved.
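For reference, the cleanup went roughly like this (a sketch, assuming hadoop.tmp.dir is /home/hadoop/tmp as the paths in the log suggest; the directory has to be cleared on every node before reformatting):

bin/stop-all.sh                 # stop all daemons (run on master)
rm -rf /home/hadoop/tmp/*       # run on master and on each slave
bin/hadoop namenode -format     # reformat HDFS (run on master)
bin/start-all.sh                # start the cluster again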

Although ports 50030 and 50070 were accessible, running ps -ux on the other two slave nodes showed no Hadoop-related processes, which is clearly abnormal. So I checked the log on slave1, which reads as follows:

2015-03-29 09:48:24,093 ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: hsdb01: hsdb01: unknown error
at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getHostname(MetricsSystemImpl.java:481)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSystem(MetricsSystemImpl.java:412)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:408)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:152)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:133)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:40)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1650)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
Caused by: java.net.UnknownHostException: hsdb01: unknown error
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
... 11 more


Evidently the /etc/hosts configuration was incomplete; entries for hsdb01 and hsdb02 need to be added, as follows:
127.0.0.1  localhost
28.18.19.34 master root123
28.18.12.57 slave1 hsdb01
28.18.12.58 slave2 hsdb02
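Whether the fix took effect can be checked on each node before restarting; the hostnames below are simply the ones from the exception and the hosts file above:

ping -c 1 hsdb01    # should resolve to 28.18.12.57
ping -c 1 hsdb02    # should resolve to 28.18.12.58
hostname            # on each slave, should print its own name, e.g. hsdb01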


Restart Hadoop:
1. bin/hadoop namenode -format
2. bin/start-all.sh
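It is also worth confirming directly on each slave that the daemons really came up this time, for example:

jps                        # should list DataNode and TaskTracker on a slave
ps -ux | grep -i hadoop    # the same check used earlier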

Ports 50030 and 50070 were again accessible, but checking the slave logs revealed:
2015-03-29 10:15:47,221 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = 1374430296; datanode namespaceID = 627398707
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)


(The output shown when shutting down Hadoop was captured in a screenshot that is not reproduced here.)

This problem is generally caused by formatting the namenode two or more times. The solution follows the namespaceID fix described in "hadoop常见问题(2).no datanode to stop", and with that the problem was finally resolved.

PS: This time, after I changed the master's namespaceID and restarted Hadoop, the error persisted, so what I actually changed was the slaves' namespaceID. If there are many slaves, though, this kind of maintenance gets time-consuming. It is best to get a Hadoop cluster deployment right the first time; otherwise there is no end of trouble.
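For the record, the slave-side change amounts to editing the datanode's VERSION file so that its namespaceID matches the namenode's. A sketch, using the paths and IDs from the log above (stop Hadoop first, then restart afterwards):

# on each slave, dfs.data.dir is /home/hadoop/tmp/dfs/data here
vi /home/hadoop/tmp/dfs/data/current/VERSION
# change
#   namespaceID=627398707
# to the value reported for the namenode:
#   namespaceID=1374430296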


Resolving the Hadoop exception "could only be replicated to 0 nodes, instead of 1"

The following two methods come from the book 《Hadoop实战》 (Hadoop in Action), 2nd edition; I am noting them here since they may also be useful in practice.
1. Restart a failed DataNode or JobTracker. When a single node in a Hadoop cluster has a problem, there is usually no need to restart the whole system; just restart that node and it will automatically rejoin the cluster.
Run the following commands on the failed node:

bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker

2. Dynamically add a DataNode or TaskTracker. The following commands allow a user to add a node to the cluster dynamically.

bin/hadoop-daemon.sh --config ./conf start datanode
bin/hadoop-daemon.sh --config ./conf start tasktracker