`

Hadoop安装遇到的各种异常及解决办法

阅读更多

异常一:

2014-03-13 11:10:23,665 INFO org.apache.Hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2014-03-13 11:10:24,667 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:25,667 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:26,669 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:27,670 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:28,671 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:29,672 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:30,674 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:31,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:32,676 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Linux-hadoop-38/10.10.208.38:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-03-13 11:10:32,677 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: Linux-hadoop-38/10.10.208.38:9000
解决办法:
1、ping Linux-hadoop-38能通,telnet Linux-hadoop-38 9000不能通,说明开启了防火墙
2、去Linux-hadoop-38主机关闭防火墙/etc/init.d/iptables stop,显示:
iptables:清除防火墙规则:[确定]
iptables:将链设置为政策 ACCEPT:filter [确定]
iptables:正在卸载模块:[确定]

3、重启 

异常二:

2014-03-13 11:26:30,788 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1257313099-10.10.208.38-1394679083528 (storage id DS-743638901-127.0.0.1-50010-1394616048958) service to Linux-hadoop-38/10.10.208.38:9000
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-8e201022-6faa-440a-b61c-290e4ccfb006; datanode clusterID = clustername
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:309)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:662)
解决办法:
1、在hdfs-site.xml配置文件中,配置了dfs.namenode.name.dir,在master中,该配置的目录下有个current文件夹,里面有个VERSION文件,内容如下:
#Thu Mar 13 10:51:23 CST 2014
namespaceID=1615021223
clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1257313099-10.10.208.38-1394679083528
layoutVersion=-40
2、在core-site.xml配置文件中,配置了hadoop.tmp.dir,在slave中,该配置的目录下有个dfs/data/current目录,里面也有一个VERSION文件,内容
#Wed Mar 12 17:23:04 CST 2014
storageID=DS-414973036-10.10.208.54-50010-1394616184818
clusterID=clustername
cTime=0
storageType=DATA_NODE
layoutVersion=-40
3、一目了然,两个内容不一样,导致的。删除slave中的错误内容,重启,搞定!

参考资料: http://www.linuxidc.com/Linux/2014-03/98598.htm

异常三:

2014-03-13 12:34:46,828 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize mapreduce_shuffle
java.lang.RuntimeException: No class defiend for mapreduce_shuffle
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.init(AuxServices.java:94)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:181)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:185)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:328)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
2014-03-13 12:34:46,830 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: No class defiend for mapreduce_shuffle
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.init(AuxServices.java:94)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:181)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:185)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:328)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
2014-03-13 12:34:46,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
解决办法:
1、yarn-site.xml配置错误:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
2、修改为:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
3、重启服务

 

警告:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

解决办法:

参考  http://www.linuxidc.com/Linux/2014-03/98599.htm

异常四:

14/03/13 17:25:41 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:67)
at com.hadoop.compression.lzo.LzoIndexer.<init>(LzoIndexer.java:36)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/13 17:25:41 ERROR lzo.LzoCodec: Cannot load native-lzo without native-hadoop
14/03/13 17:25:43 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /test2.lzo, size 0.00 GB...
Exception in thread "main" java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.createDecompressor(LzopCodec.java:91)
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:222)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

解决办法:很明显,没有native-lzo
编译安装/编译lzo,http://www.linuxidc.com/Linux/2014-03/98601.htm

 

异常五:

14/03/17 10:23:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1394702706596_0003
java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:174)
at org.apache.hadoop.mapreduce.lib.input.TextInputFormat.isSplitable(TextInputFormat.java:58)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:276)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:468)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:485)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
... 26 more
临时解决办法:
将/usr/local/hadoop/lib/hadoop-lzo-0.4.10.jar拷贝到/usr/local/jdk/lib下,重启linux

 

异常六:

14/03/17 10:35:03 ERROR security.UserGroupInformation: PriviledgedActionException as:Hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/hadoop/.staging/job_1395023531587_0001. Name node is in safe mode.
The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 12. Safe mode will be turned off automatically.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2905)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/hadoop/.staging/job_1395023531587_0001. Name node is in safe mode.
The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 12. Safe mode will be turned off automatically.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2905)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:408)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:418)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
解决办法:
明显是安全模式:
原因是reboot机器的时候,防火墙起来了,导致NodeManager启动不起来,导致一直是安全模式。关闭防火墙重启hadoop,ok!

 

异常七:

2014-03-20 10:35:10,447 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2014-03-20 10:35:10,450 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2014-03-20 10:35:10,450 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2014-03-20 10:35:10,450 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
2014-03-20 10:35:10,476 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /usr/local/hadoop/hdfs/name/in_use.lock acquired by nodename 9580@Linux-hadoop-38
2014-03-20 10:35:10,479 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2014-03-20 10:35:10,480 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2014-03-20 10:35:10,480 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2014-03-20 10:35:10,480 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:217)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:728)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:521)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
2014-03-20 10:35:10,484 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-03-20 10:35:10,501 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
解决办法:
hadoop namenode -format,不是hdfs namenode -format

启动JobHistoryServer
sbin/mr-jobhistory-daemon.sh start historyserver

 

异常八:

Datanode denied communication with namenode: DatanodeRegistration 解决办法

 

Hadoop版本:2.2.0

单机伪同步环境。

 

启动start-dfs.sh后,发现本地没有datanode进程,查看datanode的日志如下:

2014-03-24 23:48:11,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1316134947-127.0.0.1-1395699640023 (storage id DS-2053126411-127.0.0.1-50010-1395699655564) service to localhost/192.168.1.101:9000 beginning handshake with NN

2014-03-24 23:48:11,381 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1316134947-127.0.0.1-1395699640023 (storage id DS-2053126411-127.0.0.1-50010-1395699655564) service to localhost/192.168.1.101:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-2053126411-127.0.0.1-50010-1395699655564, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-ba95b66c-d94b-4390-8d44-adb486ba5682;nsid=2073699254;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:739)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3929)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:948)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:24079)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)


at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:695)
2014-03-24 23:48:11,382 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1316134947-127.0.0.1-1395699640023 (storage id DS-2053126411-127.0.0.1-50010-1395699655564) service to localhost/192.168.1.101:9000
2014-03-24 23:48:11,484 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1316134947-127.0.0.1-1395699640023 (storage id DS-2053126411-127.0.0.1-50010-1395699655564)
2014-03-24 23:48:11,484 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-1316134947-127.0.0.1-1395699640023 from blockPoolScannerMap
2014-03-24 23:48:11,484 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1316134947-127.0.0.1-1395699640023
2014-03-24 23:48:13,485 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-03-24 23:48:13,486 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-03-24 23:48:13,487 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost/127.0.0.1

************************************************************/

导致这个问题的原因很多,在stackoverflow上有很多解决方案,但都不适用我。经过摸索,通过修改/etc/hosts解决。

/etc/hosts原来为:

 

127.0.0.1  localhost

由于我本机IP地址为 192.168.1.101,所以将hosts文件修改为这个ip地址:

 

 192.168.1.101 localhost

理论上127.0.0.1也是可行的,但不知道我的为什么不行。

 

修改后,重新删除dfs.namenode.name.dir和dfs.datanode.data.dir以及hadoop.tmp.dir目录,分别执行:

hdfs namenode -format

start-dfs.sh

start-yarn.sh

后,用jps查看服务,全部正常:

 

37619 NodeManager

37798 Jps

37247 NameNode

37330 DataNode

37536 ResourceManager

37432 SecondaryNameNo

转:http://blog.csdn.net/lifuxiangcaohui/article/details/39269479

分享到:
评论

相关推荐

    hadoop集群遇到的问题及其解决方法

    ### Hadoop集群遇到的问题及其解决方法 #### 异常一:DataNode无法连接到NameNode **问题描述:** 在Hadoop集群部署过程中,经常会出现DataNode无法成功连接到NameNode的情况,导致集群无法正常启动。 **原因分析...

    成功安装-安装hadoop遇到的问题

    这篇博文"成功安装-安装hadoop遇到的问题"可能提供了一些在实际操作中可能会遇到的难点和解决方案。虽然没有具体的描述内容,但我们可以根据常见的安装问题进行深入探讨。 首先,Hadoop的安装通常涉及到以下几个...

    云计算Hadoop平台的异常数据检测算法研究.pdf

    然而,在处理海量数据时,Hadoop平台常会遇到异常数据的挑战,这包括数据逻辑错误、数据链完整性缺失以及数据失效等问题。这些问题的出现严重干扰了云计算平台的数据运算准确性。 面对这些挑战,研究者们提出了针对...

    Hadoop MapReduce作业卡死问题的解决方法.docx

    ### Hadoop MapReduce作业卡死问题的解决方法 #### 一、问题背景 在使用Hadoop MapReduce进行大规模数据处理的过程中,遇到了一个棘手的问题——部分MapReduce作业长时间卡死,严重影响了系统的运行效率和资源利用...

    hadoop.dll 资源包

    5. **故障排查**:提供一些常见的错误代码和解决办法,帮助用户诊断和修复问题。 在处理hadoop.dll时,用户应遵循这些说明,以避免可能出现的错误。如果遇到问题,建议首先查看日志文件,找出错误的根源,然后根据...

    hadoop-commond(hadoop.dll)各个版本.rar

    在本地运行Spark时,如果Hadoop版本不匹配,可能会导致各种错误,例如类找不到异常、版本冲突等问题。因此,确保`hadoop.dll`与Spark和Hadoop的其他组件版本匹配至关重要。 4. **版本管理**:在处理多个版本的`...

    widowsXP安装hadoop步骤及问题

    #### 常见问题及其解决办法 1. **错误 `java.lang.NoClassDefFoundError`:** - 这个错误通常出现在尝试格式化 NameNode 或运行其他 Hadoop 命令时,表明 Hadoop 未能找到所需的类。 - 解决方案是检查 `hadoop-env...

    hadoop.dll 文件

    6. **检查错误日志**:如果遇到问题,查看Hadoop的日志文件,它们通常会提供错误的具体原因和解决建议。 了解了`hadoop.dll`和`winutils.exe`的作用以及如何在Windows环境下正确配置和使用它们,我们可以更顺利地在...

    winutils2.8.4-hadoop2.8.4

    这篇文章将详细介绍`winutils.exe`的作用,以及如何解决使用Hadoop API在HDFS上下载文件时遇到的问题。 首先,`winutils.exe`是Hadoop在Windows环境下运行所必需的二进制工具。Hadoop主要设计在Linux系统上运行,...

    hadoop运行wordcount实例

    #### 二、配置Hadoop过程中遇到的问题及解决方案 在配置Hadoop的过程中,可能会遇到以下常见问题及其解决方法: 1. **Java环境问题**: - 错误提示:“java: no such file or directory”。 - 解决方案:确保...

    window环境下需要hadoop.dll

    在Windows环境中,Hadoop是一个分布式计算...因此,理解这些依赖关系和解决方法对于在Windows上成功部署和运行Hadoop作业至关重要。在进行故障排查时,保持耐心,详细检查每一个可能的问题来源,通常可以解决这类问题。

    Hadoop-NativeIO.java

    在Windows环境中,NativeIO 使用特定的方法如 `access0` 来处理这些操作,但可能会遇到异常,表明本地系统与Hadoop的交互出现了问题。 描述中的异常 "org.apache.hadoop.io.nativeio.NativeIO$Windows.access0...

    hadoop2.7.3+mahout0.9问题集

    在本文中,我们将深入探讨Hadoop 2.7.3与Mahout 0.9集成过程中可能遇到的问题,以及如何解决这些技术挑战。Hadoop是一个开源的分布式计算框架,而Mahout是基于Hadoop的数据挖掘库,专注于机器学习算法。这两者的结合...

    hadoop-2.8.5所需jar

    如果没有这些库,Java编译器无法识别Hadoop相关的类和方法,从而导致编译失败或者在运行时抛出ClassNotFoundException、NoClassDefFoundError等异常。 Hadoop是一个开源的分布式计算框架,主要由HDFS(Hadoop ...

    hadoop故障分析与技术方案揭秘

    最后,文档提到总结和感悟部分,虽然没有提供具体内容,但可以推断其内容可能涉及对Hadoop集群管理中遇到的共性问题和解决方案的总结,以及对于故障处理经验的分享。 整体而言,文档涉及了Hadoop集群的日常管理、...

    Hadoop学习文档

    本文档详细介绍了如何在CentOS 5.6环境下搭建Hadoop集群的基础环境,包括JDK的安装配置以及解决过程中可能遇到的问题。通过这些步骤,用户可以在自己的环境中快速部署好Hadoop平台,为进一步的大数据分析打下坚实的...

    hadoop-common-2.2.0-bin_32bit_&_64bit

    当这两个文件不在Hadoop安装目录的bin路径下时,可能会导致空指针异常或其他运行时错误。 标签中的"wordcount"是指Hadoop的标志性示例程序,它统计输入文本中每个单词出现的次数,展示了Hadoop MapReduce的基本工作...

    hadoop.dll-winutils.exe

    在使用Apache Hadoop进行分布式计算时,经常遇到与`hadoop.dll...通过以上步骤,你应该能够成功解决Hadoop在Windows环境下运行MapReduce时遇到的`hadoop.dll`和`winutils.exe`相关问题,从而顺利运行Word Count等示例。

    完整版大数据云计算课程 Hadoop数据分析平台系列课程 Hadoop 05 Hadoop API开发 共32页.pptx

    如果在处理过程中遇到异常,会使用计数器记录错误行数,帮助调试和优化程序。 【课程总结】 通过本课程的学习,学员不仅能够理解MapReduce的工作原理,还能实际编写MapReduce程序,解决实际的大数据处理问题。这为...

Global site tag (gtag.js) - Google Analytics