
Common Hadoop problems

 

A running collection of problems I have hit while configuring and deploying Hadoop.

 

1. Safe mode issue

Q:

2013-12-10 17:20:46,399 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310, call delete(/app/hadoop/tmp/mapred/system, true) from 127.0.0.1:59760: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /app/hadoop/tmp/mapred/system. Name node is in safe mode.
The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 17 seconds.
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /app/hadoop/tmp/mapred/system. Name node is in safe mode.
The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 17 seconds.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1994)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1974)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:792)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
2013-12-10 17:20:53,868 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. 

S:

bin/hadoop dfsadmin -safemode leave

This resolves the safe-mode problem for the running Hadoop instance, but the problem will come back the next time Hadoop is restarted.

My guess is that the underlying cause is a corrupted /app/hadoop/tmp/mapred/system directory.

For a permanent fix, you can delete /app/hadoop/tmp/, recreate it, re-format HDFS, and restart Hadoop, if you can afford to do so.
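
A minimal command sequence for both fixes, assuming the /app/hadoop/tmp layout from the log above (the second part wipes all HDFS data, so it is only acceptable on a test cluster):

# Check whether the NameNode is still in safe mode
bin/hadoop dfsadmin -safemode get

# Temporary fix: force the NameNode to leave safe mode
bin/hadoop dfsadmin -safemode leave

# Permanent (destructive) fix: recreate the tmp dir and re-format HDFS
bin/stop-all.sh
rm -rf /app/hadoop/tmp
mkdir -p /app/hadoop/tmp
bin/hadoop namenode -format
bin/start-all.sh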

 

2. namespaceID issue

Q:

2013-12-09 19:37:19,796 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 346059444; datanode namespaceID = 313579633
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)

2013-12-09 19:37:19,819 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 

S: This is most likely because some operation caused the NameNode's and the DataNode's namespaceIDs to diverge.

A possible fix is the following (I have not actually tried this method myself; recording it here for now):

I've had this happen a few times. If restarting the data node doesn't help, then do the following:

1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (i.e. by vim VERSION)
4. Record namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (i.e. by vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.

This should fix the problem.

from:http://stackoverflow.com/questions/18300940/why-does-data-node-shut-down-when-i-run-hadoop 
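
A rough shell equivalent of those steps, assuming the directory layout from the error above; 346059444 is the NameNode's namespaceID taken from the log, so substitute your own value:

# On the NameNode: read its namespaceID
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION

# On the affected DataNode: write that value into its VERSION file, then restart the DataNode
sed -i 's/^namespaceID=.*/namespaceID=346059444/' /app/hadoop/tmp/dfs/data/current/VERSION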

 

The method I actually used was quite brutal: I simply deleted the NameNode's and DataNodes' data directory (/app/hadoop/),

then recreated it, re-formatted HDFS (bin/hadoop namenode -format), and restarted Hadoop.

This is not suitable for a production environment, but it is good enough while learning.

http://www.cnblogs.com/Dreama/articles/2097200.html

 

3. IPv6 issue

Q:

http://wiki.apache.org/hadoop/HadoopIPv6

Hadoop reportedly does not support IPv6, so the fix is to disable IPv6. First, note that everything is currently listening on tcp6 sockets:

hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp6       0      0 127.0.0.1:54310         :::*                    LISTEN      1001       21535       7799/java       
tcp6       0      0 127.0.0.1:54311         :::*                    LISTEN      1001       22111       8387/java       
tcp6       0      0 :::50090                :::*                    LISTEN      1001       22211       8291/java       
tcp6       0      0 :::50060                :::*                    LISTEN      1001       22864       8633/java       
tcp6       0      0 :::50030                :::*                    LISTEN      1001       22295       8387/java       
tcp6       0      0 :::37518                :::*                    LISTEN      1001       21549       8291/java       
tcp6       0      0 :::41359                :::*                    LISTEN      1001       20832       7799/java       
tcp6       0      0 127.0.0.1:51827         :::*                    LISTEN      1001       22582       8633/java       
tcp6       0      0 :::50070                :::*                    LISTEN      1001       22121       7799/java       
tcp6       0      0 :::51161                :::*                    LISTEN      1001       21726       8387/java       
tcp6       0      0 :::38905                :::*                    LISTEN      1001       21172       8045/java       
tcp6       0      0 :::50010                :::*                    LISTEN      1001       22207       8045/java       
tcp6       0      0 :::50075                :::*                    LISTEN      1001       22593       8045/java       
tcp6       0      0 :::50020                :::*                    LISTEN      1001       22598       8045/java  

If a socket is bound over IPv6, netstat shows it as "tcp6"; over IPv4, it shows as "tcp".

On Ubuntu 12.04 (and, more generally, Ubuntu 9.10 and later), IPv6 can be disabled as follows:

  1. gksu gedit /etc/default/grub

     Change
       GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
     to
       GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash"

  2. sudo update-grub

After making this change, reboot the server.

Then check again:

hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp       0      0 127.0.0.1:54310         :::*                    LISTEN      1001       21535       7799/java       
tcp       0      0 127.0.0.1:54311         :::*                    LISTEN      1001       22111       8387/java       
tcp       0      0 :::50090                :::*                    LISTEN      1001       22211       8291/java       
tcp       0      0 :::50060                :::*                    LISTEN      1001       22864       8633/java       
tcp       0      0 :::50030                :::*                    LISTEN      1001       22295       8387/java       
tcp       0      0 :::37518                :::*                    LISTEN      1001       21549       8291/java       
tcp       0      0 :::41359                :::*                    LISTEN      1001       20832       7799/java       
tcp       0      0 127.0.0.1:51827         :::*                    LISTEN      1001       22582       8633/java       
tcp       0      0 :::50070                :::*                    LISTEN      1001       22121       7799/java       
tcp       0      0 :::51161                :::*                    LISTEN      1001       21726       8387/java       
tcp       0      0 :::38905                :::*                    LISTEN      1001       21172       8045/java       
tcp       0      0 :::50010                :::*                    LISTEN      1001       22207       8045/java       
tcp       0      0 :::50075                :::*                    LISTEN      1001       22593       8045/java       
tcp       0      0 :::50020                :::*                    LISTEN      1001       22598       8045/java  
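
As an alternative to disabling IPv6 system-wide, you can tell Hadoop's JVMs to prefer the IPv4 stack; a minimal sketch, assuming conf/hadoop-env.sh is the environment file your installation actually uses:

# Add to conf/hadoop-env.sh on every node, then restart the daemons
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"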

 

4. "jobtracker.info could only be replicated to 0 nodes, instead of 1" issue

Q:

2013-12-10 17:22:51,999 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting
2013-12-10 17:22:52,003 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting
2013-12-10 17:22:52,004 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting
2013-12-10 17:22:52,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting
2013-12-10 17:22:52,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting
2013-12-10 17:22:52,006 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting
2013-12-10 17:22:52,006 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting
2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: starting
2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting
2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting
2013-12-10 17:22:52,016 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting
2013-12-10 17:22:56,038 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
2013-12-10 17:22:56,040 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310, call addBlock(/app/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_-1488093128, null) from 127.0.0.1:59783: error: java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

S: This problem can have several different causes, but it ultimately comes down to the DataNodes being unable to connect to the NameNode.

We can check a DataNode's log to confirm:

2013-12-10 17:31:57,434 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 0 time(s).
2013-12-10 17:31:58,435 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 1 time(s).
2013-12-10 17:31:59,437 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 2 time(s).
2013-12-10 17:32:00,439 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 3 time(s).
2013-12-10 17:32:01,440 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 4 time(s).
2013-12-10 17:32:02,442 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 5 time(s).
2013-12-10 17:32:03,444 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 6 time(s).
2013-12-10 17:32:04,446 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 7 time(s).
2013-12-10 17:32:05,448 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 8 time(s).
2013-12-10 17:32:06,450 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 9 time(s).
2013-12-10 17:32:06,451 INFO org.apache.hadoop.ipc.RPC: Server at hdnamenode/10.0.2.7:54310 not available yet, Zzzzz...

It just keeps retrying.

 

Here is the story of one time I ran into and fixed this problem.

Honestly, the cause was rather embarrassing.

I had originally set up a single-node Hadoop installation and then built the multi-node cluster on top of it. My core-site.xml contained:

 

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

That configuration works fine on a single node.

When I moved to the multi-node setup, I did not pay much attention and only changed the DataNodes' core-site.xml; I left the NameNode's core-site.xml untouched.

That carelessness is exactly what kept the DataNodes from connecting to the NameNode.

At the time I reasoned: on the NameNode it is just me talking to myself, so localhost should be fine.

In fact it is not: if the NameNode's core-site.xml points at localhost, port 54310 is bound to the loopback interface, so it can be reached only from the NameNode itself and not from the DataNodes.

 

hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:54310         0.0.0.0:*               LISTEN      1001       12417       2347/java       
tcp        0      0 127.0.0.1:54311         0.0.0.0:*               LISTEN      1001       12405       2930/java       
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1001       12705       2839/java       
tcp        0      0 127.0.0.1:47373         0.0.0.0:*               LISTEN      1001       12426       3178/java       
tcp        0      0 0.0.0.0:53485           0.0.0.0:*               LISTEN      1001       11865       2930/java       
tcp        0      0 0.0.0.0:50030           0.0.0.0:*               LISTEN      1001       12519       2930/java       
tcp        0      0 0.0.0.0:42673           0.0.0.0:*               LISTEN      1001       11761       2347/java       
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1001       12525       2347/java       
tcp        0      0 0.0.0.0:48024           0.0.0.0:*               LISTEN      1001       11762       2591/java       
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      1001       12701       2591/java       
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1001       12709       2591/java       
tcp        0      0 0.0.0.0:54972           0.0.0.0:*               LISTEN      1001       11763       2839/java   

Note ports 54310 and 54311: they are bound to 127.0.0.1 only.

We can verify this with telnet:

 

hduser@hdnamenode:/usr/local/hadoop$ telnet 10.0.2.7 54310
Trying 10.0.2.7...
telnet: Unable to connect to remote host: Connection refused
hduser@hdnamenode:/usr/local/hadoop$ telnet 127.0.0.1 54310
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
exit
Connection closed by foreign host.
hduser@hdnamenode:/usr/local/hadoop$ telnet 10.0.2.7 22
Trying 10.0.2.7...
Connected to 10.0.2.7.
Escape character is '^]'.
SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.1
e
Protocol mismatch.
Connection closed by foreign host.

When the NameNode's core-site.xml is changed to the machine's actual IP, it looks like this instead:

 

hduser@hdnamenode:/usr/local/hadoop/conf$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:34054         0.0.0.0:*               LISTEN      1001       17197       5142/java       
tcp        0      0 10.0.2.7:54310          0.0.0.0:*               LISTEN      1001       16262       4291/java       
tcp        0      0 10.0.2.7:54311          0.0.0.0:*               LISTEN      1001       16856       4892/java       
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1001       17007       4791/java       
tcp        0      0 0.0.0.0:39084           0.0.0.0:*               LISTEN      1001       15558       4291/java       
tcp        0      0 0.0.0.0:50030           0.0.0.0:*               LISTEN      1001       17091       4892/java       
tcp        0      0 0.0.0.0:44687           0.0.0.0:*               LISTEN      1001       16415       4791/java       
tcp        0      0 0.0.0.0:32785           0.0.0.0:*               LISTEN      1001       16523       4892/java       
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1001       16916       4291/java       
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      1001       16997       4541/java       
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1001       17208       4541/java       
tcp        0      0 0.0.0.0:40002           0.0.0.0:*               LISTEN      1001       15899       4541/java       
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      1001       17213       4541/java       
hduser@hdnamenode:/usr/local/hadoop/conf$ telnet 10.0.2.7 54310
Trying 10.0.2.7...
Connected to 10.0.2.7.
Escape character is '^]'.
exit
Connection closed by foreign host.
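
For reference, the corrected property on the NameNode, using the machine's actual address instead of localhost (the IP from the logs above; a resolvable hostname without underscores works as well, see issue 5):

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.2.7:54310</value>
</property>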

 

Also, sometimes the NameNode simply starts more slowly than the DataNodes, which produces the same symptom; in that case, just wait a moment.

 

To summarize, when you run into this kind of problem, work through the following:

1) Check that the NameNode is healthy and that its port is actually being listened on, paying attention to which IP it is bound to.

2) Check that the DataNodes can telnet to the NameNode on that port.

3) Check that the configuration under conf/ is correct.

4) Check that the filesystem has enough free space and that the read/write permissions are correct.

5) Ask Google for help.

The link below lists a few more scenarios:
http://blog.csdn.net/foamflower/article/details/5980406
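
The checklist above maps roughly onto a few shell commands; a quick sketch, using the hostnames and paths from this article:

# 1) On the NameNode: is the process up, and which address is 54310 bound to?
jps
netstat -plten | grep 54310

# 2) On a DataNode: can the NameNode's IPC port be reached?
telnet hdnamenode 54310

# 4) Enough free space and correct permissions on the HDFS directories?
df -h /app/hadoop
ls -ld /app/hadoop/tmp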

 

5. "Incomplete HDFS URI" issue

Q:

NameNode log:

2013-12-10 19:01:28,881 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
2013-12-10 19:01:28,882 INFO org.apache.hadoop.hdfs.server.namenode.DecommissionManager: Interrupted Monitor
java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
	at java.lang.Thread.run(Thread.java:662)
2013-12-10 19:01:28,883 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 
2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: exiting
2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: exiting
2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: exiting
2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: exiting
2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: exiting
2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: exiting
2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: exiting
2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 54310
2013-12-10 19:01:29,000 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2013-12-10 19:01:29,000 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Incomplete HDFS URI, no host: hdfs://hadoop_namenode:54310
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1310)
	at org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:65)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1328)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
	at org.apache.hadoop.fs.Trash.<init>(Trash.java:62)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.startTrashEmptier(NameNode.java:292)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:288)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)

2013-12-10 19:01:29,000 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2013-12-10 19:01:29,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: exiting
2013-12-10 19:01:29,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: exiting
2013-12-10 19:01:29,028 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call delete(/app/hadoop/tmp/mapred/system, true) from 10.0.2.11:35266: output error
2013-12-10 19:01:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310 caught: java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:343)
	at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1695)
	at org.apache.hadoop.ipc.Server.access$2000(Server.java:93)
	at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:739)
	at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:803)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1411)

2013-12-10 19:01:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: exiting
2013-12-10 19:01:29,030 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop_namenode/10.0.2.11
************************************************************/

DataNode log:

2013-12-10 19:01:55,577 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.2.11:54310. Already tried 9 time(s).
2013-12-10 19:01:55,579 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to /10.0.2.11:54310 failed on connection exception: java.net.ConnectException: Connection refused
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1057)
	at org.apache.hadoop.ipc.Client.call(Client.java:1033)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
	at $Proxy5.register(Unknown Source)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:635)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1378)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1438)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:414)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:527)
	at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:187)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164)
	at org.apache.hadoop.ipc.Client.call(Client.java:1010)
	... 7 more

2013-12-10 19:01:55,580 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop_datanode2/10.0.2.13
************************************************************/

 

S: Change the hostname so that it no longer contains an underscore. ~_~

Then run

sudo /etc/init.d/networking restart

You may also need to reboot and to delete the old logs.
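
A rough sketch of the rename on Ubuntu (the new name hadoopnamenode, i.e. the old one with the underscore removed, is the name that appears in the later logs); remember to fix /etc/hosts on every node and update the hostnames in the Hadoop configuration files as well:

sudo sh -c 'echo hadoopnamenode > /etc/hostname'   # new name without the underscore
sudo hostname hadoopnamenode                       # apply it to the running system
sudo vim /etc/hosts                                # fix the matching entry (on every node)
sudo /etc/init.d/networking restart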

 

Reference:

http://blog.csdn.net/xiaochawan/article/details/8733094

 

6. HBase: hbase.rootdir configured with an IP address

Q:

2014-01-04 16:29:10,026 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.0.2.11:54310/hbase, expected: hdfs://hadoopnamenode:54310
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)
	at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:521)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:692)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:238)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:106)
	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:91)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
2014-01-04 16:29:10,027 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

S:

The hbase.rootdir value in ~/conf/hbase-site.xml must not use an IP address; use the hostname instead (the one HDFS itself is configured with), as sketched after the link below.

 http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html
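
A minimal sketch of the property, using the hostname that HDFS expects according to the error message above:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoopnamenode:54310/hbase</value>
</property>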

 

7. HBase: clock synchronization between nodes

Q: If the clocks on the nodes are not synchronized, HBase will fail to start properly. The logs will then show messages like the following.

2014-01-04 19:41:46,436 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
2014-01-04 19:41:47,847 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apa

S: 

Synchronize the clocks of all the nodes.

http://lookqlp.iteye.com/blog/1341118

The cleanest approach is to install ntp, run the NameNode as the NTP server, and make the other nodes NTP clients; see the sketch below.

Also, when starting up, wait until the NameNode has fully started before starting the other nodes.
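
A hedged sketch for Ubuntu, treating the NameNode host (hadoopnamenode) as the time source for the other nodes:

# On every node
sudo apt-get install ntp

# On the other nodes: add the line
#   server hadoopnamenode
# to /etc/ntp.conf, then restart the service
sudo service ntp restart

# Optional one-off manual sync while testing (-u works even while ntpd is running)
sudo ntpdate -u hadoopnamenode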

 

8. ZooKeeper error when myid has not been created

Q: 

hduser@hadoopnamenode:~/zookeeper-3.3.3$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg
Starting zookeeper ... 
bin/zkServer.sh: 80: bin/zkServer.sh: cannot create /export/crawlspace/mahadev/zookeeper/server1/data
/var/zookeeper//zookeeper_server.pid: Directory nonexistent
STARTED
hduser@hadoopnamenode:~/zookeeper-3.3.3$ 2014-01-05 11:16:25,256 - INFO  [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg
2014-01-05 11:16:25,280 - INFO  [main:QuorumPeerConfig@310] - Defaulting to majority quorums
2014-01-05 11:16:25,282 - FATAL [main:QuorumPeerMain@83] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg
	at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:110)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:99)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file is missing
	at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:320)
	at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:106)
	... 2 more
Invalid config, exiting abnormally

S:

The fix is to create a myid file in the directory specified by dataDir and write this server's id (just the numeric part) into it.

dataDir and the server ids are configured in ~/conf/zoo.cfg. A minimal sketch follows.
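
The sketch assumes dataDir=/var/zookeeper as in the error above and that this machine is server.1 in zoo.cfg (each node gets its own number):

sudo mkdir -p /var/zookeeper
echo 1 | sudo tee /var/zookeeper/myid    # write this node's server id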

 

BTW:

If you are setting up a ZooKeeper ensemble with several servers, starting only one of them will produce a flood of errors.

Like this:

hduser@hadoopnamenode:~/zookeeper-3.3.3$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg
Starting zookeeper ... 
bin/zkServer.sh: 80: bin/zkServer.sh: cannot create /export/crawlspace/mahadev/zookeeper/server1/data
/var/zookeeper//zookeeper_server.pid: Directory nonexistent
STARTED
hduser@hadoopnamenode:~/zookeeper-3.3.3$ 2014-01-05 11:25:22,123 - INFO  [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg
2014-01-05 11:25:22,134 - INFO  [main:QuorumPeerConfig@310] - Defaulting to majority quorums
2014-01-05 11:25:22,143 - INFO  [main:QuorumPeerMain@119] - Starting quorum peer
2014-01-05 11:25:22,165 - INFO  [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2014-01-05 11:25:22,186 - INFO  [main:QuorumPeer@819] - tickTime set to 2000
2014-01-05 11:25:22,190 - INFO  [main:QuorumPeer@830] - minSessionTimeout set to -1
2014-01-05 11:25:22,191 - INFO  [main:QuorumPeer@841] - maxSessionTimeout set to -1
2014-01-05 11:25:22,192 - INFO  [main:QuorumPeer@856] - initLimit set to 5
2014-01-05 11:25:22,216 - INFO  [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.0
2014-01-05 11:25:22,243 - INFO  [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888
2014-01-05 11:25:22,253 - INFO  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@621] - LOOKING
2014-01-05 11:25:22,255 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  1, Proposed zxid = 0
2014-01-05 11:25:22,257 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2014-01-05 11:25:22,286 - WARN  [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
	at java.lang.Thread.run(Thread.java:662)
2014-01-05 11:25:22,291 - WARN  [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
	at java.lang.Thread.run(Thread.java:662)
2014-01-05 11:25:22,464 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:22,473 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:22,476 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 400
2014-01-05 11:25:22,883 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:22,887 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:22,891 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 800
2014-01-05 11:25:23,694 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:23,704 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:23,705 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 1600
2014-01-05 11:25:25,306 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:25,311 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:25,320 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 3200
2014-01-05 11:25:28,521 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:28,526 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:28,535 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400
2014-01-05 11:25:34,937 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:34,942 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:34,944 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 12800
2014-01-05 11:25:47,745 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:47,750 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:25:47,752 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 25600
2014-01-05 11:26:13,359 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:26:13,363 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:26:13,365 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 51200
2014-01-05 11:27:04,568 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:27:04,575 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:27:04,578 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 60000
2014-01-05 11:28:04,579 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:28:04,584 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2014-01-05 11:28:04,587 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 60000

There is no need to worry about these; just start the remaining servers and the errors go away.

 

 

 
