杨俊华

How a ZooKeeper Disturbance Made HBase Unavailable

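When operating HBase, the biggest problem, apart from HBase's own bugs, is network latency. Such latency can cause Hadoop append operations to time out. That would be a minor matter on its own, but it makes HBase exit because it can no longer append to the WAL.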

This time, however, the problem was in ZooKeeper.

Our cluster has three ZooKeeper servers. First, the connection between the leader (A) and one of the followers, B (xx.xx.xx.85), failed, and follower B then exited.
2011-08-01 03:28:30,013 [LearnerHandler-/xx.xx.xx.85:48270] ERROR org.apache.zookeeper.server.quorum.LearnerHandler: Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
	at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
	at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
	at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375)

2011-08-01 03:28:30,013 [LearnerHandler-/xx.xx.xx.85:48270] WARN org.apache.zookeeper.server.quorum.LearnerHandler: ******* GOODBYE /xx.xx.xx.85:48270 ********


B tried to exit, but the exit failed. A large number of session connections were closed.

After that, follower C also hit an exception.

2011-08-01 03:29:38,562 [CommitProcessor:0] ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception: 
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
	at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:148)
	at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1043)
	at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1080)
	at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1154)
	at org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:383)
	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:297)
	at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)


During all of this, the sessions between ZooKeeper and HBase were broken, which caused the master to hit a fatal error and exit.
2011-08-01 03:29:38,953 [main-EventThread] FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected zk exception getting RS nodes
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /SPN-hbase/rs
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:104)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:307)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:418)
	at org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeChildrenChanged(RegionServerTracker.java:86)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:560)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:536)
2011-08-01 03:29:38,953 [main-EventThread] INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-08-01 03:29:38,954 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:8100-0x230684e82d6738d-0x230684e82d6738d Unable to list children of znode /SPN-hbase/tokenauth/keys 
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /SPN-hbase/tokenauth/keys
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:104)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:307)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:418)
	at org.apache.hadoop.hbase.security.token.ZKSecretWatcher.nodeChildrenChanged(ZKSecretWatcher.java:116)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:560)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:536)
2011-08-01 03:29:38,954 [main-EventThread] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:8100-0x230684e82d6738d-0x230684e82d6738d Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /SPN-hbase/tokenauth/keys
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:104)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:307)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:418)
	at org.apache.hadoop.hbase.security.token.ZKSecretWatcher.nodeChildrenChanged(ZKSecretWatcher.java:116)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:560)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:536)
2011-08-01 03:29:38,954 [main-EventThread] ERROR org.apache.hadoop.hbase.security.token.ZKSecretWatcher: Error reading data from zookeeper
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /SPN-hbase/tokenauth/keys
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:104)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:307)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:418)
	at org.apache.hadoop.hbase.security.token.ZKSecretWatcher.nodeChildrenChanged(ZKSecretWatcher.java:116)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:560)
	at 


We do have a backup master, but because of the ZooKeeper problem it could not work properly either.

After that, a large number of region servers went down.

2011-08-01 03:29:38,565 [ZKSecretWatcher-leaderElector] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unexpected error from ZK, stopping candidate
2011-08-01 03:29:38,565 [ZKSecretWatcher-leaderElector] INFO org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager: Stopping leader election, because: Unexpected error from ZK: KeeperErrorCode = InvalidACL for /SPN-hbase/tokenauth/keymaster

This incident showed how a single ZooKeeper anomaly can deal a fatal blow to HBase.

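For now, all we can do is run a watchdog on the region servers and the ZooKeeper servers that quickly restarts any server that goes down, to limit the impact of this kind of problem.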
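A minimal Python sketch of such a watchdog, under stated assumptions. The `Watchdog` class, the `max_failures` threshold, and the restart command path in the usage comment are hypothetical names introduced for illustration. The probe relies on ZooKeeper's four-letter command `ruok`, which a healthy server answers with `imok`. Newer ZooKeeper releases may require this command to be whitelisted.

```python
import socket
from typing import Callable


def zk_is_ok(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Probe a ZooKeeper server with the four-letter command 'ruok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timeout, reset: treat all as "not healthy".
        return False


class Watchdog:
    """Restart a service after `max_failures` consecutive failed probes.

    Hypothetical helper: `probe` returns True when the service is healthy,
    `restart` performs the actual restart (for example by running a script).
    """

    def __init__(self, probe: Callable[[], bool], restart: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one check; return True if a restart was triggered."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.restart()
            self.failures = 0
            return True
        return False


# Usage sketch (the path is an assumption, adjust it to your installation):
#   import subprocess, time
#   wd = Watchdog(lambda: zk_is_ok("127.0.0.1"),
#                 lambda: subprocess.run(["/path/to/zkServer.sh", "restart"]))
#   while True:
#       wd.tick()
#       time.sleep(5)
```

Requiring several consecutive failures before restarting is a deliberate choice here: a single slow probe during a network hiccup should not by itself trigger a restart.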

The HBase community is also aware of this problem.

https://issues.apache.org/jira/browse/HBASE-3065

It tries to keep HBase running as far as possible during a ZooKeeper disturbance by adding more retries.
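To illustrate the general idea of retrying through a short disturbance, here is a small Python sketch of retry with exponential backoff. It is not the code from HBASE-3065: `TransientZkError`, `retry_with_backoff`, and its parameters are hypothetical names, and the exception class merely stands in for a retryable condition such as a lost connection.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientZkError(Exception):
    """Hypothetical stand-in for a retryable ZooKeeper error."""


def retry_with_backoff(op: Callable[[], T], max_retries: int = 3,
                       base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying on TransientZkError with exponential backoff.

    After `max_retries` failed retries the last error is re-raised, so the
    caller still sees a persistent outage.
    """
    attempt = 0
    while True:
        try:
            return op()
        except TransientZkError:
            if attempt >= max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Only errors that are known to be transient should be retried this way. An authorization failure such as the `NoAuth` errors in the logs above would normally not be resolved by retrying the same call.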
Comments
#1 27g 2011-10-28
11/10/28 15:40:06 INFO mapred.JobClient: Task Id : attempt_201110111715_0099_m_000003_0, Status : FAILED
Error: TOKENIZED

Hello, when I ran my JAR on Hadoop, it reported the error above (the same line appears many times). What could be causing this?
