- 浏览: 4407223 次
- 性别:
- 来自: 厦门
文章分类
- 全部博客 (634)
- Oracle日常管理 (142)
- Oracle体系架构 (45)
- Oracle Tuning (52)
- Oracle故障诊断 (35)
- RAC/DG/OGG (64)
- Oracle11g New Features (48)
- DataWarehouse (15)
- SQL, PL/SQL (14)
- DB2日常管理 (9)
- Weblogic (11)
- Shell (19)
- AIX (12)
- Linux/Unix高可用性 (11)
- Linux/Unix日常管理 (66)
- Linux桌面应用 (37)
- Windows (2)
- 生活和工作 (13)
- 私人记事 (0)
- Python (9)
- CBO (15)
- Cognos (2)
- ORACLE 12c New Feature (2)
- PL/SQL (2)
- SQL (1)
- C++ (2)
- Hadoop大数据 (5)
- 机器学习 (3)
- 非技术 (1)
最新评论
-
di1984HIT:
xuexilee!!!
Oracle 11g R2 RAC高可用连接特性 – SCAN详解 -
aneyes123:
谢谢非常有用那
PL/SQL的存储过程和函数(原创) -
jcjcjc:
写的很详细
Oracle中Hint深入理解(原创) -
di1984HIT:
学习了,学习了
Linux NTP配置详解 (Network Time Protocol) -
avalonzst:
大写的赞..
AIX内存概述(原创)
CSS工作原理
在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:
Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。
而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。
NM服务通过OCR中的记录(OCR中记录了Interconnect的信息)来了解其所需要监听和交互的端点,将心跳信息通过网络发送到其他集群成员。同时它也监控来自所有其他集群成员的网络心跳Network heartbeat,每一秒钟都会发生这样的网络心跳,若某个节点的网络心跳在misscount(by the way:10.2.0.1中Linux上默认misscount为60s,其他平台为30s,若使用了第三方vendor clusterware则为600s,但10.2.0.1中未引入disktimeout;10.2.0.4以后misscount为60s,disktimeout为200s;11.2以后misscount为30s:CRS-4678: Successful get misscount 30 for Cluster Synchronization Services,CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services)指定的秒数中都没有被收到的话,该节点被认为已经”死亡”了。NM还负责当其他节点加入或离开集群时初始化集群的重置(Initiates cluster reconfiguration)。
在解决脑裂的场景中,NM还会监控voting disk以了解其他的竞争子集群(subclusters)。关于子集群我们有必要介绍一下,试想我们的环境中存在大量的节点,以Oracle官方构建过的128个节点的环境为我们的想象空间,当网络故障发生时存在多种的可能性,一种可能性是全局的网络失败,即128个节点中每个节点都不能互相发生网络心跳,此时会产生多达128个的信息”孤岛”子集群。另一种可能性是局部的网络失败,128个节点中被分成多个部分,每个部分中包含多于一个的节点,这些部分就可以被称作子集群(subclusters)。当出现网络故障时子集群内部的多个节点仍能互相通信传输投票信息(vote mesg),但子集群或者孤岛节点之间已经无法通过常规的Interconnect网络交流了,这个时候NM Reconfiguration就需要用到voting disk投票磁盘。
Voting Disk
因为NM要使用voting disk来解决因为网络故障造成的通信障碍,所以需要保证voting disk在任意时刻都可以被正常访问。在正常状态下,每个节点都会进行磁盘心跳活动,具体来说就是会到投票磁盘的某个块上写入disk心跳信息,这种活动每一秒钟都会发生,同时CSS还会每秒读取一种称作”kill block”的”赐死块”,当”kill block”的内容表示本节点被驱逐出集群时,CSS会主动重启节点。
为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作CSS要求保证至少(N/2+1)个投票磁盘要被节点正常访问,这样就保证了每2个节点间总是至少有一个投票磁盘是它们都可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线voting disk多于无法访问的voting disk,该节点都能幸福地活下去,当无法访问的voting disk多于正常的voting disk时,Cluster Communication Service进程将失败并引起节点重启。所以有一种说法认为voting disk只要有2个足以保证冗余度就可以了,没有必要有3个或以上voting disk,这种说法是错误的。Oracle推荐集群中至少要有3个voting disks。
Question:
有同学问那么voting disk 必须是奇数个呢?
Answer:
实际上我们仅仅是推荐使用奇数个vote disk ,而非必须是奇数个。10gR2中vote disk的数目上限是32个。
Question
我们可以使用2或4个vote disk吗?
Answer:
可以的。 但是2、4这样的数目在“至少(N/2+1)个投票磁盘要被节点正常访问”这一disk heartbeat的硬性算法下是不利的:
当我们使用2个vote disk 时,不能发生任意个vote disk的心跳失败
当我们使用3个vote disk 时,不能发生大于1个的vote disk心跳失败
当我们使用4个vote disk 时,不能发生大于1个的vote disk心跳失败 ,这和3个时的容错率是一样,但是因为我们有更多的vote disk,这会导致管理成本和引入的风险增长
当我们使用5个vote disk 时,不能发生大于2个的vote disk心跳失败
当我们使用6个vote disk 时,仍然不能发生大于2个的vote disk心跳失败, 同样的因为比5时多出一个 ,也会引入不合理的管理成本和风险
Question:
若节点间的网络心跳正常,且节点所能正常心跳的vote disk 大于不能正常访问的 ,如3个votedisk 时恰巧有1个vote disk 的disk heartbeat 超时,此时Brain split 会发生吗?
Answer:
这种情况即不会触发Brain Split,也不会引发节点驱逐协议(eviction protocol)。 当单个或小于(N/2+1)个的voting disk心跳失败(disk heartbeat failure)时,这种心跳失败可能是由于短期内节点访问voting disk发生I/O error错误而引起的,此时css会立刻将这些失败的voting disk标记为OFFLINE。虽然有一定数量的voting disk OFFLINE了,但是我们仍有至少(N/2+1)个投票磁盘可用,这保证了eviction protocol不会被调用,所以没有节点会被reboot重启。紧接着node monitor模块的Disk ping Monitor Thread(DPMT-clssnmDiskPMT)会重复尝试访问这些失败的OFFLINE voting disk,若这些投票磁盘变得再次可I/O访问且经过验证其上的数据也没有讹误,那么css会再次将此voting disk标记为ONLINE;但是如果在45s( 这里的45s是基于misscount和 内部算法获得的) 内仍不能正常访问相关的voting disk,那么DMPT将在cssd.log中生成警告信息,如:
CSSD]2011-11-11 20:11:20.668 >
WARNING: clssnmDiskPMT: long disk latency >(45940 ms) to voting disk (0//dev/asm-votedisk1)
假设以上发生clssnmDiskPMT警告的RAC场景共有3个voting disk,现已有一个asm-votedisk1因为I/O error或其他原因而被标记为OFFLINE,若此时再有一个votedisk也出现了问题并disk heartbeat 失败,那么节点会因为少于规定数目(2)的votedisk而引发eviction protocol,进而重启reboot。
单个或小于(N/2+1)个的voting disk心跳失败都仅仅生成警告(Warning),而非致命的错误。因为仍有绝大数量的vote disk可被访问,因此生成的警告都是非致命的,eviction protocol将不会被触发。
当实际的NM Reconfiguration集群重置情况发生时所有的active节点和正在加入集群的节点都会参与到reconfig中,那些没有应答(ack)的节点都将不再被归入新的集群关系中。实际上reconfig重置包括多个阶段:
1.初始化阶段 — reconfig manager(由集群成员号最低的节点担任)向其他节点发送启动reconfig的信号
2.投票阶段 — 节点向reconfig manager发送该节点所了解的成员关系
3.脑裂检查阶段 — reconfig manager检查是否脑裂
4.驱逐阶段 — reconfig manager驱逐非成员节点
5.更新阶段 — reconfig manager向成员节点发送权威成员关系信息
在脑裂检查阶段Reconfig Manager会找出那些没有Network Heartbeat而有Disk Heartbeat的节点,并通过Network Heartbeat(如果可能的话)和Disk Heartbeat的信息来计算所有竞争子集群(subcluster)内的节点数目,并依据以下2种因素决定哪个子集群应当存活下去:
拥有最多节点数目的子集群(Sub-cluster with largest number of Nodes)
若子集群内数目相等则为拥有最低节点号的子集群(Sub-cluster with lowest node number),举例来说在一个2节点的RAC环境中总是1号节点会获胜。
采用Stonith algorithm 的IO fencing(remote power reset)
STONITH算法是一种常用的I/O Fencing algorithm,是RAC中必要的远程关闭节点的接口。其想法十分简单,当某个节点上运行的软件希望确保本集群内的其他节点不能使用某种资源时,拔出其他节点的插座即可。这是一种简单、可靠且有些残酷的算法。Stonith 的优势是其没有特定的硬件需求,也不会限制集群的扩展性。
Oracle Clusterware的Process Monitor模块负责实现IO fencing,保证不会因节点/实例的不协调工作而产生corruption。Process Monitor的工作具体由hangcheck timer或者oprocd 完成, 在Linux平台上10.2.0.4 之前都没有oprocd的存在(其他Unix平台在10.2.0.1就有了),在安装RAC之前需要额外安装hangcheck timer软件以保证IO fencing, 到10.2.0.4 时Linux上也有了oprocd,具体见<Know about RAC Clusterware Process OPROCD> 一文。 这些负责IO fencing的进程一般都会被锁定在内存中(locked in memory)、实时运行(Real time)、休眠固定的时间(Sleep a fixed time)、以root身份运行;若进程唤醒后发现时间已经太晚,那么它会强制reboot;若这些进程发生失败,则也会重启,所以在RAC环境中oprocd是一个很重要的进程,不要失去手动去kill这个进程。
在完成脑裂检查后进入驱逐阶段,被驱逐节点会收到发送给它们的驱逐信息(如果网络可用的话),若无法发送信息则会通过写出驱逐通知到voting disk上的”kill block”来达到驱逐通知的目的。同时还会等待被驱逐节点表示其已收到驱逐通知,这种表示可能是通过网络通信的方式也可能是投票磁盘上的状态信息。
可以看到Oracle CSS中Brain Split Check时会尽可能地保证最大的一个子集群存活下来以保证RAC系统具有最高的可用性。
实际案例日志
1号节点网络失败,2,3号节点形成子集群;2,3节点通过voting disk向1号节点发起驱逐:
以下为1号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 29.610 seconds
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 3.280308e-268artbeat fatal, eviction in 29.150 seconds
对2,3号节点发起misscount计时
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 28.610 seconds
[ CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 14.580 seconds
[ CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 3.280308e-268artbeat fatal, eviction in 14.120 seconds
[ CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 13.580 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 5.560 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 5.100 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 4.560 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 4.100 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 3.560 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 3.100 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 2.560 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 2.100 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 1.550 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 1.090 seconds
2号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 29.800 seconds
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 28.790 seconds
[ CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 14.800 seconds
[ CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 13.800 seconds
[ CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 5.800 seconds
[ CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 4.800 seconds
[ CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 3.800 seconds
[ CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 2.800 seconds
[ CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 1.790 seconds
[ CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 0.800 seconds
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
clssnmHandleSyn应答3号节点发送的同步信息
[ CSSD]2011-04-23 17:12:35.399 [5073104] >USER: NMEVENT_SUSPEND [00][00][00][0e]
发生Node Monitoring SUSPEND事件
[ CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10)
通过clssnmSendVoteInfo向3号节点发送投票信息Vote mesg
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster
确认1号节点离开了集群cluster
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed
[ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10)
[ CSSD]2011-04-23 17:12:35.416 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][0c]
[ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10
[ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2
完成reconfiguration
[ CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1)
以下为3号节点的ocssd.log:
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 1.867300e-268artbeat fatal, eviction in 29.220 seconds
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 1.867300e-268artbeat fatal, eviction in 14.220 seconds
[ CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 5.220 seconds
[ CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 4.220 seconds
[ CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 3.210 seconds
[ CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 2.220 seconds
[ CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 1.220 seconds
[ CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 0.220 seconds
[ CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 10
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
[ CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
[ CSSD]2011-04-23 17:13:18.330 [8136912] >USER: NMEVENT_SUSPEND [00][00][00][0e]
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2)
[ CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13)
以上完成了2-3节点间的Vote mesg通信,这些信息包含Node identifier,GM peer to peer listening endpoint以及
View of cluster membership。
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
开始检测voting disk上的信息
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2
发现其他子集群,包含1号节点且1号节点为该子集群的leader,为最小子集群;3号与2号节点组成最大子集群,2号节点为leader节点
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Start
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Evicting node 1, vrh1, birth 9, death 10,
impendingrcfg 1, stateflags 0x40d
发起对1号节点的驱逐
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmSendShutdown: req to node 1, kill time 443294
[ CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: node 1, vrh1, undead 1
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)
clssnmCheckKillStatus检查1号节点是否down了
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(2)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 10 complete!
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10)
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][0c]
[ CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[ CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE: clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)
[ CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
.........................省略若干行.........................
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[ CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10
[ CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2)
[ CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
更新阶段
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes
[ CSSD]CLSS-3001: local node number 3, master node number 2
[ CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1)
另一场景为1号节点未加入集群,2号节点的网络失败,因2号节点的member number较小故其通过voting disk向3号节点发起驱逐
以下为2号节点的ocssd.log日志
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 28.890 seconds
[ CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 14.860 seconds
[ CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 13.860 seconds
[ CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 5.840 seconds
[ CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 4.840 seconds
[ CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 3.840 seconds
[ CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 2.840 seconds
[ CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 1.840 seconds
[ CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 0.830 seconds
[ CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.021 [5073104] >USER: NMEVENT_SUSPEND [00][00][00][0c]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2
检查voting disk后发现子集群3为最小"子集群"(3号节点的node number较2号大);2号节点为最大子集群
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Start [ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmSendShutdown: req to node 3, kill time 1643084
发起对3号节点的驱逐和shutdown request
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: node 3, vrh3, undead 1
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmDeactivateNode: node 3 (vrh3) left cluster
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: SYNC(13) from node(2) completed
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 13 complete!
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (13)
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][04]
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
................省略若干行..............
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 13
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2
完成reconfiguration
[ CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(13), with status(1)
以下为3号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 1.867300e-268artbeat fatal, eviction in 29.360 seconds
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 1.867300e-268artbeat fatal, eviction in 14.330 seconds
[ CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 5.310 seconds
[ CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 4.300 seconds
[ CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 3.300 seconds
[ CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 2.300 seconds
[ CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 1.300 seconds
[ CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 0.300 seconds
[ CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]
[ CSSD]2011-04-23 17:43:15.912 [8136912] >USER: NMEVENT_SUSPEND [00][00][00][0c]
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain. [ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)
读取voting disk后发现kill block,为避免split brain,自我aborting!
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssscExit: CSSD aborting
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]----------
[ CSSD] type 2, Id 4, Name = (crs_version)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 2, Name = (EVMDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 5, Name = (CRSDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =1, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 128
[ CSSD] publicDataSize = 128
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 3, Name = (ocr_crs)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 1, Name = (#CSS_CLSSOMON)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]--- END OF GROCK STATE DUMP ---
[ CSSD]------- Begin Dump -------
补充官方文档
The RM (Reconfig Manager) sends a sync message to all participating nodes. Participating nodes respond with a sync acknowledgement. After this the vote phase begins and the master sends a vote message to all participating nodes. Participating nodes repond with a vote info message containing their node identifier and GM peer to peer listening endpoint. In the split-check phase, the RM uses the voting disk to verify there is no split-brain. It finds nodes heartbeating to disk that are not connected via the network. If it finds these, it will determine which nodes are talking to which and the largest subcluster survives. For example, if we have a 5 node cluster and all of the nodes are heartbeating to the voting disk but only a group of 3 can communicate via the network and a group of 2 can communication via the network, this means we have 2 subclusters. The largest subcluster (3) would survive while the other subcluster (2) would not. After this the evict phase would evict nodes previously in the cluster but not considered members in this incarnation. In this case we would send a message to evicted nodes (if possible) and write eviction notice to a ‘kill’ block in the voting file. We would wait for the node to indicate it got the eviction notice (wait for seconds). The wait is terminated by a message or status on the voting file indicating that the node got the eviction notice. In the update phase the master sends an update message containing the definitive cluster membership and node information for all particpating nodes. The participating nodes send update acknowledgements. All members queue the reconfiguration event ot their GM.
As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time. So if you want to be able to tolerate a failure of n voting disks, you must have at least 2n+1 configured. (n=1 means 3 voting disks). You can configure up to 32 voting disks, providing protection against 15 simultaneous disk failures.
Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be physically separate disks. It is recommended to use an odd number as 4 disks will not be any more highly available than 3 disks, 1/2 of 3 is 1.5…rounded to 2, 1/2 of 4 is 2, once we lose 2 disks, our cluster will fail with both 4 voting disks or 3 voting disks.
When interconnect breaks – keeps the largest cluster possible up, other nodes will be evicted, in 2 node cluster lowest number node remains.
Node eviction: pick a cluster node as victim to reboot.Always keep the largest cluster possible up, evicted other nodes two nodes: keep the lowest number node up and evict other
参考至:http://blog.csdn.net/liu_maclean/article/details/7171511
如有错误,欢迎指正
邮箱:czmcj@163.com
发表评论
-
Duplicate a Database Using RMAN in Oracle Database 11g Release 2
2016-11-03 15:38 981Duplicate a Database Using RMA ... -
grid control 安装总结 (三)之EMD upload error: uploadXMLFiles skipped或忘记OMS密码
2014-11-17 17:16 2826[oracle@anbob agent10g]$ bin/e ... -
Grid Control服务端看不到Agent相关信息,同时OMS状态为unknown处理
2014-11-17 17:14 2199一次Grid Control的agent是启动的,但是在GC ... -
给11gR2 RAC添加LISTENER监听器并静态注册
2014-10-22 17:52 3228之前有同学想要给11gR2的RAC添加LISTENER监听器 ... -
HOW TO REMOVE CRS AUTO START AND RESTART FOR A RAC INSTANCE
2014-10-12 20:08 1509Subject: HOW TO REMOVE CRS AU ... -
How To Enable Archive Logging In RAC Environment [ID 1186764.1]
2014-10-12 19:45 1045How To Enable Archive Logging ... -
Oracle 11g RAC使用Manual和Policy Managed方法配置
2014-10-12 17:25 8925软件环境: 操作系统:Red Hat Enterpris ... -
Moving OCR and Voting Disk to Oracle ASM Online
2014-10-12 16:35 2696Prior Oracle 11gR2, it was only ... -
Oracle 11g R2的启动与关闭(原创)
2014-10-10 15:04 27243概述 Oracle 11gR2 中对CRSD资源进行了重新 ... -
重新配置与卸载11gR2 Grid Infrastructure
2014-10-10 08:24 2338概述Oracle 11g R2 Grid Infrastru ... -
Data Guard Broker Properties
2014-10-07 16:15 1028Data Guard Broker Properties ... -
Data Guard - Snapshot Standby Database配置
2014-10-07 07:39 987概述一般情况下,物理standby数据库处于mount状态 ... -
Active Data Guard
2014-10-06 19:27 908Oracle Active Data Guard is a ... -
Data Guard Broker系列之六:Fast-Start Failover
2014-10-06 18:29 3114Fast-Start Failover 基本概念 Fa ... -
Data Guard Broker系列之五:数据库角色转换
2014-10-06 18:28 1298数据库转换的基本概 ... -
Data Guard Broker系列之四:数据库管理
2014-10-06 18:27 1242数据库状态管理 数据 ... -
Data Guard Broker系列之三:Data Guard Broker配置管理
2014-10-06 16:56 1298配置data guard broker配置文件基本参数 ... -
Data Guard Broker系列之二:Data Guard Broker配置实战
2014-10-05 16:16 1617配置之前DG环境状态 测试的DG环境安装在同一个机器上, ... -
Data Guard broker系列之一:Data Guard Broker基础
2014-10-05 16:13 2323Data Guard broker概览 Data ... -
Dataguard切换保护模式(从MAXIMUM PERFORMANCE 到 maximize availability)
2014-10-05 14:52 12891. 首先查看主库当前保护模式(主库操作)SQL> s ...
相关推荐
### Windows平台Oracle RAC集群安装过程详解 #### 引言 在企业级应用环境中,为了提高数据处理能力和系统的高可用性,Oracle Real Application Clusters (RAC) 成为了一种热门选择。尤其在Windows平台上部署Oracle...
### Oracle RAC重启操作手册知识点解析 ...通过这些步骤,可以有效地完成Oracle RAC集群的重启过程,同时保证数据的安全性和一致性。这些知识点对于IT专业人士来说非常重要,尤其是在处理大型数据库系统时更是如此。
大话Oracle_RAC__集群_高可用性_备份与恢复.pdf
Oracle RAC(Real Application Clusters)是 Oracle 公司的一种集群解决方案,旨在提供高可用性和高性能的数据库解决方案。在生产环境下实施 Oracle RAC 需要进行详细的规划和实施本文将详细介绍 Oracle RAC 项目...
Oracle RAC(Real Application Clusters)集群是一种高可用性解决方案,它允许多个实例同时访问同一个数据库,确保在单个节点故障时数据的连续性和一致性。归档日志是Oracle数据库的重要组成部分,它们记录了自上次...
在linux操作系统上部署oracle RAC 全过程和截图,包括:环境说明,搭建注意,搭建过程,检查,搭建iscsi存储,用的ASM自动存储管理。
Oracle RAC 集群开启归档日志 Oracle RAC 集群开启归档日志是指在 Oracle RAC 集群环境中启用归档日志功能,以便提高数据库的可靠性和可维护性。为了实现这一功能,需要修改集群参数文件,重启数据库。 在开启归档...
Oracle RAC(Real Application Clusters)是一种高可用性解决方案,它允许多个实例在多台服务器上共享同一个数据库,提供故障切换和负载均衡能力。在Veritas环境下的Oracle RAC配置,结合Veritas Cluster Server...
### Oracle单机通过RMAN迁移到RAC集群 #### 一、背景介绍 随着业务规模的不断扩大和技术的进步,原有的数据库系统可能无法满足当前的需求。为了提高系统的可用性、可扩展性和性能,通常需要将原有的单机Oracle...
脑裂(Brain Split)是指在一个分布式系统中,由于网络分区或其他故障,使得系统的一部分节点无法与其他部分通信,但每个部分都认为自己是有效的并继续运行,这就导致了数据不一致的问题。在Oracle RAC环境中,如果...
"基于Linux平台Oracle RAC集群数据库监控系统的设计与实现" 本文设计了一种基于Linux平台的Oracle RAC集群数据库监控系统,旨在解决企业信息化日常管理中的数据库系统稳定性问题。该系统使用信息化方法,在Linux...
"Oracle RAC集群体系结构" Oracle RAC(Real Application Cluster)是一种高可用性、高性能的集群系统,由Oracle Corporation提供。RAC系统由两个部分组成:Oracle Clusterware和Real Application Clusters。 ...
然而,将Oracle RAC(Real Application Clusters)部署到华为云ECS(Elastic Cloud Server)上会面临一些挑战,如集群配置错误和网络设置问题。本文档旨在提供详尽的指导,帮助用户高效地在云环境中搭建Oracle RAC ...
总之,《大话Oracle.RAC:集群、高可用性、备份与恢复(第2版)》将全面覆盖Oracle RAC的关键技术和实践,为数据库管理员和IT专业人士提供了深入了解和掌握RAC的宝贵资源。通过学习本书,读者可以提升在Oracle RAC...
Oracle RAC 11g 版本是在 Oracle Database 11g Release 2 中提供的,它增强了对集群环境的支持和性能优化。在 Oracle Enterprise Linux(OEL)上构建 RAC 集群,并结合 iSCSI 存储技术,可以创建一个经济高效且可靠...
教程名称:Oracle RAC数据库集群视频教程(10讲)课程目录:【】1.OracleRAC集群体系结构_drm【】10.测试OracleRAC数据库集群功能【】2.安装OracleRAC数据库(一)【】3.安装OracleRAC数据库(二)_drm【】4.安装...
在本文档中,我们将深入探讨Oracle RAC(Real Application Clusters)和WebLogic集群的安装、配置与部署过程。Oracle RAC是一种高可用性解决方案,可使多台服务器共享同一数据库,提供故障切换和负载均衡能力。...
Oracle RAC 19c集群搭建指南:实现高可用、高性能的数据库集群(超详细),一看就会的那种
基于天翼云ECS搭建Oracle+RAC集群指南7+11G 本文档旨在指导用户基于天翼云ECS搭建Oracle+RAC集群,涵盖了从资源规划到创建资源的整个过程。 资源规划 资源规划是搭建Oracle+RAC集群的关键步骤之一。在本节中,...
Oracle RAC 数据库集群技术详解 Oracle RAC 数据库集群技术是指通过多个服务器节点的集群来提供高可用性、负载均衡和可扩展性的数据库解决方案。该技术可以满足企业对高性能、可靠性和可扩展性的需求。 为什么要...