`

了解Oracle RAC Brain Split Resolution集群脑裂协议

 
阅读更多

CSS工作原理

在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:

Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。

而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。

NM服务通过OCR中的记录(OCR中记录了Interconnect的信息)来了解其所需要监听和交互的端点,将心跳信息通过网络发送到其他集群成员。同时它也监控来自所有其他集群成员的网络心跳Network heartbeat,每一秒钟都会发生这样的网络心跳,若某个节点的网络心跳在misscount(by the way:10.2.0.1中Linux上默认misscount为60s,其他平台为30s,若使用了第三方vendor clusterware则为600s,但10.2.0.1中未引入disktimeout;10.2.0.4以后misscount为60s,disktimeout为200s;11.2以后misscount为30s:CRS-4678: Successful get misscount 30 for Cluster Synchronization Services,CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services)指定的秒数中都没有被收到的话,该节点被认为已经”死亡”了。NM还负责当其他节点加入或离开集群时初始化集群的重置(Initiates cluster reconfiguration)。

在解决脑裂的场景中,NM还会监控voting disk以了解其他的竞争子集群(subclusters)。关于子集群我们有必要介绍一下,试想我们的环境中存在大量的节点,以Oracle官方构建过的128个节点的环境为我们的想象空间,当网络故障发生时存在多种的可能性,一种可能性是全局的网络失败,即128个节点中每个节点都不能互相发生网络心跳,此时会产生多达128个的信息”孤岛”子集群。另一种可能性是局部的网络失败,128个节点中被分成多个部分,每个部分中包含多于一个的节点,这些部分就可以被称作子集群(subclusters)。当出现网络故障时子集群内部的多个节点仍能互相通信传输投票信息(vote mesg),但子集群或者孤岛节点之间已经无法通过常规的Interconnect网络交流了,这个时候NM Reconfiguration就需要用到voting disk投票磁盘。

Voting Disk

因为NM要使用voting disk来解决因为网络故障造成的通信障碍,所以需要保证voting disk在任意时刻都可以被正常访问。在正常状态下,每个节点都会进行磁盘心跳活动,具体来说就是会到投票磁盘的某个块上写入disk心跳信息,这种活动每一秒钟都会发生,同时CSS还会每秒读取一种称作”kill block”的”赐死块”,当”kill block”的内容表示本节点被驱逐出集群时,CSS会主动重启节点。

为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作CSS要求保证至少(N/2+1)个投票磁盘要被节点正常访问,这样就保证了每2个节点间总是至少有一个投票磁盘是它们都可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线voting disk多于无法访问的voting disk,该节点都能幸福地活下去,当无法访问的voting disk多于正常的voting disk时,Cluster Communication Service进程将失败并引起节点重启。所以有一种说法认为voting disk只要有2个足以保证冗余度就可以了,没有必要有3个或以上voting disk,这种说法是错误的。Oracle推荐集群中至少要有3个voting disks。

Question:

有同学问那么voting disk  必须是奇数个呢?

Answer:

实际上我们仅仅是推荐使用奇数个vote disk ,而非必须是奇数个。10gR2中vote disk的数目上限是32个。

Question

我们可以使用2或4个vote disk吗?

Answer:

可以的。 但是2、4这样的数目在“至少(N/2+1)个投票磁盘要被节点正常访问”这一disk heartbeat的硬性算法下是不利的:

当我们使用2个vote disk 时,不能发生任意个vote disk的心跳失败

当我们使用3个vote disk 时,不能发生大于1个的vote disk心跳失败

当我们使用4个vote disk 时,不能发生大于1个的vote disk心跳失败 ,这和3个时的容错率是一样,但是因为我们有更多的vote disk,这会导致管理成本和引入的风险增长

当我们使用5个vote disk 时,不能发生大于2个的vote disk心跳失败

当我们使用6个vote disk 时,仍然不能发生大于2个的vote disk心跳失败, 同样的因为比5时多出一个 ,也会引入不合理的管理成本和风险

Question:

若节点间的网络心跳正常,且节点所能正常心跳的vote disk 大于不能正常访问的 ,如3个votedisk 时恰巧有1个vote disk 的disk heartbeat 超时,此时Brain split 会发生吗?

Answer:

这种情况即不会触发Brain Split,也不会引发节点驱逐协议(eviction protocol)。 当单个或小于(N/2+1)个的voting disk心跳失败(disk heartbeat failure)时,这种心跳失败可能是由于短期内节点访问voting disk发生I/O error错误而引起的,此时css会立刻将这些失败的voting disk标记为OFFLINE。虽然有一定数量的voting disk OFFLINE了,但是我们仍有至少(N/2+1)个投票磁盘可用,这保证了eviction protocol不会被调用,所以没有节点会被reboot重启。紧接着node monitor模块的Disk ping Monitor Thread(DPMT-clssnmDiskPMT)会重复尝试访问这些失败的OFFLINE voting disk,若这些投票磁盘变得再次可I/O访问且经过验证其上的数据也没有讹误,那么css会再次将此voting disk标记为ONLINE;但是如果在45s( 这里的45s是基于misscount和 内部算法获得的) 内仍不能正常访问相关的voting disk,那么DMPT将在cssd.log中生成警告信息,如:

CSSD]2011-11-11 20:11:20.668 >

WARNING: clssnmDiskPMT: long disk latency >(45940 ms) to voting disk (0//dev/asm-votedisk1)

假设以上发生clssnmDiskPMT警告的RAC场景共有3个voting disk,现已有一个asm-votedisk1因为I/O error或其他原因而被标记为OFFLINE,若此时再有一个votedisk也出现了问题并disk heartbeat 失败,那么节点会因为少于规定数目(2)的votedisk而引发eviction protocol,进而重启reboot。

单个或小于(N/2+1)个的voting disk心跳失败都仅仅生成警告(Warning),而非致命的错误。因为仍有绝大数量的vote disk可被访问,因此生成的警告都是非致命的,eviction protocol将不会被触发。

当实际的NM Reconfiguration集群重置情况发生时所有的active节点和正在加入集群的节点都会参与到reconfig中,那些没有应答(ack)的节点都将不再被归入新的集群关系中。实际上reconfig重置包括多个阶段:

1.初始化阶段 — reconfig manager(由集群成员号最低的节点担任)向其他节点发送启动reconfig的信号

2.投票阶段 — 节点向reconfig manager发送该节点所了解的成员关系

3.脑裂检查阶段 — reconfig manager检查是否脑裂

4.驱逐阶段 — reconfig manager驱逐非成员节点

5.更新阶段 — reconfig manager向成员节点发送权威成员关系信息

在脑裂检查阶段Reconfig Manager会找出那些没有Network Heartbeat而有Disk Heartbeat的节点,并通过Network Heartbeat(如果可能的话)和Disk Heartbeat的信息来计算所有竞争子集群(subcluster)内的节点数目,并依据以下2种因素决定哪个子集群应当存活下去:

拥有最多节点数目的子集群(Sub-cluster with largest number of Nodes)

若子集群内数目相等则为拥有最低节点号的子集群(Sub-cluster with lowest node number),举例来说在一个2节点的RAC环境中总是1号节点会获胜。

采用Stonith algorithm 的IO fencing(remote power reset)

STONITH算法是一种常用的I/O Fencing algorithm,是RAC中必要的远程关闭节点的接口。其想法十分简单,当某个节点上运行的软件希望确保本集群内的其他节点不能使用某种资源时,拔出其他节点的插座即可。这是一种简单、可靠且有些残酷的算法。Stonith 的优势是其没有特定的硬件需求,也不会限制集群的扩展性。

Oracle Clusterware的Process Monitor模块负责实现IO fencing,保证不会因节点/实例的不协调工作而产生corruption。Process Monitor的工作具体由hangcheck timer或者oprocd 完成, 在Linux平台上10.2.0.4 之前都没有oprocd的存在(其他Unix平台在10.2.0.1就有了),在安装RAC之前需要额外安装hangcheck timer软件以保证IO fencing, 到10.2.0.4 时Linux上也有了oprocd,具体见<Know about RAC Clusterware Process OPROCD> 一文。 这些负责IO fencing的进程一般都会被锁定在内存中(locked in memory)、实时运行(Real time)、休眠固定的时间(Sleep a fixed time)、以root身份运行;若进程唤醒后发现时间已经太晚,那么它会强制reboot;若这些进程发生失败,则也会重启,所以在RAC环境中oprocd是一个很重要的进程,不要失去手动去kill这个进程。

在完成脑裂检查后进入驱逐阶段,被驱逐节点会收到发送给它们的驱逐信息(如果网络可用的话),若无法发送信息则会通过写出驱逐通知到voting disk上的”kill block”来达到驱逐通知的目的。同时还会等待被驱逐节点表示其已收到驱逐通知,这种表示可能是通过网络通信的方式也可能是投票磁盘上的状态信息。

可以看到Oracle CSS中Brain Split Check时会尽可能地保证最大的一个子集群存活下来以保证RAC系统具有最高的可用性。

实际案例日志

1号节点网络失败,2,3号节点形成子集群;2,3节点通过voting disk向1号节点发起驱逐:

以下为1号节点的ocssd.log日志:

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 29.610 seconds

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 3.280308e-268artbeat fatal, eviction in 29.150 seconds

对2,3号节点发起misscount计时

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 28.610 seconds

[    CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 14.580 seconds

[    CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 3.280308e-268artbeat fatal, eviction in 14.120 seconds

[    CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 13.580 seconds

[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 5.560 seconds

[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 5.100 seconds

[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 4.560 seconds

[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 4.100 seconds

[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 3.560 seconds

[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 3.100 seconds

[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 2.560 seconds

[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 2.100 seconds

[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 1.550 seconds

[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 1.090 seconds

2号节点的ocssd.log日志:

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 29.800 seconds

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 28.790 seconds

[    CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 14.800 seconds

[    CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 13.800 seconds

[    CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 5.800 seconds

[    CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 4.800 seconds

[    CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 3.800 seconds

[    CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 2.800 seconds

[    CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 1.790 seconds

[    CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 0.800 seconds

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

clssnmHandleSyn应答3号节点发送的同步信息

[    CSSD]2011-04-23 17:12:35.399 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0e]

发生Node Monitoring SUSPEND事件

[    CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)

通过clssnmSendVoteInfo向3号节点发送投票信息Vote mesg

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

确认1号节点离开了集群cluster

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed

[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)

[    CSSD]2011-04-23 17:12:35.416 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]

[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10

[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa

[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

完成reconfiguration

[    CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

以下为3号节点的ocssd.log:

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 1.867300e-268artbeat fatal, eviction in 29.220 seconds

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 1.867300e-268artbeat fatal, eviction in 14.220 seconds

[    CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 5.220 seconds

[    CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 4.220 seconds

[    CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 3.210 seconds

[    CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 2.220 seconds

[    CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 1.220 seconds

[    CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 0.220 seconds

[    CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 10

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE

[    CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(2)

[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

[    CSSD]2011-04-23 17:13:18.330 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0e]

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(2)

[    CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

以上完成了2-3节点间的Vote mesg通信,这些信息包含Node identifier,GM peer to peer listening endpoint以及

View of cluster membership。

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

开始检测voting disk上的信息

[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2

发现其他子集群,包含1号节点且1号节点为该子集群的leader,为最小子集群;3号与2号节点组成最大子集群,2号节点为leader节点

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Start

[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Evicting node 1, vrh1, birth 9, death 10, 

impendingrcfg 1, stateflags 0x40d 

发起对1号节点的驱逐

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmSendShutdown: req to node 1, kill time 443294

[    CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 1, vrh1, undead 1

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)

clssnmCheckKillStatus检查1号节点是否down了

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(2)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 10 complete!

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]

[    CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)

[    CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE:   clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)

[    CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

.........................省略若干行.........................

[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)

[    CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10

[    CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)

[    CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

更新阶段   

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 3, master node number 2

[    CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

另一场景为1号节点未加入集群,2号节点的网络失败,因2号节点的member number较小故其通过voting disk向3号节点发起驱逐

以下为2号节点的ocssd.log日志

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 28.890 seconds

[    CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 14.860 seconds

[    CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 13.860 seconds

[    CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 5.840 seconds

[    CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 4.840 seconds

[    CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 3.840 seconds

[    CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 2.840 seconds

[    CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 1.840 seconds

[    CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 0.830 seconds

[    CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.021 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2

检查voting disk后发现子集群3为最小"子集群"(3号节点的node number较2号大);2号节点为最大子集群

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Start [ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmSendShutdown: req to node 3, kill time 1643084

发起对3号节点的驱逐和shutdown request

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 3, vrh3, undead 1

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmDeactivateNode: node 3 (vrh3) left cluster

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: SYNC(13) from node(2) completed

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 13 complete!

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (13)

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

................省略若干行..............

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 13

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

完成reconfiguration

[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(13), with status(1)

以下为3号节点的ocssd.log日志:

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 1.867300e-268artbeat fatal, eviction in 29.360 seconds

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 1.867300e-268artbeat fatal, eviction in 14.330 seconds

[    CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 5.310 seconds

[    CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 4.300 seconds

[    CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 3.300 seconds

[    CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 2.300 seconds

[    CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 1.300 seconds

[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 0.300 seconds

[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]

[    CSSD]2011-04-23 17:43:15.912 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain. [ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)

读取voting disk后发现kill block,为避免split brain,自我aborting!

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssscExit: CSSD aborting

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################

[    CSSD]--- DUMP GROCK STATE DB ---

[    CSSD]----------

[    CSSD]  type 2, Id 4, Name = (crs_version)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]   . . . . .

[    CSSD]     memberNo =1, seq 12

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 11, Name = (_ORA_CRS_FAILOVER)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 2, Name = (EVMDMAIN)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 1

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 508

[    CSSD]     publicDataSize = 504

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 508

[    CSSD]     publicDataSize = 504

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 5, Name = (CRSDMAIN)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =1, master 3

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 128

[    CSSD]     publicDataSize = 128

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 3, Name = (ocr_crs)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 3

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 32

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 12

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 32

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 1, Name = (#CSS_CLSSOMON)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 1

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]--- END OF GROCK STATE DUMP ---

[    CSSD]------- Begin Dump -------

补充官方文档

The RM (Reconfig Manager) sends a sync message to all participating nodes. Participating nodes respond with a sync acknowledgement. After this the vote phase begins and the master sends a vote message to all participating nodes. Participating nodes repond with a vote info message containing their node identifier and GM peer to peer listening endpoint. In the split-check phase, the RM uses the voting disk to verify there is no split-brain. It finds nodes heartbeating to disk that are not connected via the network. If it finds these, it will determine which nodes are talking to which and the largest subcluster survives. For example, if we have a 5 node cluster and all of the nodes are heartbeating to the voting disk but only a group of 3 can communicate via the network and a group of 2 can communication via the network, this means we have 2 subclusters. The largest subcluster (3) would survive while the other subcluster (2) would not. After this the evict phase would evict nodes previously in the cluster but not considered members in this incarnation. In this case we would send a message to evicted nodes (if possible) and write eviction notice to a ‘kill’ block in the voting file. We would wait for the node to indicate it got the eviction notice (wait for seconds). The wait is terminated by a message or status on the voting file indicating that the node got the eviction notice. In the update phase the master sends an update message containing the definitive cluster membership and node information for all particpating nodes. The participating nodes send update acknowledgements. All members queue the reconfiguration event ot their GM.

As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time. So if you want to be able to tolerate a failure of n voting disks, you must have at least 2n+1 configured. (n=1 means 3 voting disks). You can configure up to 32 voting disks, providing protection against 15 simultaneous disk failures.

Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be physically separate disks. It is recommended to use an odd number as 4 disks will not be any more highly available than 3 disks, 1/2 of 3 is 1.5…rounded to 2, 1/2 of 4 is 2, once we lose 2 disks, our cluster will fail with both 4 voting disks or 3 voting disks.

When interconnect breaks – keeps the largest cluster possible up, other nodes will be evicted, in 2 node cluster lowest number node remains. 

Node eviction: pick a cluster node as victim to reboot.Always keep the largest cluster possible up, evicted other nodes two nodes: keep the lowest number node up and evict other


参考至:http://blog.csdn.net/liu_maclean/article/details/7171511

如有错误,欢迎指正

邮箱:czmcj@163.com

分享到:
评论

相关推荐

    Windows平台Oracle RAC 集群安装过程

    ### Windows平台Oracle RAC集群安装过程详解 #### 引言 在企业级应用环境中,为了提高数据处理能力和系统的高可用性,Oracle Real Application Clusters (RAC) 成为了一种热门选择。尤其在Windows平台上部署Oracle...

    Oracle RAC重启操作手册

    ### Oracle RAC重启操作手册知识点解析 ...通过这些步骤,可以有效地完成Oracle RAC集群的重启过程,同时保证数据的安全性和一致性。这些知识点对于IT专业人士来说非常重要,尤其是在处理大型数据库系统时更是如此。

    oracle RAC集群修改归档日志路径.docx

    Oracle RAC(Real Application Clusters)集群是一种高可用性解决方案,它允许多个实例同时访问同一个数据库,确保在单个节点故障时数据的连续性和一致性。归档日志是Oracle数据库的重要组成部分,它们记录了自上次...

    大话Oracle_RAC__集群_高可用性_备份与恢复.pdf

    大话Oracle_RAC__集群_高可用性_备份与恢复.pdf

    Oracle RAC 项目实施方案

    Oracle RAC(Real Application Clusters)是 Oracle 公司的一种集群解决方案,旨在提供高可用性和高性能的数据库解决方案。在生产环境下实施 Oracle RAC 需要进行详细的规划和实施本文将详细介绍 Oracle RAC 项目...

    linux系统部署oracle RAC(集群)

    在linux操作系统上部署oracle RAC 全过程和截图,包括:环境说明,搭建注意,搭建过程,检查,搭建iscsi存储,用的ASM自动存储管理。

    基于Oracle RAC 双活方案实施,如何规避脑裂风险- 最佳实践.docx

    脑裂(Brain Split)是指在一个分布式系统中,由于网络分区或其他故障,使得系统的一部分节点无法与其他部分通信,但每个部分都认为自己是有效的并继续运行,这就导致了数据不一致的问题。在Oracle RAC环境中,如果...

    基于Linux平台Oracle RAC集群数据库监控系统的设计与实现.pdf

    "基于Linux平台Oracle RAC集群数据库监控系统的设计与实现" 本文设计了一种基于Linux平台的Oracle RAC集群数据库监控系统,旨在解决企业信息化日常管理中的数据库系统稳定性问题。该系统使用信息化方法,在Linux...

    oracle rac (veritas)双机集群,机器主板挂掉后的单节点启动

    Oracle RAC(Real Application Clusters)是一种高可用性解决方案,它允许多个实例在多台服务器上共享同一个数据库,提供故障切换和负载均衡能力。在Veritas环境下的Oracle RAC配置,结合Veritas Cluster Server...

    Oracle RAC集群体系结构.docx

    "Oracle RAC集群体系结构" Oracle RAC(Real Application Cluster)是一种高可用性、高性能的集群系统,由Oracle Corporation提供。RAC系统由两个部分组成:Oracle Clusterware和Real Application Clusters。 ...

    在 Oracle Enterprise Linux 和 iSCSI 上构建您自己的 Oracle RAC 11g 集群

    Oracle RAC 11g 版本是在 Oracle Database 11g Release 2 中提供的,它增强了对集群环境的支持和性能优化。在 Oracle Enterprise Linux(OEL)上构建 RAC 集群,并结合 iSCSI 存储技术,可以创建一个经济高效且可靠...

    OracleRAC数据库集群视频教程(10讲)

    教程名称:Oracle RAC数据库集群视频教程(10讲)课程目录:【】1.OracleRAC集群体系结构_drm【】10.测试OracleRAC数据库集群功能【】2.安装OracleRAC数据库(一)【】3.安装OracleRAC数据库(二)_drm【】4.安装...

    oracle RAC 和 weblogic集群安装配置和部署文档

    在本文档中,我们将深入探讨Oracle RAC(Real Application Clusters)和WebLogic集群的安装、配置与部署过程。Oracle RAC是一种高可用性解决方案,可使多台服务器共享同一数据库,提供故障切换和负载均衡能力。...

    Oracle RAC 19c集群搭建指南:实现高可用、高性能的数据库集群(超详细).md

    Oracle RAC 19c集群搭建指南:实现高可用、高性能的数据库集群(超详细),一看就会的那种

    基于天翼云ECS搭建Oracle+RAC集群指南7+11G

    基于天翼云ECS搭建Oracle+RAC集群指南7+11G 本文档旨在指导用户基于天翼云ECS搭建Oracle+RAC集群,涵盖了从资源规划到创建资源的整个过程。 资源规划 资源规划是搭建Oracle+RAC集群的关键步骤之一。在本节中,...

    Oracle单机通过rman迁移到RAC集群.docx

    ### Oracle单机通过RMAN迁移到RAC集群 #### 一、背景介绍 随着业务规模的不断扩大和技术的进步,原有的数据库系统可能无法满足当前的需求。为了提高系统的可用性、可扩展性和性能,通常需要将原有的单机Oracle...

    Oracle RAC数据库集群PPT教案.pptx

    Oracle RAC 数据库集群技术详解 Oracle RAC 数据库集群技术是指通过多个服务器节点的集群来提供高可用性、负载均衡和可扩展性的数据库解决方案。该技术可以满足企业对高性能、可靠性和可扩展性的需求。 为什么要...

    Oracle rac功能测试方案

    1. **性能测试**:评估Oracle RAC集群的I/O性能及最大处理能力。 2. **负载均衡测试**:验证客户端和服务端是否能有效分发负载,确保资源利用率最大化。 3. **网络异常测试**:模拟网络故障情况下Oracle RAC的恢复...

    Oracle RAC集群aix

    Oracle RAC(Real Application Clusters)集群是一种高可用性解决方案,专为在AIX(Amiga Interactive eXperience)操作系统上运行Oracle数据库设计。RAC允许多台服务器共享同一个数据库实例,提供故障切换和负载均衡...

    Oracle RAC 12c运维最佳实践指南.pdf

    Oracle RAC 12c 是一种高可用、高性能的集群解决方案,旨在实现数据库的高可用性和可扩展性。本文将对 Oracle RAC 12c 运维最佳实践指南进行解读,总结出相关的知识点。 一、 Oracle RAC 12c 概述 Oracle RAC 12c ...

Global site tag (gtag.js) - Google Analytics