ASM diskgroup dismount with "Waited 15 secs for write IO to PST"
SYMPTOMS
A normal or high redundancy diskgroup is dismounted with the following WARNING messages.
Note-ASM alert.log
Sat Mar 07 05:03:10 2015
WARNING: Waited 15 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 18 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 18 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 21 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 21 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 1 in group 2.
Sat Mar 07 05:03:22 2015
WARNING: Waited 27 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 30 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 30 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 33 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 33 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 36 secs for write IO to PST disk 1 in group 2.
WARNING: Waited 36 secs for write IO to PST disk 1 in group 2.
Sat Mar 07 05:03:34 2015
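- The ASM alert.log contains the WARNING messages shown above, for example: WARNING: Waited 15 secs for write IO to PST disk 1 in group 2. The message means that a write to the PST (Partnership and Status Table) on disk 1 of diskgroup 2 had already been pending for 15 seconds, and the wait then continued: each subsequent warning reports a longer wait (15, 18, 21 ... 36 secs). The usual causes are the operating system's communication path to the storage, the disks attached to the database host, or the timeout parameter settings. The rest of the ASM alert.log helps narrow this down.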
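When the alert.log is long, it helps to summarize these warnings per disk rather than read them one by one. The following is a minimal sketch, assuming the message wording matches the excerpt above exactly; the function name `summarize_pst_waits` is made up for illustration and is not part of any Oracle tooling.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   WARNING: Waited 15 secs for write IO to PST disk 1 in group 2.
# The wording is taken from the alert.log excerpt; other versions may differ.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)


def summarize_pst_waits(log_text):
    """Return {(group, disk): (warning_count, longest_wait_secs)}."""
    stats = defaultdict(lambda: [0, 0])
    for secs, disk, group in PST_WAIT.findall(log_text):
        entry = stats[(int(group), int(disk))]
        entry[0] += 1                          # number of warnings seen
        entry[1] = max(entry[1], int(secs))    # longest reported wait
    return {key: tuple(value) for key, value in stats.items()}


if __name__ == "__main__":
    sample = (
        "WARNING: Waited 15 secs for write IO to PST disk 1 in group 2.\n"
        "WARNING: Waited 18 secs for write IO to PST disk 1 in group 2.\n"
        "WARNING: Waited 36 secs for write IO to PST disk 1 in group 2.\n"
    )
    for (group, disk), (count, longest) in summarize_pst_waits(sample).items():
        print(f"group {group} disk {disk}: {count} warnings, longest wait {longest}s")
```

The longest reported wait per disk gives a rough lower bound on how long the storage was unresponsive, which is the figure the SOLUTION section asks you to determine.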
Note-DiskGroup Dismounted
Mon Mar 09 16:32:11 2015
NOTE: process _b000_+asm1 (1051) initiating offline of disk 0.3915951733 (DATA_0000) with mask 0x7e in group 2
NOTE: process _b000_+asm1 (1051) initiating offline of disk 1.3915951732 (DATA_0001) with mask 0x7e in group 2
NOTE: checking PST: grp = 2
GMON checking disk modes for group 2 at 7 for pid 28, osid 1051
ERROR: no read quorum in group: required 2, found 1 disks
NOTE: checking PST for grp 2 done.
NOTE: initiating PST update: grp = 2, dsk = 0/0xe968ae75, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 2, dsk = 1/0xe968ae74, mask = 0x6a, op = clear
GMON updating disk modes for group 2 at 8 for pid 28, osid 1051
ERROR: no read quorum in group: required 2, found 1 disks
Mon Mar 09 16:32:11 2015
NOTE: cache dismounting (not clean) group 2/0xEF985E9D (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 1056, image: oracle@rac1 (B001)
Mon Mar 09 16:32:11 2015
NOTE: halting all I/Os to diskgroup 2 (DATA)
Mon Mar 09 16:32:11 2015
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=30.108 last written ABA 30.108
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
WARNING: Offline for disk DATA_0001 in mode 0x7f failed
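- Disk 1 in diskgroup 2 became slow or hung for some reason, which triggered the waits at the ASM layer. By default, however, ASM tolerates an unresponsive disk for only 15 seconds. Although 11.2.0.4 automatically extends the wait while the I/O remains outstanding, there is still an upper limit. Once ASM concludes that the disk is not responding, it takes the failing disk OFFLINE.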
Mon Mar 09 16:32:11 2015
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
Mon Mar 09 16:32:11 2015
NOTE: No asm libraries found in the system
Mon Mar 09 16:32:11 2015
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 16)
ASM Health Checker found 1 new failures
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 2 invalid = TRUE
128 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Mon Mar 09 16:32:11 2015
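- At the same time, ASM attempts to reconfigure the communication path for the failed disk and records the state of the clusterware and ASM communication at that point; in the log above this shows up as the Dirty Detach Reconfiguration messages. Afterwards, Oracle tries to re-establish the path to the failed disk and mount the diskgroup again in order to return to the normal state.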
Mon Mar 09 16:32:27 2015
Received dirty detach msg from inst 2 for dom 2
Mon Mar 09 16:32:27 2015
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 16)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 2 invalid = TRUE
128 GCS resources traversed, 0 cancelled
freeing rdom 2
Dirty Detach Reconfiguration complete
Mon Mar 09 16:32:41 2015
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 2
Mon Mar 09 16:32:58 2015
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15079: ASM file is closed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15079: ASM file is closed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15079: ASM file is closed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15079: ASM file is closed
Mon Mar 09 16:33:11 2015
SUCCESS: diskgroup DATA was dismounted
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:4019740317 */
Mon Mar 09 16:33:11 2015
NOTE: diskgroup resource ora.DATA.dg is offline
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA
Mon Mar 09 16:33:11 2015
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 724992 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 724992 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14247.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Mon Mar 09 16:33:11 2015
SQL> alter diskgroup DATA check /* proxy */
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ERROR: alter diskgroup DATA check /* proxy */
NOTE: client exited [14233]
Mon Mar 09 16:33:16 2015
NOTE: [crsd.bin@rac1 (TNS V1-V3) 1581] opening OCR file
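The key steps of the sequence above (disks being offlined, loss of read quorum, forced dismount) can be pulled out of a long alert.log with a small script. This is a rough sketch, assuming one message per line and the exact wording seen in this excerpt; the event names and the function `find_dismount_events` are hypothetical labels chosen for illustration.

```python
import re

# Patterns copied from the alert.log excerpt; other ASM versions may word
# these messages differently, so treat this list as an assumption.
EVENT_PATTERNS = [
    ("offline_start",
     re.compile(r"initiating offline of disk \S+ \((\w+)\)")),
    ("no_read_quorum",
     re.compile(r"ERROR: no read quorum in group: required (\d+), found (\d+) disks")),
    ("forced_dismount",
     re.compile(r"SUCCESS: ASM-initiated MANDATORY DISMOUNT of group (\w+)")),
]


def find_dismount_events(log_text):
    """Return (line_number, event_name, captured_values) tuples in log order."""
    events = []
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in EVENT_PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((line_no, name, match.groups()))
    return events


if __name__ == "__main__":
    sample = (
        "NOTE: process _b000_+asm1 (1051) initiating offline of disk "
        "0.3915951733 (DATA_0000) with mask 0x7e in group 2\n"
        "ERROR: no read quorum in group: required 2, found 1 disks\n"
        "SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA\n"
    )
    for line_no, name, values in find_dismount_events(sample):
        print(line_no, name, values)
```

Seeing `no_read_quorum` between the offline attempt and the forced dismount is what distinguishes this failure from an ordinary single-disk offline in a mirrored diskgroup.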
CAUSE
- Delayed ASM PST heartbeats on ASM disks in a normal or high redundancy diskgroup cause the ASM instance to dismount the diskgroup. By default, the heartbeat timeout is 15 seconds.
- Heartbeat delays are largely ignored for external redundancy diskgroups: the ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but the delays do not directly dismount an external redundancy diskgroup.
- ASM disks can become unresponsive in scenarios such as:
+ Some of the physical paths of the multipath device are offline or lost
+ During path 'failover' in a multipath setup
+ Server load, or any sort of storage/multipath/OS maintenance
Doc ID 10109915.8 describes Bug 10109915, whose fix introduced the underscore parameter _asm_hbeatiowait. The issue was that no OS/storage tunable timeout mechanism exists when an NFS server/filer hangs; _asm_hbeatiowait allows that timeout to be set.
SOLUTION
- 1] Check with the OS and storage administrators whether the disks were unresponsive.
- 2] If possible, keep disk unresponsiveness below 15 seconds.
This will depend on various factors such as:
+ Operating System
+ Presence of Multipath ( and Multipath Type )
+ Any kernel parameter
So you need to find out the 'maximum' possible disk unresponsiveness for your setup. For example, on AIX the rw_timeout setting affects this and defaults to 30 seconds.
Another example is Linux with native multipathing. In such a setup, the number of physical paths and the polling_interval value in the multipath.conf file dictate the maximum disk unresponsiveness.
Determine this value for your own combination of OS, multipath, and storage.
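For the Linux native multipathing case, a first step is simply to see which polling_interval is configured. The sketch below only reads that value from the text of a multipath.conf file; it does not compute the maximum disk unresponsiveness, which also depends on the number of paths and on the storage. The function name `read_polling_interval` is hypothetical, and the parsing assumes the usual `keyword value` line layout with `#` comments.

```python
import re


def read_polling_interval(conf_text):
    """Return polling_interval (seconds) from multipath.conf text, or None if unset."""
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        match = re.fullmatch(r'polling_interval\s+"?(\d+)"?', line)
        if match:
            return int(match.group(1))  # first explicit setting wins
    return None  # not set explicitly; the multipath default applies


if __name__ == "__main__":
    sample = (
        "defaults {\n"
        "    polling_interval 10\n"
        "    user_friendly_names yes\n"
        "}\n"
    )
    print(read_polling_interval(sample))
```

In practice you would pass in the contents of the real configuration file and review the result together with the storage administrator.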
- 3] If you cannot keep disk unresponsiveness below 15 seconds, set the following parameter in the ASM instance (on all nodes of the RAC):
_asm_hbeatiowait
As per internal bug 17274537, internal testing indicates that the value should be increased to 120 secs; this is to be fixed in 12.2.
Run the following in the ASM instance to set the desired value for _asm_hbeatiowait:
alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';
Then restart the ASM instance / CRS for the new parameter value to take effect.