ORA-15063: ASM discovered an insufficient number of disks for diskgroup (original article)

czmmiao

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

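ORA-15063 has many possible causes. After a long search and a number of attempted fixes, I eventually concluded that my case was a bug, which was a little frustrating. This post summarizes the approaches found online and on Metalink for later reference. The causes and their fixes are as follows.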

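The operating system cannot see the ASM disks, or their permissions are wrong

On the failing host, run

# oracleasm scandisks

Make sure ls -l on the ASM disks produces output like the following
# ls -l /dev/oracleasm/disks/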
total 0
brw-rw---- 1 oracle dba 8, 33 Nov 25 14:35 VOL1
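
The permission check above can be scripted. The following is a minimal sketch, assuming the expected owner and group are oracle and dba as in the listing above (other installations may use a different group, such as oinstall). The function name check_asm_disk_perms is hypothetical. It reads ls -l output on standard input and prints every entry that is not a block device with mode rw-rw---- owned by the expected user and group.

```shell
# Hypothetical helper: reads `ls -l` output on stdin and prints entries that
# are not block devices with mode rw-rw---- owned by the expected user/group.
# Returns 0 when every entry is fine, 1 otherwise.
check_asm_disk_perms() {
    awk -v owner="$1" -v group="$2" '
        $1 == "total" || NF == 0 { next }      # skip the summary line and blank lines
        {
            mode = substr($1, 1, 10)           # drop a trailing ACL/SELinux marker if present
            if (mode != "brw-rw----" || $3 != owner || $4 != group) {
                print "BAD: " $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

On a real host it could be used as `ls -l /dev/oracleasm/disks/ | check_asm_disk_perms oracle dba`.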

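If the problem persists and the version is 11.1 or later, you can drop the disk group with the following statement and then recreate it. Note that this destroys all data in the disk group.
drop diskgroup <diskgroup_name> force including contents;
INCLUDING CONTENTS: deletes all files in the disk group.
FORCE: clears the disk header information.
In a cluster environment the statement fails if the disk group is in use or is mounted on another node.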

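asm_diskgroups and asm_diskstring are not configured correctly
ASM_DISKGROUPS lists the disk groups that are mounted automatically when the ASM instance starts or when ALTER DISKGROUP ALL MOUNT is issued. The ASM instance updates this parameter automatically whenever a disk group is created, mounted, or dropped.
ASM_DISKSTRING specifies the paths the ASM instance searches during disk discovery; the instance only finds ASM disks under the paths this parameter names. The parameter accepts wildcards. Its default value is null, which means the instance examines every readable and writable disk attached to the operating system to determine which ones are ASM disks. Although the official documentation says wildcards are supported, ORA-15063 is sometimes fixed only after the paths are listed explicitly, which is probably another bug.
The steps are as follows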
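1. Dismount all mounted disk groups
SQL> alter diskgroup all dismount;
2. Set asm_diskstring explicitly, and set asm_diskgroups to the disk group names (this parameter takes disk group names, not device paths):
SQL> alter system set asm_diskstring='/dev/mapper/datap1','/dev/mapper/frap1';

SQL> alter system set asm_diskgroups='<diskgroup_name1>','<diskgroup_name2>';
3. Continue in dbca to mount the disk groups; you can also create the disk groups manually.
If that still does not work, restart CSS, wait, and then bounce the ASM instance:
/etc/init.d/init.cssd stop
/etc/init.d/init.cssd start
sleep 90
Start the ASM instance, run shutdown immediate, then start the ASM instance again.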
Errors caused by storage replication
See the official document

ORA-15063 When Mounting a Diskgroup After Storage Cloning ( BCV / Split Mirror / SRDF / HDS / Flash Copy ) [ID 784776.1]

Cause

Storage cloning doesn't create a consistent image during split copy.
There is a problem in other disk metadata structures (PST block, listheader f1b1locn, disk directory, etc.)
Header status is MEMBER but diskgroup can't be mounted.

-- Disk content in production server
$ kfed read WRADK003.dmp ausz=4194304 aunum=1 blknum=0

kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 262144 ; 0x004: T=0 NUMB=0x40000
kfbh.block.obj: 16777344 ; 0x008: TYPE=0x0 NUMB=0x80
kfbh.check: 1240136211 ; 0x00c: 0x49eafa13
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
-- Disk content on target (backup) storage
$ kfed read WRADK003.dmp ausz=4194304 aunum=1 blknum=0

kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
...

Disk WRADK003.dmp contains Partner Status Table in the source storage. However, its content is all zero in the backup storage after the split copy. Storage cloning doesn't create a consistent image of ASM LUNs.
The allocation unit containing the PST block was apparently not copied from the production disks to the backup disks.

In this particular case, we see the PST is corrupted. But basically the issue here is that the clone was not consistent; the invalid ASM metadata could be anywhere on the disk(s) and could prevent the diskgroup from mounting.
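
Comparing the kfbh.type line of the same block on the source and on the clone, as in the two dumps above, is easy to script. The following is a minimal sketch that only parses text already produced by kfed read; it does not call kfed itself, and the function name kfed_block_type is hypothetical.

```shell
# Hypothetical helper: reads `kfed read` output on stdin and prints the block
# type name from the kfbh.type line, e.g. KFBTYP_PST_META or KFBTYP_INVALID.
kfed_block_type() {
    awk '$1 == "kfbh.type:" { print $NF; exit }'
}

# Example using the line from the clone dump shown above:
printf 'kfbh.type: 0 ; 0x002: KFBTYP_INVALID\n' | kfed_block_type
```

A result of KFBTYP_INVALID on the clone where the source reports KFBTYP_PST_META points to the inconsistent copy described above.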

Solution

The problem could be an incorrect initial configuration of the clones on the storage array. The first copy must be a full sync. Only after a full first copy is it possible to do incremental clones.

Storage vendors use various cloning methods and technologies. As long as the cloning technology creates consistent snapshots of all the disks in an ASM disk group, the diskgroup based on the copied LUNs can be mounted at an alternate site.

- EMC CLARiiON SnapView and MirrorView (also known as BCV copy)
- Hitachi Data Systems (HDS): Universal Storage Platform Replication Software
- IBM FlashCopy

The most common disk cloning technologies are mentioned in the following link. The vendor-specific documents also explain the best practices for creating the disk copies.

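The document does not cover virtual machine cloning, but in my experience of building HA environments on virtual machines, cloning VMs often produces baffling errors; I hit ORA-15063 when building RAC with VMware cloning. My advice: if you are building an HA environment, avoid cloning. Installing a fresh system does not take long, and otherwise you may face an endless stream of bugs.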

ASM disk header is marked as "PROVISIONED"

Query the v$asm_disk view
SQL> select group_number gn,disk_number dn, mount_status, header_status,mode_status,state, total_mb, free_mb, label, path from v$asm_disk order by group_number, disk_number;
  GN   DN MOUNT_STATUS HEADER_STATU MODE_STATUS STATE    TOTAL_MB  FREE_MB LABEL PATH
---- ---- ------------ ------------ ----------- -------- -------- -------- ----- ---------
   0    0 CLOSED       PROVISIONED  ONLINE      NORMAL          0        0 CRS   ORCL:CRS
   0    1 CLOSED       PROVISIONED  ONLINE      NORMAL          0        0 DATA  ORCL:DATA
kfed shows that kfdhdb.acdb.ub2spare is 0xaa55 (the 2 bytes at offset 510).
$ kfed read /dev/oracleasm/disks/DATA

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
..
kfdhdb.acdb.ents:                   0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare:     43605 ; 0x1de: 0xaa55

0xaa55 on a little-endian server such as Linux, or 0x55aa on a big-endian server such as Sun SPARC, is the boot signature (magic number) of an MBR (Master Boot Record) partition table.
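
The same signature can be checked without kfed by reading the two bytes at offset 510 directly. The following is a minimal sketch, assuming the standard od utility is available. The function name has_mbr_signature is hypothetical, and its argument can be either the disk device or a dd copy of its first sector.

```shell
# Hypothetical helper: succeeds if the two bytes at offset 510 of the given
# file (a disk device or a dd copy of its header) are 55 aa, the MBR boot
# signature that a little-endian host reports as 0xaa55.
has_mbr_signature() {
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example, `has_mbr_signature /dev/oracleasm/disks/DATA && echo "MBR signature present"` only reads from the disk, so it is safe to run before deciding on a repair.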

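Cause
Partition information was written to the ASM disk from outside ASM, either by the system or by hand
Solution

1. Shut down the ASM instance
2. Back up the ASM disk header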

$ dd if=/dev/oracleasm/disks/DATA of=/tmp/DATA.dd bs=1M count=1

3. Perform the following on every affected disk

Check the AU (Allocation Unit) size

$ kfed read /dev/oracleasm/disks/DATA | grep ausize
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000

If the ASM version is 11.1.0.7 or later

$ kfed repair /dev/oracleasm/disks/DATA aus=1048576

If the ASM version is earlier than 11.1.0.7

$ kfed read /dev/oracleasm/disks/DATA text=./DATA.kfed
$ kfed write /dev/oracleasm/disks/DATA CHKSUM=YES aus=1048576 text=./DATA.kfed

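4. Start the ASM instance and mount the affected disk groups

5. This method only works when the disk header shows 0xaa55. If it shows something else, or ORA-15063 persists, contact Oracle Support.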
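
Step 3 above copies the allocation unit size from the kfed output into the repair command by hand. The following is a minimal sketch of extracting it automatically, assuming the kfed output format shown in step 3. The function name kfed_ausize is hypothetical and only parses text.

```shell
# Hypothetical helper: reads `kfed read` output on stdin and prints the
# allocation unit size in bytes from the kfdhdb.ausize line.
kfed_ausize() {
    awk '$1 == "kfdhdb.ausize:" { print $2; exit }'
}

# Example using the line shown in step 3:
printf 'kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000\n' | kfed_ausize
```

On a real system the value could be captured with `aus=$(kfed read /dev/oracleasm/disks/DATA | kfed_ausize)` and then passed to the repair command as `aus=$aus`.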

ASM disk header is marked as "UNKNOWN"

In this case the ASM disks look normal at the operating system level
$ /etc/init.d/oracleasm listdisks
DISCO01BKP
DISCO02BKP
DISCO03BKP
DISCO04BKP
DISCO05BKP
$ ls -l /dev/oracleasm/disks
brw-rw---- 1 oracle oinstall 8, 1 Dec 8 10:31 DISCO01BKP
brw-rw---- 1 oracle oinstall 8, 17 Dec 8 10:31 DISCO02BKP
brw-rw---- 1 oracle oinstall 8, 33 Dec 8 10:31 DISCO03BKP
brw-rw---- 1 oracle oinstall 8, 49 Dec 8 10:31 DISCO04BKP
brw-rw---- 1 oracle oinstall 8, 65 Dec 8 10:31 DISCO05BKP

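Comparing the major and minor numbers from ls -l with /proc/partitions shows that the ASM disks are bound to 'sd' devices rather than multipath (dm) devices.
$ cat /proc/partitions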
major minor #blocks name
8 1 209712478 sda1
8 17 209712478 sdb1
8 33 209712478 sdc1
8 49 209712478 sdd1
8 65 209712478 sde1
...
253 0 209715200 dm-0
253 1 209715200 dm-1
253 2 209715200 dm-2
253 3 209715200 dm-3
253 4 209715200 dm-4
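
To confirm which kernel device an ASMLib disk is bound to, the major and minor numbers from ls -l (for example 8, 33 for DISCO03BKP) can be looked up in /proc/partitions. The following is a minimal sketch. The function name dev_name_for is hypothetical, and the optional third argument exists only so the lookup can be tried against a saved copy of the file.

```shell
# Hypothetical helper: prints the kernel device name for a major/minor pair.
# Usage: dev_name_for MAJOR MINOR [partitions-file]
# The file defaults to /proc/partitions.
dev_name_for() {
    awk -v maj="$1" -v min="$2" '$1 == maj && $2 == min { print $4 }' "${3:-/proc/partitions}"
}
```

With the listing above, `dev_name_for 8 33` resolves to an sd partition (sdc1), which indicates the single-path device is in use instead of a dm device.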
Solution
Edit /etc/sysconfig/oracleasm so that the ASM disks map to the multipath devices
ORACLEASM_SCANORDER="mpath dm"
ORACLEASM_SCANEXCLUDE="sd"

ASM metadata corruption

Solution

1. Confirm whether an earlier backup is usable

2. Back up the database, archived logs, and control file
RMAN> backup device type disk format '/u03/backup/%U' database plus archivelog;
RMAN> backup device type disk format '/u03/backup/ctrlf_%U' current controlfile;
3. Back up the parameter file manually

CREATE PFILE='/u03/app/oracle/product/10.1.0/dbs/init<sid>.ora'
FROM SPFILE='+DATA/V10FJ/spfile.ora';

4. Cleanly shut down the database instances and then the ASM instance

5. Recreate the ASM disk group
setenv ORACLE_SID +ASM
sqlplus '/ as sysdba'
SQL> startup nomount
SQL> create diskgroup data disk '/dev/rdsk/c1t4d0s4' force;
SQL> shutdown immediate
SQL> startup mount
6. Restore the database
setenv ORACLE_SID DBSCOTT
sqlplus '/ as sysdba'
SQL> startup nomount pfile=init<sid>.ora
rman target /
RMAN> restore controlfile from '/u03/backup/ctrlf_<string>'; # where <string> is the unique string generated by %U.
RMAN> alter database mount;
RMAN> restore database;
RMAN> recover database;
RMAN> alter database open resetlogs;
7. Connect to the ASM instance and look up the control file path
setenv ORACLE_SID +ASM
sqlplus '/ as sysdba'
SQL> select name, alias_directory from v$asm_alias;
8. Edit init<sid>.ora so that the control_files parameter points to the value found in v$asm_alias
9. Recreate the spfile and restart the database. Because one ASM instance can serve many database instances, repeat the same procedure for every affected database.

References:
http://blog.csdn.net/linwaterbin/article/details/8151309
http://www.boobooke.com/bbs/viewthread.php?tid=101936
http://blog.csdn.net/tianlesoftware/article/details/6274852
ASM Diskgroup Failed to Mount On Second Node ORA-15063 [ID 731075.1]
ORA-15063 When Mounting a Diskgroup After Storage Cloning ( BCV / Split Mirror / SRDF / HDS / Flash Copy ) [ID 784776.1]
Mounting Diskgroup Fails With ORA-15063 and V$ASM_DISK Shows PROVISIONED [ID 1487443.1]
Mount ASM Disk Group Fails : ORA-15186, ORA-15025, ORA-15063 [ID 1384504.1]
Receiving ORA-15032 ORA-15063 When Mounting Diskgroup in 11.1.0/7 [ID 1437578.1]
This article is original; please credit the source and author when reposting.
Corrections are welcome.
Email: czmcj@163.com


2012-11-25 16:59