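I recently ran into a problem where an ASM disk group could not be mounted. It had been working before. After some unrelated operations the database failed to start, and once those issues were resolved, the full database startup still failed at the step that mounts the disk group.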
Environment
#########################################
Hardware: VMware ESX virtual machine
OS: Red Hat Linux 5
Oracle version: 11.2.0.2
The ASM disk is presented through ASMLib
The disk group contains a single virtual disk, /dev/sdb1.
#########################################
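Here is my full analysis, step by step.
1. First, the ASM alert.log showed the following errors: the mount failed because ASM could not find the disks for the disk group.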
SQL> alter diskgroup DATA mount
NOTE: cache registered group DATA number=1 incarn=0xc28a1e2d
NOTE: cache began mount (first) of group DATA number=1 incarn=0xc28a1e2d
Tue Dec 11 18:06:55 2012
ERROR: no PST quorum in group: required 2, found 0 <<<<<<<<<<<
NOTE: cache dismounting (clean) group 1/0xC28A1E2D (DATA)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xC28A1E2D (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xc28a1e2d
NOTE: cache deleting context for group DATA 1/0xc28a1e2d
GMON dismounting group 1 at 8 for pid 17, osid 32163
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
ERROR: alter diskgroup DATA mount
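When the alert log is long, it can help to reduce it to the distinct ORA- codes before reading the details. The following is a minimal sketch; the function name `ora_errors` is hypothetical, and it simply filters whatever log text is fed to it on standard input.

```shell
# ora_errors: print each distinct ORA-nnnnn code found on stdin, sorted.
# Hypothetical helper for illustration; it is not part of any Oracle tool.
ora_errors() {
    grep -o 'ORA-[0-9][0-9]*' | sort -u
}
```

It would be used as `ora_errors < alert.log`, pointing at the ASM alert log of your own instance. For the excerpt above it reports ORA-15017, ORA-15032 and ORA-15063.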
2. Next I checked the ASM pfile and found nothing abnormal.
asm_diskgroups='DATA'
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile='EXCLUSIVE'
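3. I then used the following commands to check whether the disk physically exists and how it maps to a physical device. None of them returned any ASM disk.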
[grid@lgto_test ~]$ kfod disks=all
---- no output ----
[grid@lgto_test peer]$ cd /dev/oracleasm/disks/
[grid@lgto_test disks]$ ls
---- no output ----
[grid@lgto_test disks]$ /etc/init.d/oracleasm listdisks
---- no output ----
4. Checking the physical device directly, however, shows that /dev/sdb1 exists. The OS has recognized the disk; only ASMLib fails to see it:
Looking up the physical disk behind an ASMLib disk (example from a working host):
[oracle@OEL ~]$ /etc/init.d/oracleasm querydisk -d disk1
Disk "DISK1" is a valid ASM disk on device [8,17]
[oracle@OEL ~]$ ls -l /dev/ |grep 8|grep 17
brw-r----- 1 root disk 8, 17 Oct 16 14:01 sdb1
On the affected host:
[root@lgto_test ~]# ls -lst /dev/sd*
0 brw-r----- 1 root disk 8, 0 Dec 11 19:29 /dev/sda
0 brw-r----- 1 root disk 8, 2 Dec 11 19:29 /dev/sda2
0 brw-r----- 1 root disk 8, 16 Dec 11 19:29 /dev/sdb
0 brw-r----- 1 root disk 8, 17 Dec 11 19:29 /dev/sdb1 <<<<<<< This is the disk of the missing disk group
0 brw-r----- 1 root disk 8, 1 Dec 11 11:29 /dev/sda1
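The `grep 8|grep 17` filter also matches lines where 8 or 17 appear in a date or time, so on a busy /dev it can return unrelated entries. A stricter lookup compares the major and minor columns exactly. This is a sketch only: `find_block_dev` is a hypothetical name, and it assumes the usual `ls -l` column layout for device nodes shown in the listing above.

```shell
# find_block_dev MAJOR MINOR: read `ls -l /dev` output on stdin and print the
# names of block devices whose major and minor numbers both match exactly.
# Assumes the ls -l layout shown above: field 5 is "MAJOR," and field 6 is MINOR.
find_block_dev() {
    awk -v maj="$1" -v min="$2" '
        $1 ~ /^b/ && $5 == maj "," && $6 == min { print $NF }
    '
}
```

Usage would be `ls -l /dev | find_block_dev 8 17`, taking the two numbers from the `querydisk -d` output.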
5. At first I suspected that a damaged disk header was preventing ASMLib from recognizing the disk, but dumping the header showed nothing wrong.
# od -c /dev/sdb1
……
0000040 O R C L D I S K D A T A D G 0 1
……
7760040 O R C L D I S K D A T A D G 0 1
As a side note, if the label in the disk header has been lost, the dump looks like this instead:
0000040 O R C L D I S K \0 \0 \0 \0 \0 \0 \0 \0
If you see this result, the disk has to be relabeled with renamedisk as described below. For details, see the document Oracleasm Listdisks Cannot See Disks (Doc ID 392527.1).
Use the "oracleasm renamedisk" utility to add an asmlib label to the disk:
/etc/init.d/oracleasm renamedisk /dev/<device> <asmlib_label>
If it fails, use the "-f" switch:
/etc/init.d/oracleasm renamedisk -f /dev/<device> <asmlib_label>
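The header check from step 5 can be scripted so that the label is read directly instead of being picked out of an od dump. The sketch below is based only on the layout visible in the dumps in this post: the ORCLDISK tag at byte 32 (octal 0000040) followed by the label, for which 24 bytes are assumed here. `asmlib_label` is a hypothetical name, and the function only reads from the device.

```shell
# asmlib_label FILE: print the ASMLib label stored after the ORCLDISK tag.
# Returns non-zero if the tag is absent; prints an empty string if the tag is
# present but the label bytes are all zero (the "lost label" case above).
# Offsets are assumptions taken from the od dump: tag at byte 32, label at byte 40.
asmlib_label() {
    tag=$(dd if="$1" bs=1 skip=32 count=8 2>/dev/null | tr -d '\0')
    [ "$tag" = "ORCLDISK" ] || return 1
    dd if="$1" bs=1 skip=40 count=24 2>/dev/null | tr -d '\0'
}
```

Run as root against the partition, for example `asmlib_label /dev/sdb1`. An empty result with a zero exit status corresponds to the case where renamedisk is needed.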
6. Restart ASMLib to check whether ASMLib itself is the problem.
[root@lgto_test ~]# /etc/init.d/oracleasm restart
Dropping Oracle ASMLib disks: [ OK ]
Shutting down the Oracle ASMLib driver: [FAILED]
Checking the mounted file systems shows that the oracleasmfs file system is mounted:
[root@lgto_test ~]# df -ha
Filesystem Size Used Avail Use% Mounted on
……
oracleasmfs 0 0 0 - /dev/oracleasm
7. Check whether /dev/sdb1 has been marked as an ASM disk. It has, yet a rescan still lists no disks.
[root@lgto_test ~]# oracleasm querydisk /dev/sdb1
Device "/dev/sdb1" is marked an ASM disk with the label "DATADG01"
[root@lgto_test ~]# /sbin/service oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@lgto_test ~]# /etc/init.d/oracleasm listdisks
---- no output ----
8. The installed RPM packages are fine as well.
[grid@lgto_test ~]$ rpm -qa|grep oracleasm
oracleasmlib-2.0.4-1.el5
oracleasm-support-2.1.7-1.el5
oracleasm-2.6.18-308.el5-2.0.5-1.el5
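One thing worth confirming in this package list is that the kernel driver package was built for the running kernel. The sketch below assumes the driver package is named `oracleasm-<kernel release>-<driver version>`, as the third line of the listing suggests. `driver_matches_kernel` is a hypothetical helper, not an ASMLib command.

```shell
# driver_matches_kernel KERNEL_RELEASE: read `rpm -qa` output on stdin and
# succeed if a package named oracleasm-<KERNEL_RELEASE>-... is listed.
# The naming pattern is an assumption based on the package list above.
driver_matches_kernel() {
    while IFS= read -r pkg; do
        case "$pkg" in
            "oracleasm-$1-"*) return 0 ;;
        esac
    done
    return 1
}
```

It would be called as `rpm -qa | driver_matches_kernel "$(uname -r)"`. A non-zero exit status would suggest the driver package does not match the running kernel.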
9. I collected the kfed output for the disk. It showed nothing abnormal either.
[root@lgto_test ~]# /oracle/ora11g/product/app/grid/bin/kfed read /dev/sdb1
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 3351358462 ; 0x00c: 0xc7c1abfe
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDATADG01 ; 0x000: length=16
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 825247556 ; 0x00c: 0x31304744
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATADG01 ; 0x028: length=8
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATADG01 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32977140 ; 0x0a8: HOUR=0x14 DAYS=0x7 MNTH=0xc YEAR=0x7dc
kfdhdb.crestmp.lo: 1642529792 ; 0x0ac: USEC=0x0 MSEC=0x1c1 SECS=0x1e MINS=0x18
kfdhdb.mntstmp.hi: 32977140 ; 0x0b0: HOUR=0x14 DAYS=0x7 MNTH=0xc YEAR=0x7dc
kfdhdb.mntstmp.lo: 1664549888 ; 0x0b4: USEC=0x0 MSEC=0x1c1 SECS=0x33 MINS=0x18
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 204797 ; 0x0c4: 0x00031ffd
kfdhdb.pmcnt: 3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32977140 ; 0x0e4: HOUR=0x14 DAYS=0x7 MNTH=0xc YEAR=0x7dc
kfdhdb.grpstmp.lo: 1642390528 ; 0x0e8: USEC=0x0 MSEC=0x139 SECS=0x1e MINS=0x18
kfdhdb.vfstart: 0 ; 0x0ec: 0x00000000
kfdhdb.vfend: 0 ; 0x0f0: 0x00000000
kfdhdb.spfile: 58 ; 0x0f4: 0x0000003a
kfdhdb.spfflg: 1 ; 0x0f8: 0x00000001
kfdhdb.ub4spare[0]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000
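Most of the kfed dump is spare fields. For this kind of check, the lines that matter are the header status, the disk name and the disk group name. A small filter can pull out just those lines. This is a sketch, and `kfed_summary` is a hypothetical name. It only filters text and does not interpret the values.

```shell
# kfed_summary: read `kfed read` output on stdin and keep only the header
# status, disk name and disk group name lines.
kfed_summary() {
    grep -E '^[[:space:]]*kfdhdb\.(hdrsts|dskname|grpname):'
}
```

For example, `kfed read /dev/sdb1 | kfed_summary`. In the dump above this leaves KFDHDR_MEMBER, DATADG01 and DATA, which is consistent with a disk that still belongs to disk group DATA.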
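10. Interim summary. The analysis so far establishes the following:
1. ASMLib is working
2. The RPM packages are fine
3. The disk header is neither damaged nor missing information
4. The OS recognizes the device correctly
The remaining question is why ASMLib cannot scan and recognize this disk.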
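11. Finally, while checking the file /etc/sysconfig/oracleasm, I found the problem. The disk we need ASMLib to scan is /dev/sdb1, but this configuration file excludes devices matching sdb* from the scan, which is the opposite of what we want. Setting ORACLEASM_SCANEXCLUDE="" (empty) and restarting ASMLib resolved the problem.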
[grid@lgto_test disks]$ more /etc/sysconfig/oracleasm
# ORACLEASM_ENABELED: 'true' means to load the driver on boot.
ORACLEASM_ENABLED=true
# ORACLEASM_UID: Default user owning the /dev/oracleasm mount point.
ORACLEASM_UID=grid
# ORACLEASM_GID: Default group owning the /dev/oracleasm mount point.
ORACLEASM_GID=asmadmin
# ORACLEASM_SCANBOOT: 'true' means scan for ASM disks on boot.
ORACLEASM_SCANBOOT=true
# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="mapper mpath"
# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sdb" <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
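This check is easy to automate, so a labeled disk that is silently excluded from the scan gets caught early. The sketch below assumes that ORACLEASM_SCANEXCLUDE holds space-separated patterns that are matched as prefixes of the device name (so "sdb" covers sdb, sdb1, and so on). `is_scan_excluded` is a hypothetical helper that only reads the configuration file.

```shell
# is_scan_excluded CONFIG_FILE DEVICE: succeed if the device's base name starts
# with one of the patterns in ORACLEASM_SCANEXCLUDE.
# Prefix matching is an assumption about how ASMLib applies these patterns.
is_scan_excluded() {
    dev_name=$(basename "$2")
    # Use the last assignment in the file and strip the surrounding quotes.
    patterns=$(sed -n 's/^ORACLEASM_SCANEXCLUDE=//p' "$1" | tail -n 1 | tr -d "\"'")
    for pattern in $patterns; do
        case "$dev_name" in
            "$pattern"*) return 0 ;;
        esac
    done
    return 1
}
```

For example: `is_scan_excluded /etc/sysconfig/oracleasm /dev/sdb1 && echo "sdb1 is excluded from the ASMLib scan"`.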
12. Restart ASMLib and confirm the disk status.
# /etc/init.d/oracleasm restart
# /sbin/service oracleasm scandisks
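To confirm the result after the rescan, the output of listdisks can be checked for the expected label. A minimal sketch follows. `disk_listed` is a hypothetical helper that reads the listdisks output from standard input, and DATADG01 is the label reported by querydisk in step 7.

```shell
# disk_listed LABEL: read `oracleasm listdisks` output on stdin and succeed
# only if LABEL appears as a complete line (exact, fixed-string match).
disk_listed() {
    grep -qxF "$1"
}
```

For example: `/etc/init.d/oracleasm listdisks | disk_listed DATADG01 && echo "DATADG01 is visible"`. Once the disk is listed, the `alter diskgroup DATA mount` from step 1 can be retried.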