ASM REACTING TO PARTITION ERRORS [ID 1062954.1]

--------------------------------------------------------------------------------
 
  Modified 10-AUG-2011     Type PROBLEM     Status PUBLISHED  

In this Document
  Symptoms
  Cause
  Solution

 

--------------------------------------------------------------------------------

 

Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.4 to 11.2.0.1.0 - Release: 10.1 to 11.2
Linux x86
Haansoft Linux x86-64

Symptoms
Disks that belong to ASM disk groups randomly show as PROVISIONED, or at times as CANDIDATE, in v$asm_disk.header_status. Once dismounted, disk groups that contain those disks will not mount again.

From ASM alert log:

ERROR: diskgroup was not mounted
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup ""

Or

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "<disk number here>" is missing
ORA-15063: ASM discovered an insufficient number of disks for diskgroup ""
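
When several ASM instances are involved, it can help to scan each alert log for the error codes quoted above. The following is a minimal sketch, not an Oracle tool: the function name `find_mount_errors` and the sample lines are illustrative, and the set of codes is simply the ones listed in this note.

```python
import re

# ORA codes quoted in the alert log excerpts of this note.
MOUNT_ERROR_CODES = {"ORA-15032", "ORA-15040", "ORA-15042", "ORA-15063"}

ORA_PATTERN = re.compile(r"\b(ORA-\d{5})\b")


def find_mount_errors(lines):
    """Return (line_number, code) pairs for the mount-related ORA errors."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for code in ORA_PATTERN.findall(line):
            if code in MOUNT_ERROR_CODES:
                hits.append((number, code))
    return hits


# Illustrative input only; in practice pass an open alert log file object.
sample = [
    "ERROR: diskgroup was not mounted",
    "ORA-15032: not all alterations performed",
    "ORA-15063: ASM discovered an insufficient number of disks for diskgroup",
]
for number, code in find_mount_errors(sample):
    print(f"line {number}: {code}")
```

Because the function accepts any iterable of lines, an open file handle for the alert log can be passed directly.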

This seems to occur when new LUNs are added or configured on the cluster. The behavior has been observed several times (10+), on more than one cluster and across separate data centers.

At times v$asm_disk.header_status for the disks is MEMBER, yet the disk groups still fail to mount when a re-mount is attempted.

While troubleshooting the issue, it was noticed that the OS partition table on the devices used by the ASM disks had been wiped out (it no longer exists). The issue can be reproduced in a similar way by using dd to wipe the devices, although this does not explain why the disks sometimes show v$asm_disk.header_status=MEMBER and still cannot be mounted.
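
One way to check for the missing partition table described above is to inspect the first sector of the whole-disk device. The sketch below assumes a classic MBR layout (a 0x55AA signature at byte offset 510 and four 16-byte entries starting at offset 446); the function names are illustrative, reading a device normally requires root, and disks labelled with GPT will only show a protective entry here.

```python
SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"     # last two bytes of a valid MBR sector
PARTITION_TABLE_OFFSET = 446    # four 16-byte entries start here
PARTITION_ENTRY_SIZE = 16


def parse_mbr(sector):
    """Return the type bytes of non-empty MBR entries, or None if the
    sector does not carry an MBR signature (for example, it was wiped)."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != MBR_SIGNATURE:
        return None
    types = []
    for slot in range(4):
        start = PARTITION_TABLE_OFFSET + slot * PARTITION_ENTRY_SIZE
        partition_type = sector[start + 4]  # byte 4 of each entry is the type
        if partition_type != 0:
            types.append(partition_type)
    return types


def read_first_sector(path):
    """Read the first 512 bytes of a device or image file (read-only)."""
    with open(path, "rb") as device:
        return device.read(SECTOR_SIZE)
```

A result of `None` or an empty list for a device that is expected to carry a partition for ASM matches the symptom in this note. The check only reads from the device.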

Cause
It turns out that the inq.Linux command incorrectly writes to a /dev/sd<x><y> device, which wipes out the partition table. Which mpath device that /dev/sd<x><y> path belongs to determines where the corruption is noticed.

This is caused by an EMC bug in older versions (prior to version 6.3.0.0-771) of the Linux inq utility/command. The eNav utility calls inq.Linux.

Details from EMC bug:
1) Older versions of this command scanned all devices in /dev, not just SCSI disks, and so included /dev/kmsg.
2) Older versions of this command incorrectly matched /dev/kmsg and /dev/sd<x><y>, treating them as multiple paths to the same device, which they are not.
3) Older versions of this command allocated a 216-byte inquiry buffer. This was apparently sufficient for EMC devices, but was too small for certain other disks. The SCSI layer returns an error if the buffer is undersized.

Together, these three conditions caused the errors to be routed to /dev/sd<x><y> instead of /dev/kmsg, which then wiped out the corresponding partition tables. All three conditions are fixed in versions of the INQ command after 6.3.0.0-771.

It is assumed that any installation that ran the eNav utility with over 500 SCSI disks attached, which would cause /dev/sd to exist, was at risk of similar corruption. The count includes multiple paths to the same disk (via multipathing and so on), so in an environment with two paths per LUN the threshold is only about 250 LUNs, minus the number of locally attached disks.
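
The arithmetic behind this threshold can be sketched as follows. The Linux sd driver names disks sda to sdz and then continues with two-letter suffixes starting at sdaa, so a name maps to a position in the discovery order. Both helper names are illustrative, and the even split of paths per LUN is an assumption.

```python
def sd_index(name):
    """Zero-based position of a Linux SCSI disk name.

    sda -> 0, sdz -> 25, sdaa -> 26, sdzz -> 701.
    """
    suffix = name[2:] if name.startswith("sd") else name
    if not (suffix and suffix.isascii() and suffix.isalpha() and suffix.islower()):
        raise ValueError(f"not an sd device name: {name!r}")
    index = 0
    for letter in suffix:
        # Bijective base-26: 'a' counts as 1, 'z' as 26.
        index = index * 26 + (ord(letter) - ord("a") + 1)
    return index - 1


def luns_for_device_count(device_count, paths_per_lun, local_disks=0):
    """Number of LUNs that yields the given count of sd devices."""
    return (device_count - local_disks) // paths_per_lun


print(sd_index("sdaa"))                    # first two-letter name
print(luns_for_device_count(500, 2))       # two paths per LUN
print(luns_for_device_count(500, 2, 2))    # same, with two local disks
```

With two paths per LUN, 500 devices correspond to 250 LUNs, and each locally attached disk lowers that figure slightly, which matches the "250 or so" estimate.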


Note: Verified and confirmed by the customer and by sources outside Oracle (RedHat, EMC, Maryville); however, no further details, such as the EMC bug number or any other additional information, were provided.

Solution
An immediate workaround is to comment out the call to the inq.Linux command so that the problem cannot occur anywhere in your environment. To do this, contact either Maryville support (the vendor of eNav) or EMC.

Another appropriate solution is to upgrade to a newer version of inq.Linux.
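
To decide whether an installed inq build predates the fix, its version string can be compared against the 6.3.0.0-771 threshold named in the Cause section. This is a minimal sketch: the "dotted-numbers, hyphen, build number" format is inferred from that single value, the helper names are illustrative, and how to obtain the installed version is not covered here. Builds strictly below the threshold are treated as affected, following the "prior to" wording.

```python
import re

FIX_THRESHOLD = "6.3.0.0-771"  # version quoted in this note


def parse_inq_version(text):
    """Turn a string such as '6.3.0.0-771' into a comparable tuple."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    dotted, build = match.groups()
    return tuple(int(part) for part in dotted.split(".")) + (int(build),)


def is_affected(version, threshold=FIX_THRESHOLD):
    """True if the version sorts strictly before the threshold."""
    return parse_inq_version(version) < parse_inq_version(threshold)


print(is_affected("6.2.0.0-500"))
print(is_affected("6.3.0.0-771"))
```

Whether build 771 itself contains the fix is not stated unambiguously in the note, so confirm a borderline version with EMC.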

 

 
