ASM REACTING TO PARTITION ERRORS [ID 1062954.1]

2012-04-18

--------------------------------------------------------------------------------

Modified 10-AUG-2011 Type PROBLEM Status PUBLISHED

In this Document
Symptoms
Cause
Solution

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.4 to 11.2.0.1.0 - Release: 10.1 to 11.2
Linux x86
Haansoft Linux x86-64

Symptoms
Disks that belonged to ASM disk groups randomly show as PROVISIONED or, at times, as CANDIDATE in v$asm_disk.header_status. Once dismounted, disk groups containing those disks will not mount again.

From ASM alert log:

ERROR: diskgroup was not mounted
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup ""

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "<disk number here>" is missing
ORA-15063: ASM discovered an insufficient number of disks for diskgroup ""
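
When scanning a long ASM alert log for this symptom, it can help to pull out only the error codes shown in the excerpt above. The following is a minimal sketch in Python, not an Oracle tool: the function name find_mount_errors and the KNOWN_CODES table are hypothetical, and the message texts are copied from the excerpt.

```python
import re

# Codes and message texts taken from the alert log excerpt in this note.
KNOWN_CODES = {
    "ORA-15032": "not all alterations performed",
    "ORA-15040": "diskgroup is incomplete",
    "ORA-15042": "ASM disk is missing",
    "ORA-15063": "ASM discovered an insufficient number of disks",
}

ORA_PATTERN = re.compile(r"\bORA-\d{5}\b")


def find_mount_errors(log_text):
    """Return the known ORA codes found in log_text, in first-seen order, without duplicates."""
    seen = []
    for code in ORA_PATTERN.findall(log_text):
        if code in KNOWN_CODES and code not in seen:
            seen.append(code)
    return seen


if __name__ == "__main__":
    sample = (
        "ERROR: diskgroup was not mounted\n"
        "ORA-15032: not all alterations performed\n"
        "ORA-15063: ASM discovered an insufficient number of disks\n"
    )
    for code in find_mount_errors(sample):
        print(code, "-", KNOWN_CODES[code])
```

Finding these codes only narrows the search; it does not by itself confirm that the partition table was wiped.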

This seems to occur when new LUNs are added or configured on the cluster. The behavior has occurred more than 10 times, on more than one cluster and across separate data centers.

At times the disks' v$asm_disk.header_status is MEMBER, but the disk groups still will not mount when a re-mount is attempted.

While troubleshooting the issue, it was noticed that the OS partition table for the devices used by the ASM disks had been wiped out (it no longer exists). The issue can be reproduced in a similar way by using dd to wipe the devices, although this does not explain why the disks sometimes show v$asm_disk.header_status=MEMBER and still cannot be mounted.
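
One way to check whether a device still carries a partition table is to inspect its first sector. The sketch below is an illustration under assumptions, not a procedure from this note: it assumes a classic MBR (msdos) label with 512-byte sectors, and the names mbr_partitions and read_first_sector are hypothetical. Reading a block device normally requires root, and the code only reads, never writes.

```python
import struct

SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a valid MBR sector
ENTRY_OFFSET = 446           # start of the four 16-byte partition entries
ENTRY_SIZE = 16


def mbr_partitions(sector):
    """Parse an MBR sector.

    Returns None when the boot signature is absent (no MBR at all),
    otherwise a list of (type, start_lba, sector_count) tuples for the
    non-empty partition entries. An empty list means the signature is
    present but no partition is defined.
    """
    if len(sector) < SECTOR_SIZE or sector[510:512] != MBR_SIGNATURE:
        return None
    entries = []
    for i in range(4):
        start = ENTRY_OFFSET + i * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        part_type = raw[4]                                   # partition type byte
        start_lba, count = struct.unpack_from("<II", raw, 8)  # little-endian LBA and length
        if part_type != 0:
            entries.append((part_type, start_lba, count))
    return entries


def read_first_sector(device_path):
    """Read the first sector of a device (read-only)."""
    with open(device_path, "rb") as dev:
        return dev.read(SECTOR_SIZE)


if __name__ == "__main__":
    # Synthetic sector for demonstration; pass a real device path to
    # read_first_sector() to inspect an actual disk.
    demo = bytearray(SECTOR_SIZE)
    demo[510:512] = MBR_SIGNATURE
    demo[ENTRY_OFFSET + 4] = 0x83
    struct.pack_into("<II", demo, ENTRY_OFFSET + 8, 63, 1000)
    print(mbr_partitions(bytes(demo)))
```

A result of None or an empty list on a device that used to hold a partitioned ASM disk is consistent with the wiped partition table described above.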

Cause
It turns out that the inq.Linux command incorrectly writes to the /dev/sd<x><y> device, which wipes out the partition table. Where the corruption is noticed depends on which mpath device /dev/sd<x><y> is part of.

This is caused by an EMC bug in older versions (prior to 6.3.0.0-771) of the Linux inq utility. The eNav utility calls inq.Linux.

Details from EMC bug:
1) older versions of this command scanned all devices in /dev, not just SCSI disks, and so included /dev/kmsg
2) older versions of this command incorrectly matched /dev/kmsg and /dev/sd<x><y>, treating them as multiple paths to the same device when they are not.
3) older versions of this command allocated a 216-byte inquiry buffer. This was apparently sufficient for EMC devices, but was too small for certain other disks. The SCSI layer would return an error if the buffer was undersized.

Together, the three conditions above caused the errors to be routed to /dev/sd<x><y> instead of /dev/kmsg, which then wiped out the corresponding partition tables. All three conditions are fixed in versions of the INQ command after 6.3.0.0-771.
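
The version boundary above can be checked mechanically. The following is a rough sketch that assumes the version is available as a string in the same form the note uses (for example "6.3.0.0-771"); how inq actually prints its version is not covered here, and the names parse_inq_version and inq_status are hypothetical. Because the note describes versions prior to 6.3.0.0-771 as affected and versions after it as fixed, the sketch reports the boundary version itself as "unclear".

```python
import re

# Boundary version named in this note.
BOUNDARY = (6, 3, 0, 0, 771)

VERSION_PATTERN = re.compile(r"\s*[Vv]?(\d+)\.(\d+)\.(\d+)\.(\d+)-(\d+)\s*")


def parse_inq_version(text):
    """Turn a string such as '6.3.0.0-771' into a tuple of five integers."""
    match = VERSION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def inq_status(text):
    """Classify a version as 'affected', 'fixed', or 'unclear' (the boundary itself)."""
    version = parse_inq_version(text)
    if version < BOUNDARY:
        return "affected"
    if version > BOUNDARY:
        return "fixed"
    return "unclear"


if __name__ == "__main__":
    for candidate in ("6.2.1.0-500", "6.3.0.0-771", "7.0.0.0-1"):
        print(candidate, inq_status(candidate))
```

Comparing tuples of integers avoids the pitfalls of plain string comparison, where "10" would sort before "9".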

It is assumed that any installation that ran the eNav utility with more than 500 SCSI disks attached was at risk of similar corruption, since that many disks would cause /dev/sd to exist. The count includes multiple paths to the same disk via multipathing, so with two paths per LUN only about 250 LUNs are needed, minus the number of locally attached disks.
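
The arithmetic in the preceding paragraph can be written out as a small helper. This is only a sketch of one reading of that paragraph: it assumes each locally attached disk uses one SCSI device name and each LUN uses one name per path. The function name lun_threshold and its parameters are hypothetical; the default of 500 is the figure quoted above.

```python
def lun_threshold(device_limit=500, paths_per_lun=2, local_disks=0):
    """Approximate number of LUNs that brings the SCSI device count to device_limit.

    Local disks are subtracted first, then the remaining device names are
    divided among the paths of each LUN.
    """
    if paths_per_lun < 1:
        raise ValueError("paths_per_lun must be at least 1")
    remaining = max(device_limit - local_disks, 0)
    return remaining // paths_per_lun


if __name__ == "__main__":
    print(lun_threshold())               # two paths per LUN, no local disks
    print(lun_threshold(local_disks=4))  # a few locally attached disks
```

With the defaults this gives the "250 or so" figure from the text; more paths per LUN lower the threshold further.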

Note: This was verified and confirmed by the customer and by sources outside Oracle (Red Hat, EMC, Maryville); however, no further details, such as the EMC bug number, were provided.

Solution
An immediate workaround is to comment out the call to the inq.Linux command so that it cannot run anywhere in your environment. To do this, contact either Maryville support (the vendor of eNav) or EMC.

Another appropriate solution is to upgrade to a newer version of inq.Linux.