Sniffing Out a Storage Disk Failure with ADDM

2017-11-07

Today's routine ADDM check reported a problem: "Finding: The throughput of the I/O subsystem was significantly lower than expected."

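This finding had never appeared before, so it immediately put me on alert. Expanding the related items showed that several raw devices were reporting abnormal I/O at the same time. For our business system, that time window was clearly not the daily peak, and the finding had never appeared before, not even at the heaviest pre-holiday peaks. I immediately asked the system engineer to confirm whether the storage subsystem had a problem; the answer was "the remote management port is not connected."

After work that day, a strong hunch told me the storage might be in an abnormal state, so I decided to go to the IDC machine room and inspect the storage system in person. On arrival I found that, having left in a hurry, I had not brought the key to the storage cabinet. Peering through the small holes in the cabinet door, I saw one disk showing an amber light. I reported the fault to the system engineer right away. Because our storage uses a RAID + hot spare layout, we would not lose data even if two disks failed.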

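My conclusion is that the failure of this disk most likely caused the temporary I/O anomaly. A reminder to everyone: when ADDM reports anomalies on a large number of raw devices or file systems, be sure to check whether any disk has a fault.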

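Follow-up improvement: require that the storage system's remote management port be connected so that the storage can be checked remotely. Taking my machine room as an example, the taxi costs 28 yuan and the trip takes at least half an hour. With a remote management port, that time and money could clearly have been saved.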

Finding: The throughput of the I/O subsystem was significantly lower than expected.
Impact (minutes): 32.2
Impact (%): 27.5

Recommendations

Category: Host Configuration    Benefit (%): 27.5
  Action: Consider increasing the throughput of the I/O subsystem. Oracle's recommended solution is to stripe all data file using the SAME methodology. You might also need to increase the number of disks for better performance. Alternatively, consider using Oracle's Automatic Storage Management solution.
  Rationale: During the analysis period, the average data files' I/O throughput was 898 K per second for reads and 40 K per second for writes. The average response time for single block reads was 19 milliseconds.

Category: Host Configuration    Benefit (%): 24.2
  Action: The performance of file /dev/rgaza_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks.
  Rationale: The average response time for single block reads for this file was 112 milliseconds.

Category: Host Configuration    Benefit (%): 1
  Action: The performance of file /dev/rsystem_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks.
  Rationale: The average response time for single block reads for this file was 206 milliseconds.

Category: Host Configuration    Benefit (%): 0.8
  Action: The performance of file /dev/rdata35_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks.
  Rationale: The average response time for single block reads for this file was 527 milliseconds.

Category: Host Configuration    Benefit (%): 0.6
  Action: The performance of file /dev/rtemp1_disk was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks.
  Rationale: The average response time for single block reads for this file was 34 milliseconds.
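
The rule of thumb in this post (several devices slow at once points at the storage layer, not at one hot file) can be sketched in a few lines of Python. This is only an illustration, not part of ADDM: the response times are copied from the report in this post, while `SLOW_FACTOR`, `MIN_SLOW_FILES`, and both function names are hypothetical choices made for the example and should be tuned for a real system.

```python
# Average single-block read times (ms) copied from the ADDM report in this post.
OVERALL_AVG_MS = 19
FILE_READ_MS = {
    "/dev/rgaza_disk": 112,
    "/dev/rsystem_disk": 206,
    "/dev/rdata35_disk": 527,
    "/dev/rtemp1_disk": 34,
}

# Hypothetical thresholds for this sketch, not Oracle defaults.
SLOW_FACTOR = 5      # a file counts as slow above 5x the overall average
MIN_SLOW_FILES = 2   # this many slow files at once suggests shared hardware


def slow_files(file_read_ms, overall_avg_ms, factor=SLOW_FACTOR):
    """Return (file, ms) pairs above factor * overall average, slowest first."""
    limit = overall_avg_ms * factor
    flagged = [(name, ms) for name, ms in file_read_ms.items() if ms > limit]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def should_check_storage(file_read_ms, overall_avg_ms):
    """True when enough files are slow together to suspect the storage layer."""
    return len(slow_files(file_read_ms, overall_avg_ms)) >= MIN_SLOW_FILES


if __name__ == "__main__":
    for name, ms in slow_files(FILE_READ_MS, OVERALL_AVG_MS):
        print(f"{name}: {ms} ms")
    print("check storage hardware:",
          should_check_storage(FILE_READ_MS, OVERALL_AVG_MS))
```

With the figures from this report, three of the four raw devices exceed the assumed limit of 95 ms, so the sketch recommends checking the storage hardware. A single slow file would not trigger it, which matches the reasoning that one hot file is more likely a layout issue than a disk fault.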

This article is reposted from zylhsy's 51CTO blog. Original link: http://blog.51cto.com/yunlongzheng/933002. To republish it, please contact the original author.
