Linux FC-SAN disk: ext3 filesystem suddenly became read-only - Alibaba Cloud Developer Community

2011-01-21

The log is as follows:

Jan 21 03:38:20 D0-LNXAPP03 kernel: SCSI error : return code = 0x20000
Jan 21 03:38:20 D0-LNXAPP03 kernel: end_request: I/O error, dev sda, sector 3301127367
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1): ext3_readdir: directory #206293234 contains a hole at offset 0
Jan 21 03:38:20 D0-LNXAPP03 kernel: Aborting journal on device sda1.
Jan 21 03:38:20 D0-LNXAPP03 kernel: ext3_abort called.
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Jan 21 03:38:20 D0-LNXAPP03 kernel: Remounting filesystem read-only
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1) in start_transaction: Journal has aborted
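The log reads as a chain: a block-layer I/O error on sda, an ext3 error on sda1, the journal abort, and the forced read-only remount. A minimal Python sketch that pulls those pieces out of syslog lines so they can be correlated. The regular expressions are assumptions based only on the message wording in this log; other kernel versions may phrase these messages differently.

```python
import re

# Patterns matched against the wording of the messages in the log above;
# other kernel versions may phrase these messages differently.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
EXT3_ERROR = re.compile(r"EXT3-fs error \(device (\w+)\)")
READ_ONLY = re.compile(r"Remounting filesystem read-only")


def summarize(lines):
    """Return (io_errors, ext3_devices, remounted_ro) for a list of syslog lines."""
    io_errors = []        # (disk, sector) pairs reported by the block layer
    ext3_devices = set()  # partitions that logged ext3 errors
    remounted_ro = False  # True once the kernel reported a read-only remount
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            io_errors.append((m.group(1), int(m.group(2))))
        m = EXT3_ERROR.search(line)
        if m:
            ext3_devices.add(m.group(1))
        if READ_ONLY.search(line):
            remounted_ro = True
    return io_errors, ext3_devices, remounted_ro
```

Fed the lines above (for example, read from /var/log/messages), it reports the I/O error on sda at sector 3301127367, ext3 errors on sda1, and the read-only remount.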

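Status: unresolved.

In the end, the cause was said to be an outdated kernel version. The problem was fixed by reformatting the disk.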

Why do the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?

by Chris Snook

When ext3 encounters possible corruption in filesystem metadata, it aborts the journal and remounts the filesystem read-only to avoid damaging the metadata on disk. This can happen because of I/O errors while reading metadata, even when the metadata on disk is not actually corrupt.
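A filesystem remounted this way shows up with the ro option in /proc/mounts. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, options, dump, pass), that lists ext3 mount points currently mounted read-only:

```python
def read_only_ext3(mounts_text):
    """Return mount points of ext3 filesystems whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Split on commas so that an option like 'errors=remount-ro' is not
        # mistaken for the 'ro' flag itself.
        if fstype == "ext3" and "ro" in options.split(","):
            result.append(mount_point)
    return result
```

Passing it the contents of /proc/mounts on an affected host would list the filesystem that hit the journal abort.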

If filesystems on multiple disk arrays or accessed by multiple clients are repeatedly becoming read-only in a SAN environment, the most common cause is a SCSI timeout while the Fibre Channel HBA driver is handling an RSCN event on the Fibre Channel fabric.

An RSCN (Registered State Change Notification) is generated whenever the configuration of a Fibre Channel fabric changes, and is propagated to any HBA that shares a zone with the device that changed state. RSCNs may be generated when an HBA, switch, or LUN is added or removed, or when the zoning of the fabric is changed.
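The zone-scoped propagation described above can be modeled with plain sets. This is a deliberately simplified sketch, assuming zones are just named sets of port identifiers (the names used are hypothetical); real fabrics also emit RSCNs for zoning and switch changes, which this model ignores.

```python
def rscn_recipients(zones, changed_port, hbas):
    """Return the HBAs that share at least one zone with changed_port.

    zones: mapping of zone name -> set of port identifiers (simplified model).
    hbas: set of port identifiers that are host bus adapters.
    """
    recipients = set()
    for members in zones.values():
        if changed_port in members:
            recipients |= members & hbas  # every HBA zoned with the changed port
    recipients.discard(changed_port)  # simplification: exclude the changed port itself
    return recipients
```

With zones {"z1": {"hba1", "array1"}, "z2": {"hba2", "array1"}, "z3": {"hba3", "array2"}}, a state change on array1 reaches hba1 and hba2 but not hba3, which is why one array change can disturb several hosts at once.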

Resolution:

Some cases of this behavior may be due to a known bug in the interaction between NFS and ext3. For this reason, users experiencing this problem on NFS servers are advised to update their kernel to at least version 2.6.9-42.0.2.EL. The related Bugzilla entry is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199172

The lpfc driver update in Red Hat Enterprise Linux 4 Update 4 includes a change to RSCN handling that prevents this problem in many environments. Users of Emulex HBAs experiencing this problem are advised to update their kernel to at least version 2.6.9-42.EL. The related Bugzilla entry is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179752
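The two kernel thresholds above can be checked mechanically. A minimal sketch, assuming Red Hat style kernel strings such as 2.6.9-42.0.2.ELsmp (the form printed by uname -r). It compares only the leading numeric fields and is a simplification, not RPM's own version-comparison algorithm.

```python
def parse_rhel_kernel(version):
    """Split e.g. '2.6.9-42.0.2.ELsmp' into a comparable tuple of int tuples.

    Only the numeric upstream version and the leading numeric release fields
    are kept; suffixes such as 'EL', 'ELsmp' or 'ELhugemem' are ignored.
    """
    upstream, _, release = version.partition("-")
    upstream_parts = tuple(int(x) for x in upstream.split("."))
    release_parts = []
    for field in release.split("."):
        if not field.isdigit():
            break  # stop at the first non-numeric field ('EL', 'ELsmp', ...)
        release_parts.append(int(field))
    return upstream_parts, tuple(release_parts)


def at_least(running, minimum):
    """True if the running kernel is the same as or newer than minimum."""
    return parse_rhel_kernel(running) >= parse_rhel_kernel(minimum)
```

For example, at_least(platform.release(), "2.6.9-42.EL") checks the running kernel against the Emulex threshold, and "2.6.9-42.0.2.EL" can be substituted for the NFS case.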

The lpfc and qla2xxx drivers also have configuration options which cause the driver to handle RSCNs in a less invasive manner, which often prevents timeouts during RSCN handling. These options must be set in the /etc/modprobe.conf file:

After making these changes, the initrd must be rebuilt and the system must be rebooted for the changes to take effect.
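A forgotten initrd rebuild can be spotted by comparing modification times. A heuristic sketch, assuming the configuration lives in /etc/modprobe.conf and the image at /boot/initrd-<kernel version>.img as on Red Hat Enterprise Linux 4; a newer config file only suggests, and does not prove, that the initrd lacks the change.

```python
import os


def initrd_may_be_stale(modprobe_conf, initrd):
    """True if modprobe_conf was modified after the initrd image was written.

    Heuristic only: a newer config file suggests the initrd was not rebuilt.
    """
    return os.path.getmtime(modprobe_conf) > os.path.getmtime(initrd)
```

On RHEL 4 the image is normally regenerated with mkinitrd before the reboot.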

Recommendation:

This problem may be prevented or mitigated by applying SAN vendor recommended configurations and firmware updates to HBAs, switches, and disk arrays on the fabric, as well as recommended configurations and updates to multipathing software. This particularly applies to timeout and retry settings.
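Because the recommendation centers on timeout and retry settings, it helps to see the current SCSI command timeout per disk before comparing it with the vendor's figure. A minimal sketch, assuming the kernel exposes /sys/block/<dev>/device/timeout (in seconds); the appropriate value is vendor-specific and not stated here.

```python
import os


def scsi_timeouts(sys_block="/sys/block"):
    """Map block device name -> SCSI command timeout in seconds, read from sysfs."""
    timeouts = {}
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "device", "timeout")
        if os.path.isfile(path):  # devices without this attribute are skipped
            with open(path) as f:
                timeouts[dev] = int(f.read().strip())
    return timeouts
```

The sys_block parameter exists so the function can be pointed at a copy of the tree; on a live host the default is used.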

The architecture of Fibre Channel assumes that the fabric changes infrequently, so RSCNs can be disruptive even on properly configured fabrics. Events which generate RSCNs should be minimized, particularly at times of high activity, since this causes RSCN handling to take longer than it would on a mostly idle fabric.

In multipathed environments with separate fabrics for different paths, zone changes to the fabrics should be made far apart in time. It is not uncommon for complete handling of a zone change to take many minutes on a busy fabric with many systems and LUNs. Performing zone changes separately minimizes the risk of all paths timing out due to RSCN handling.
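The "far apart in time" rule can be checked against a change plan before it is executed. A minimal sketch, assuming the plan is a list of (fabric name, datetime) pairs; the fabric names and the minimum gap are hypothetical, since the text only says handling can take many minutes on a busy fabric.

```python
def too_close(changes, min_gap):
    """Return pairs of zone changes on different fabrics scheduled < min_gap apart.

    changes: list of (fabric_name, datetime) tuples.
    min_gap: datetime.timedelta, the minimum separation to enforce.
    """
    ordered = sorted(changes, key=lambda c: c[1])
    conflicts = []
    for i, (fab_a, t_a) in enumerate(ordered):
        for fab_b, t_b in ordered[i + 1:]:
            if t_b - t_a >= min_gap:
                break  # list is sorted, so later changes are even further away
            if fab_a != fab_b:
                conflicts.append((fab_a, fab_b))  # two paths at risk at once
    return conflicts
```

Changes on the same fabric are not flagged, because they affect only one path; the concern in the text is both paths handling RSCNs at the same time.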
