Linux: ext3 filesystem on an FC-SAN disk suddenly became read-only

The log is as follows:

Jan 21 03:38:20 D0-LNXAPP03 kernel: SCSI error : return code = 0x20000
Jan 21 03:38:20 D0-LNXAPP03 kernel: end_request: I/O error, dev sda, sector 3301127367
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1): ext3_readdir: directory #206293234 contains a hole at offset 0
Jan 21 03:38:20 D0-LNXAPP03 kernel: Aborting journal on device sda1.
Jan 21 03:38:20 D0-LNXAPP03 kernel: ext3_abort called.
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Jan 21 03:38:20 D0-LNXAPP03 kernel: Remounting filesystem read-only
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1) in start_transaction: Journal has aborted
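The sequence in this log (SCSI error, I/O error, journal abort, read-only remount) can be picked out of a syslog file mechanically. The following is a minimal Python sketch, not a complete log analyzer. The function names `classify` and `host_byte` are made up for illustration, and the decoding assumes the return code follows the usual Linux SCSI result layout, with the host byte in bits 16-23.

```python
import re

# Patterns copied from the kernel messages quoted in the log above.
PATTERNS = {
    "scsi_error": re.compile(r"SCSI error : return code = (0x[0-9a-fA-F]+)"),
    "io_error": re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)"),
    "journal_abort": re.compile(r"Aborting journal on device (\w+)"),
    "remount_ro": re.compile(r"Remounting filesystem read-only"),
}

# Subset of the host-byte codes defined by the Linux SCSI midlayer.
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
}


def host_byte(return_code):
    """Extract bits 16-23 of a SCSI result word and name them if known."""
    code = (int(return_code, 16) >> 16) & 0xFF
    return HOST_BYTE.get(code, hex(code))


def classify(lines):
    """Return (event_name, captured_groups) tuples in log order."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((name, match.groups()))
                break
    return events


SAMPLE = [
    "Jan 21 03:38:20 D0-LNXAPP03 kernel: SCSI error : return code = 0x20000",
    "Jan 21 03:38:20 D0-LNXAPP03 kernel: end_request: I/O error, dev sda, sector 3301127367",
    "Jan 21 03:38:20 D0-LNXAPP03 kernel: Aborting journal on device sda1.",
    "Jan 21 03:38:20 D0-LNXAPP03 kernel: Remounting filesystem read-only",
]

if __name__ == "__main__":
    for name, groups in classify(SAMPLE):
        print(name, groups)
    print("host byte of 0x20000:", host_byte("0x20000"))
```

Under that layout, the return code in the log decodes to a host byte of 0x02, which means the error was reported by the host adapter layer, not as a SCSI status from the disk itself.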

 
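Unresolved

In the end, the cause was said to be an outdated kernel version. The problem was resolved by reformatting the disk.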

Why do the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?

by Chris Snook

When ext3 encounters possible corruption in filesystem metadata, it aborts the journal and remounts the filesystem read-only to avoid further damage to the metadata on disk. This can also be triggered by I/O errors while reading metadata, even if the metadata on disk is not corrupted.
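Whether a filesystem has already been remounted read-only can be checked from the mount table. The sketch below parses text in the `/proc/mounts` format and reports ext3 entries whose options include `ro`. The function name `read_only_mounts` and the mount points `/data` and `/backup` in the sample are illustrative assumptions, not taken from the affected system.

```python
def read_only_mounts(mounts_text, fstype="ext3"):
    """Return (device, mount_point) pairs of the given type mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs, options = fields[:4]
        if fs == fstype and "ro" in options.split(","):
            result.append((device, mount_point))
    return result


# Hypothetical mount table in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/sda1 /data ext3 ro,data=ordered 0 0
/dev/sdb1 /backup ext3 rw,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for device, mount_point in read_only_mounts(SAMPLE_MOUNTS):
        print(f"{device} on {mount_point} is mounted read-only")
```

On a live system the same function can be given the contents of `/proc/mounts` in place of the sample string.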

If filesystems on multiple disk arrays or accessed by multiple clients are repeatedly becoming read-only in a SAN environment, the most common cause is a SCSI timeout while the Fibre Channel HBA driver is handling an RSCN event on the Fibre Channel fabric.

An RSCN (Registered State Change Notification) is generated whenever the configuration of a Fibre Channel fabric changes, and is propagated to any HBA that shares a zone with the device that changed state. RSCNs may be generated when an HBA, switch, or LUN is added or removed, or when the zoning of the fabric is changed.

Resolution:

Some cases of this behavior may be due to a known bug in the interaction between NFS and ext3. For this reason, users experiencing this problem on NFS servers should update their kernel to at least version 2.6.9-42.0.2.EL. See the related Bugzilla entry: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199172

The lpfc driver update in Red Hat Enterprise Linux 4 Update 4 includes a change to RSCN handling which prevents this problem in many environments. Users of Emulex HBAs experiencing this problem are advised to update their kernel to at least version 2.6.9-42.EL. See the related Bugzilla entry: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179752
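To check a running kernel against the two minimum versions named above, the numeric part of the release string can be compared component by component. This is a simplified Python sketch and not the full RPM version-comparison algorithm: it assumes release strings shaped like `2.6.9-42.0.2.EL` and ignores everything from the first non-numeric component onward. The helper names `kernel_key` and `meets_minimum` are made up for illustration.

```python
import re


def kernel_key(release):
    """Turn '2.6.9-42.0.2.EL' into the tuple (2, 6, 9, 42, 0, 2)."""
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def meets_minimum(release, minimum):
    """True if release is at least minimum, comparing numeric components."""
    return kernel_key(release) >= kernel_key(minimum)


if __name__ == "__main__":
    import platform

    running = platform.release()
    for minimum in ("2.6.9-42.EL", "2.6.9-42.0.2.EL"):
        try:
            print(running, ">=", minimum, ":", meets_minimum(running, minimum))
        except ValueError as error:
            print(error)
```

The comparison is only meaningful between kernels of the same Red Hat Enterprise Linux 4 series. A release string from another distribution or major version would need a different check.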

The lpfc and qla2xxx drivers also have configuration options which cause the driver to handle RSCNs in a less invasive manner, which often prevents timeouts during RSCN handling. These options must be set in the /etc/modprobe.conf file:

 
 

After making these changes, the initrd must be rebuilt and the system must be rebooted for the changes to take effect.
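Before rebuilding the initrd, it can help to confirm which options `/etc/modprobe.conf` actually sets for each driver. The following Python sketch collects the `name=value` pairs from `options <module> ...` lines. The function name `module_options` is made up for illustration, and `example_param` in the sample is a hypothetical placeholder, not a real lpfc or qla2xxx parameter.

```python
def module_options(conf_text, module):
    """Collect name=value pairs from 'options <module> ...' lines."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for item in fields[2:]:
                name, _, value = item.partition("=")
                found[name] = value
    return found


# Hypothetical file contents; example_param is a placeholder name.
SAMPLE_CONF = """\
alias scsi_hostadapter lpfc
options lpfc example_param=1   # placeholder option
options qla2xxx example_param=0
"""

if __name__ == "__main__":
    for module in ("lpfc", "qla2xxx"):
        print(module, module_options(SAMPLE_CONF, module))
```

On a real system the function would be given the contents of `/etc/modprobe.conf`, and the result compared against the settings recommended for the installed driver version.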

Recommendation:

This problem may be prevented or mitigated by applying SAN vendor recommended configurations and firmware updates to HBAs, switches, and disk arrays on the fabric, as well as recommended configurations and updates to multipathing software. This particularly applies to timeout and retry settings.

The architecture of Fibre Channel assumes that the fabric changes infrequently, so RSCNs can be disruptive even on properly configured fabrics. Events which generate RSCNs should be minimized, particularly at times of high activity, since RSCN handling takes longer on a busy fabric than on a mostly idle one.

In multipathed environments with separate fabrics for different paths, zone changes to the fabrics should be made far apart in time. It is not uncommon for complete handling of a zone change to take many minutes on a busy fabric with many systems and LUNs. Performing zone changes separately minimizes the risk of all paths timing out due to RSCN handling.
