LINUX系统磁盘FC-SAN ext3系统突然变位只读了

本文涉及的产品
函数计算FC,每月15万CU 3个月
简介: 日志如下: Jan 21 03:38:20 D0-LNXAPP03 kernel: SCSI error : return code = 0x20000Jan 21 03:38:20 D0-LNXAPP03 kernel: end_request: I/O ...

日志如下:

Jan 21 03:38:20 D0-LNXAPP03 kernel: SCSI error : return code = 0x20000
Jan 21 03:38:20 D0-LNXAPP03 kernel: end_request: I/O error, dev sda, sector 3301127367
Jan 21 03:38:20 D0-LNXAPP03 kernel:
EXT3-fs error (device sda1): ext3_readdir: directory #206293234 contains a hole at offset 0
Jan 21 03:38:20 D0-LNXAPP03 kernel: Aborting journal on device sda1.
Jan 21 03:38:20 D0-LNXAPP03 kernel: ext3_abort called.
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Jan 21 03:38:20 D0-LNXAPP03 kernel: Remounting filesystem read-only
Jan 21 03:38:20 D0-LNXAPP03 kernel: EXT3-fs error (device sda1) in start_transaction: Journal has aborted

 
未解决
 
最后说是内核版本低了。重新格式化硬盘解决的

Why does the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?

by Chris Snook

When ext3 encounters possible corruption in filesystem metadata, it aborts the journal and remounts it as read-only to prevent causing damage to the metadata on disk. This can occur due to I/O errors while reading metadata, even if there is no metadata corruption on disk.

If filesystems on multiple disk arrays or accessed by multiple clients are repeatedly becoming read-only in a SAN environment, the most common cause is a SCSI timeout while the Fibre Channel HBA driver is handling an RSCN event on the Fibre Channel fabric.

An RSCN (Registered State Change Notification) is generated whenever the configuration of a Fibre Channel fabric changes, and is propagated to any HBA that shares a zone with the device that changed state. RSCNs may be generated when an HBA, switch, or LUN is added or removed, or when the zoning of the fabric is changed.

Resolution:

Some cases of this behavior. may be due to a known bug in the interaction between NFS and ext3. For this reason, it is recommended that users experiencing this problem on NFS servers update their kernel, at least to version 2.6.9-42.0.2.EL. Here is the link to the related bugzilla entry https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199172

The lpfc driver update in Red Hat Enterprise Linux 4 Update 4 includes a change to RSCN handling which prevents this problem in many environments. Users of Emulex HBAs experiencing this problem are advised to update their kernel, at least to version 2.6.9-42.EL. Here is the link to the related bugzilla entry https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179752

The lpfc and qla2xxx drivers also have configuration options which cause the driver to handle RSCNs in a less invasive manner, which often prevents timeouts during RSCN handling. These options must be set in the /etc/modprobe.conf file:

 
 

After making these changes, the initrd must be rebuilt and the system must be rebooted for the changes to take effect.

Recommendation:

This problem may be prevented or mitigated by applying SAN vendor recommended configurations and firmware updates to HBAs, switches, and disk arrays on the fabric, as well as recommended configurations and updates to multipathing software. This particularly applies to timeout and retry settings.

The architecture of Fibre Channel assumes that the fabric changes infrequently, so RSCNs can be disruptive even on properly configured fabrics. Events which generate RSCNs should be minimized, particularly at times of high activity, since this causes RSCN handling to take longer than it would on a mostly idle fabric.

In multipathed environments with separate fabrics for different paths, zone changes to the fabrics should be made far apart in time. It is not uncommon for complete handling of a zone change to take many minutes on a busy fabric with many systems and LUNs. Performing zone changes separately minimizes the risk of all paths timing out due to RSCN handling.

相关实践学习
【文生图】一键部署Stable Diffusion基于函数计算
本实验教你如何在函数计算FC上从零开始部署Stable Diffusion来进行AI绘画创作,开启AIGC盲盒。函数计算提供一定的免费额度供用户使用。本实验答疑钉钉群:29290019867
建立 Serverless 思维
本课程包括: Serverless 应用引擎的概念, 为开发者带来的实际价值, 以及让您了解常见的 Serverless 架构模式
相关文章
|
1月前
|
存储 缓存 监控
Linux缓存管理:如何安全地清理系统缓存
在Linux系统中,内存管理至关重要。本文详细介绍了如何安全地清理系统缓存,特别是通过使用`/proc/sys/vm/drop_caches`接口。内容包括清理缓存的原因、步骤、注意事项和最佳实践,帮助你在必要时优化系统性能。
208 78
|
6天前
|
Ubuntu Linux 网络安全
Linux磁盘挂接教程
Linux磁盘挂接教程
37 14
|
15天前
|
缓存 安全 Linux
Linux系统查看操作系统版本信息、CPU信息、模块信息
在Linux系统中,常用命令可帮助用户查看操作系统版本、CPU信息和模块信息
71 23
|
15天前
|
人工智能 JSON 自然语言处理
一键生成毛茸萌宠形象,基于函数计算极速部署 ComfyUI 生图系统
本次方案将帮助大家实现使用阿里云产品函数计算FC,只需简单操作,就可以快速配置ComfyUI大模型,创建出你的专属毛茸茸萌宠形象。内置基础大模型+常用插件+部分 Lora,以风格化图像生成只需用户让体验键配置简单方便,后续您可以根据自己的需要更换需要的模型、Lora、增加插件。
|
1月前
|
JSON 人工智能 Serverless
一键生成毛茸萌宠形象,基于函数计算极速部署ComfyUI生图系统
通过阿里云函数计算FC 和文件存储NAS,用户体验 ComfyUI 和预置工作流文件,用户可以快速生成毛茸茸萌宠等高质量图像。
一键生成毛茸萌宠形象,基于函数计算极速部署ComfyUI生图系统
|
1月前
|
Linux Shell 网络安全
Kali Linux系统Metasploit框架利用 HTA 文件进行渗透测试实验
本指南介绍如何利用 HTA 文件和 Metasploit 框架进行渗透测试。通过创建反向 shell、生成 HTA 文件、设置 HTTP 服务器和发送文件,最终实现对目标系统的控制。适用于教育目的,需合法授权。
78 9
Kali Linux系统Metasploit框架利用 HTA 文件进行渗透测试实验
|
1月前
|
存储 监控 Linux
嵌入式Linux系统编程 — 5.3 times、clock函数获取进程时间
在嵌入式Linux系统编程中,`times`和 `clock`函数是获取进程时间的两个重要工具。`times`函数提供了更详细的进程和子进程时间信息,而 `clock`函数则提供了更简单的处理器时间获取方法。根据具体需求选择合适的函数,可以更有效地进行性能分析和资源管理。通过本文的介绍,希望能帮助您更好地理解和使用这两个函数,提高嵌入式系统编程的效率和效果。
110 13
|
1月前
|
Ubuntu Linux C++
Win10系统上直接使用linux子系统教程(仅需五步!超简单,快速上手)
本文介绍了如何在Windows 10上安装并使用Linux子系统。首先,通过应用商店安装Windows Terminal和Linux系统(如Ubuntu)。接着,在控制面板中启用“适用于Linux的Windows子系统”并重启电脑。最后,在Windows Terminal中选择安装的Linux系统即可开始使用。文中还提供了注意事项和进一步配置的链接。
49 0
|
1月前
|
存储 Oracle 安全
服务器数据恢复—LINUX系统删除/格式化的数据恢复流程
Linux操作系统是世界上流行的操作系统之一,被广泛用于服务器、个人电脑、移动设备和嵌入式系统。Linux系统下数据被误删除或者误格式化的问题非常普遍。下面北亚企安数据恢复工程师简单聊一下基于linux的文件系统(EXT2/EXT3/EXT4/Reiserfs/Xfs) 下删除或者格式化的数据恢复流程和可行性。
|
8月前
|
缓存 Linux 测试技术
安装【银河麒麟V10】linux系统--并挂载镜像
安装【银河麒麟V10】linux系统--并挂载镜像
2445 0