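Alibaba Cloud: system disk total read BPS suddenly spikes, causing the website to return 502 Bad Gateway
I ran into the same problem, so here is how I handled it.
1. Check the system log around the time of the incident: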
cat /var/log/messages
The following abnormal entries showed up:
Feb 20 22:41:12 ecs-for-tesla-001 systemd[1]: Starting dnf makecache...
Feb 20 22:41:13 ecs-for-tesla-001 dnf[4229]: AnolisOS-8 - AppStream 140 kB/s | 4.3 kB 00:00
Feb 20 22:41:13 ecs-for-tesla-001 dnf[4229]: AnolisOS-8 - BaseOS 350 kB/s | 4.3 kB 00:00
Feb 20 22:41:13 ecs-for-tesla-001 dnf[4229]: AnolisOS-8 - Extras 106 kB/s | 3.8 kB 00:00
Feb 20 22:41:13 ecs-for-tesla-001 dnf[4229]: AnolisOS-8 - PowerTools 113 kB/s | 4.2 kB 00:00
Feb 20 22:41:14 ecs-for-tesla-001 dnf[4229]: Docker CE Stable - x86_64 23 kB/s | 3.5 kB 00:00
Feb 20 22:41:14 ecs-for-tesla-001 dnf[4229]: Extra Packages for Enterprise Linux 8 - x86_64 378 kB/s | 4.7 kB 00:00
Feb 20 22:41:23 ecs-for-tesla-001 kernel: containerd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-999
Feb 20 22:43:34 ecs-for-tesla-001 kernel: CPU: 0 PID: 1761 Comm: containerd Tainted: G OE --------- - - 4.18.0-372.16.1.an8_6.x86_64 #1
Feb 20 22:46:49 ecs-for-tesla-001 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Feb 20 22:49:48 ecs-for-tesla-001 kernel: Call Trace:
Feb 20 22:51:47 ecs-for-tesla-001 kernel: dump_stack+0x41/0x60
Feb 20 22:58:09 ecs-for-tesla-001 kernel: dump_header+0x4a/0x1db
Feb 20 23:05:53 ecs-for-tesla-001 kernel: oom_kill_process.cold.32+0xb/0x10
Feb 20 23:12:21 ecs-for-tesla-001 kernel: out_of_memory+0x1bd/0x4e0
Feb 20 23:17:59 ecs-for-tesla-001 kernel: __alloc_pages_slowpath+0xbdc/0xcc0
Feb 20 23:23:14 ecs-for-tesla-001 kernel: ? __switch_to_asm+0x35/0x70
Feb 20 23:29:47 ecs-for-tesla-001 kernel: __alloc_pages_nodemask+0x2db/0x310
Feb 20 23:37:38 ecs-for-tesla-001 kernel: pagecache_get_page+0xca/0x310
Feb 20 23:42:58 ecs-for-tesla-001 kernel: Linux version 4.18.0-372.16.1.an8_6.x86_64 (mockbuild@anolis-build-01.openanolis.cn) (gcc version 8.5.0 20210514 (Anolis 8.5.0-10.0.1) (GCC)) #1 SMP Thu Jul 14 10:28:59 CST 2022
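My guess is that dnf refreshing its metadata cache in the background drove the disk I/O up. The log also suggests that the refresh pushed the machine out of memory: the kernel's OOM killer fired and killed my Docker containers. Under memory pressure the kernel also drops page cache and has to re-read files from disk, which would be consistent with the spike in read BPS.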
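To check how closely the OOM events follow the dnf makecache runs, it can help to filter the log down to just those two kinds of entries. The following is a minimal sketch: `oom_timeline` is a hypothetical helper name, and the patterns are assumptions based on the entries above plus the usual kernel "Out of memory" / "Killed process" messages, so adjust them to what your log actually contains.

```shell
# Hypothetical helper: read a syslog-style stream on stdin and keep only
# dnf makecache starts and OOM-killer related kernel messages.
oom_timeline() {
    grep -E 'Starting dnf makecache|invoked oom-killer|Out of memory|Killed process'
}
```

Usage: `oom_timeline < /var/log/messages`. If every OOM entry appears shortly after a "Starting dnf makecache" line, the guess above is probably right; if OOM entries also appear on their own, the machine is likely short of memory in general.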
Fix: either uninstall dnf, or turn off the periodic makecache job:
systemctl stop dnf-makecache.timer
systemctl disable dnf-makecache.timer
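After running the two commands, it is worth confirming that the timer really is off. A minimal sketch, assuming a systemd host: `timer_off_check` is a hypothetical helper name, while `systemctl is-enabled` and `systemctl is-active` are standard systemctl subcommands.

```shell
# Hypothetical helper: print "off" if the given systemd unit is both
# disabled (or masked) and inactive, otherwise print "on".
timer_off_check() {
    unit="$1"
    # Both subcommands exit non-zero for disabled/inactive units,
    # so ignore the exit status and look at the printed state only.
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
    active=$(systemctl is-active "$unit" 2>/dev/null || true)
    case "$enabled" in
        disabled|masked)
            if [ "$active" = "inactive" ]; then
                echo "off"
                return 0
            fi
            ;;
    esac
    echo "on"
}
```

Usage: `timer_off_check dnf-makecache.timer`. The stop and disable steps can also be combined into one command with `systemctl disable --now dnf-makecache.timer`.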