Ceph failure analysis (backfill_toofull)

Summary: before the Ceph capacity expansion was carried out, the cluster stayed stuck in the backfill_toofull state shown below for a long time; the ceph -s output and the analysis follow.

Before the Ceph expansion was carried out, the cluster had been stuck in the state below for a long time.
See the following output:

# ceph -s
    cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
     health HEALTH_WARN 13 pgs backfill_toofull; 1 pgs degraded; 1 pgs stuck degraded; 13 pgs stuck unclean; 9 requests are blocked > 32 sec; recovery 190/54152986 objects degraded (0.000%); 47030/54152986 objects misplaced (0.087%); 2 near full osd(s); clock skew detected on mon.hh-yun-ceph-cinder025-128075
     monmap e3: 5 mons at {hh-yun-ceph-cinder015-128055=240.30.128.55:6789/0,hh-yun-ceph-cinder017-128057=240.30.128.57:6789/0,hh-yun-ceph-cinder024-128074=240.30.128.74:6789/0,hh-yun-ceph-cinder025-128075=240.30.128.75:6789/0,hh-yun-ceph-cinder026-128076=240.30.128.76:6789/0}, election epoch 168, quorum 0,1,2,3,4 hh-yun-ceph-cinder015-128055,hh-yun-ceph-cinder017-128057,hh-yun-ceph-cinder024-128074,hh-yun-ceph-cinder025-128075,hh-yun-ceph-cinder026-128076
     osdmap e23216: 100 osds: 100 up, 100 in
      pgmap v11159189: 20544 pgs, 2 pools, 70024 GB data, 17620 kobjects
            205 TB used, 158 TB / 363 TB avail
            190/54152986 objects degraded (0.000%); 47030/54152986 objects misplaced (0.087%)
               20527 active+clean
                   1 active+degraded+remapped+backfill_toofull
                   4 active+clean+scrubbing+deep
                  12 active+remapped+backfill_toofull
  client io 7609 kB/s rd, 46866 kB/s wr, 1909 op/s

Key points:

12 active+remapped+backfill_toofull
1 active+degraded+remapped+backfill_toofull
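
To see exactly which OSDs are near full and which requests are blocked behind these states, the health detail output can be filtered. This is only a routine check; the exact wording of the detail lines varies slightly between releases:

# ceph health detail | grep -E 'backfill_toofull|near full|blocked'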

Get the detailed PG information:

# ceph pg dump 2> /dev/null | grep -E 'pg_stat|toofull' | awk '{printf "%-8s %-15s %-15s %-15s %-55s\n", $1, $7, $15, $17, $10}'
pg_stat  bytes           up_primary      acting_primary  state
1.19ae   4427174912      [50,24,31]      [21,33,69]      active+remapped+backfill_toofull
1.f51    2313255936      [51,8,24]       [8,31,58]       active+remapped+backfill_toofull
1.86f    2199311872      [57,24,18]      [57,22,65]      active+degraded+remapped+backfill_toofull
1.531    2257795584      [12,59,24]      [12,59,31]      active+remapped+backfill_toofull
1.186    2359985152      [51,8,24]       [2,27,57]       active+remapped+backfill_toofull
1.4f35   2429229056      [52,24,38]      [12,26,57]      active+remapped+backfill_toofull
1.44cb   2247723008      [51,24,18]      [15,26,60]      active+remapped+backfill_toofull
1.405e   2286564864      [50,24,14]      [16,27,40]      active+remapped+backfill_toofull
1.3bc2   4308700672      [55,12,24]      [55,14,40]      active+remapped+backfill_toofull
1.3b35   4711967232      [43,52,24]      [43,19,26]      active+remapped+backfill_toofull
1.3845   4573419008      [12,59,24]      [12,29,43]      active+remapped+backfill_toofull
1.35f3   4424525312      [45,58,24]      [45,23,59]      active+remapped+backfill_toofull
1.291f   4661793280      [14,50,24]      [14,21,48]      active+remapped+backfill_toofull
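
Summing the bytes column gives a rough estimate of how much data these PGs would have to move. This is a sketch that assumes the same field positions as the dump command above (bytes in field 7):

# ceph pg dump 2> /dev/null | grep toofull | awk '{sum += $7} END {printf "%.1f GB\n", sum / 1024 / 1024 / 1024}'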

From the output above:

1. There are 13 PGs stuck in backfill_toofull, each holding roughly 2 GB to 4.5 GB of data.
2. Looking at the up column (printed as up_primary above), every one of these PGs has osd.24 in its target placement, i.e. each of them needs to backfill data onto osd.24 (see the quick check below).
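
A quick way to confirm that every backfill_toofull PG really places data on osd.24, again assuming the field layout of the dump command above (field 15 is the up set):

# ceph pg dump 2> /dev/null | grep toofull | awk 'index($15,"[24,") || index($15,",24,") || index($15,",24]") {n++} END {print n " of " NR " toofull PGs have osd.24 in their up set"}'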

Current capacity of osd.24:

total   used   free   use%   mounted on
3.7T    3.2T   539G   86%    /var/lib/ceph/osd/ceph-24
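
These figures are the local filesystem usage of the OSD data directory and can be reproduced on the node that hosts osd.24 (the mount path is the one shown in the table above):

# df -h /var/lib/ceph/osd/ceph-24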

All of the affected PGs are waiting to migrate data between disks.
During that migration, each of them has to write its data onto osd.24.
Summing the PG sizes above, roughly 40 GB would have to land on that OSD.
osd.24 is already at 86% utilisation, while the near-full / backfill-full threshold is set to 0.85.
Ceph therefore considers the OSD too full to accept backfill, so the PGs stay in backfill_toofull indefinitely.
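
To verify which thresholds are in play, and as one possible way to let the backfill finish, the ratios can be queried on the node hosting osd.24 and the backfill threshold raised temporarily. The option names below assume a pre-Luminous release, where osd_backfill_full_ratio (default 0.85) gates backfill; this is only a sketch, and the injected value should be reverted once osd.24 has been relieved, e.g. after the planned expansion:

# ceph daemon osd.24 config get osd_backfill_full_ratio
# ceph daemon osd.24 config get mon_osd_nearfull_ratio
# ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'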
