Before the Ceph expansion was carried out, the cluster had been sitting in the following state for a long time.
Reference output:
# ceph -s
cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
health HEALTH_WARN 13 pgs backfill_toofull; 1 pgs degraded; 1 pgs stuck degraded; 13 pgs stuck unclean; 9 requests are blocked > 32 sec; recovery 190/54152986 objects degraded (0.000%); 47030/54152986 objects misplaced (0.087%); 2 near full osd(s); clock skew detected on mon.hh-yun-ceph-cinder025-128075
monmap e3: 5 mons at {hh-yun-ceph-cinder015-128055=240.30.128.55:6789/0,hh-yun-ceph-cinder017-128057=240.30.128.57:6789/0,hh-yun-ceph-cinder024-128074=240.30.128.74:6789/0,hh-yun-ceph-cinder025-128075=240.30.128.75:6789/0,hh-yun-ceph-cinder026-128076=240.30.128.76:6789/0}, election epoch 168, quorum 0,1,2,3,4 hh-yun-ceph-cinder015-128055,hh-yun-ceph-cinder017-128057,hh-yun-ceph-cinder024-128074,hh-yun-ceph-cinder025-128075,hh-yun-ceph-cinder026-128076
osdmap e23216: 100 osds: 100 up, 100 in
pgmap v11159189: 20544 pgs, 2 pools, 70024 GB data, 17620 kobjects
205 TB used, 158 TB / 363 TB avail
190/54152986 objects degraded (0.000%); 47030/54152986 objects misplaced (0.087%)
20527 active+clean
1 active+degraded+remapped+backfill_toofull
4 active+clean+scrubbing+deep
12 active+remapped+backfill_toofull
client io 7609 kB/s rd, 46866 kB/s wr, 1909 op/s
Key points:
12 active+remapped+backfill_toofull
1 active+degraded+remapped+backfill_toofull
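Besides the filtered pg dump below, the same set of problem PGs can be listed quickly from the monitor; a minimal check, assuming a Hammer-era CLI matching the output above:
# ceph health detail | grep backfill_toofull
# ceph pg dump_stuck unclean 2> /dev/null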
Get detailed PG information:
# ceph pg dump 2> /dev/null | grep -E 'pg_stat|toofull' | awk '{printf "%-8s %-15s %-15s %-15s %-55s\n", $1, $7, $15, $17, $10}'
pg_stat bytes up_primary acting_primary state
1.19ae 4427174912 [50,24,31] [21,33,69] active+remapped+backfill_toofull
1.f51 2313255936 [51,8,24] [8,31,58] active+remapped+backfill_toofull
1.86f 2199311872 [57,24,18] [57,22,65] active+degraded+remapped+backfill_toofull
1.531 2257795584 [12,59,24] [12,59,31] active+remapped+backfill_toofull
1.186 2359985152 [51,8,24] [2,27,57] active+remapped+backfill_toofull
1.4f35 2429229056 [52,24,38] [12,26,57] active+remapped+backfill_toofull
1.44cb 2247723008 [51,24,18] [15,26,60] active+remapped+backfill_toofull
1.405e 2286564864 [50,24,14] [16,27,40] active+remapped+backfill_toofull
1.3bc2 4308700672 [55,12,24] [55,14,40] active+remapped+backfill_toofull
1.3b35 4711967232 [43,52,24] [43,19,26] active+remapped+backfill_toofull
1.3845 4573419008 [12,59,24] [12,29,43] active+remapped+backfill_toofull
1.35f3 4424525312 [45,58,24] [45,23,59] active+remapped+backfill_toofull
1.291f 4661793280 [14,50,24] [14,21,48] active+remapped+backfill_toofull
From the information above:
1. There are 13 affected PGs (12 remapped + 1 degraded+remapped), each holding roughly 2 GB to 4.5 GB of data.
2. Looking at the up set (the column labelled up_primary above), every one of these PGs includes osd.24 in its target placement, i.e. backfilling them all means writing data onto osd.24 (see the one-liner below).
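A quick sanity check of point 2: count how many backfill_toofull PGs carry osd 24 in their up set. This is only a sketch and assumes the same pg dump column layout as the command above ($15 = up set):
# ceph pg dump 2> /dev/null | grep toofull | awk '{print $15}' | grep -cw 24
With the data above this prints 13, i.e. every affected PG targets osd.24.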
Check the current capacity of the OSD (osd.24):
total used free pct target
3.7T 3.2T 539G 86% /var/lib/ceph/osd/ceph-24
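The usage figure can be cross-checked on the OSD host and, on Hammer and later, summarised per OSD from the cluster itself; a sketch assuming the standard mount path for osd.24:
# df -h /var/lib/ceph/osd/ceph-24
# ceph osd df | awk 'NR == 1 || $1 == 24'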
All of the faulty PGs above have to migrate their data to the target OSDs.
During that migration every one of them needs to write its data onto osd.24,
which adds up to roughly 40 GB landing on this single OSD (the one-liner below sums the bytes column to verify this).
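To double-check the ~40 GB estimate, the bytes column of the filtered dump can be summed; a sketch assuming the same column positions as before ($7 = bytes):
# ceph pg dump 2> /dev/null | grep toofull | awk '{sum += $7} END {printf "%.1f GiB\n", sum / 1024 / 1024 / 1024}'
For the 13 PGs listed above this comes out to roughly 40 GiB.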
osd.24 is currently 86% full.
Because the OSD full thresholds (nearfull and the backfill full ratio) are set to 0.85,
Ceph treats this OSD as too full to accept backfill data, so the PGs stay stuck in backfill_toofull for a long time.
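If the expansion cannot absorb the data immediately, a commonly used stop-gap on Hammer-era clusters is to inspect and temporarily raise the backfill threshold (osd_backfill_full_ratio, default 0.85). This is a hedged sketch only; later releases (Luminous onward) moved the ratio into the OSD map via ceph osd set-backfillfull-ratio:
# ceph daemon osd.24 config show | grep -E 'backfill_full_ratio|nearfull_ratio'
# ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
The first command must run on the host carrying osd.24. Raising the ratio only buys time; the lasting fix is the planned expansion plus reweighting data away from osd.24.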