Ceph Reef (18.2.X) Cluster OSD Management Basics and OSD Node Scale-In/Scale-Out

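Summary: This article is a detailed tutorial on OSD management in a Ceph Reef (18.2.X) cluster. It covers basic OSD operations, the procedure and a hands-on example for scaling in (removing an OSD node), and the basic procedure for scaling out (adding OSDs).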

Author: 尹正杰 (Yin Zhengjie)
Copyright notice: original work; reproduction is not permitted. Violations will be pursued legally.

I. Basic OSD operations in a Ceph cluster

1. List OSD IDs

[root@ceph141 ~]# ceph osd ls
0
1
2
3
4
5
[root@ceph141 ~]#

2. Show detailed OSD information (OSD map dump)

[root@ceph141 ~]# ceph osd dump
epoch 58
fsid c044ff3c-5f05-11ef-9d8b-51db832765d6
created 2024-08-20T15:06:28.128978+0000
modified 2024-08-20T22:48:38.568646+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 16
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client jewel
require_osd_release reef
stretch_mode_enabled false
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.00
max_osd 6
osd.0 up   in  weight 1 up_from 53 up_thru 24 down_at 52 last_clean_interval [8,48) [v2:10.0.0.141:6808/2198281705,v1:10.0.0.141:6809/2198281705] [v2:10.0.0.141:6810/2198281705,v1:10.0.0.141:6811/2198281705] exists,up aa4d4b47-b9f1-444b-bd36-3b622391ce71
osd.1 up   in  weight 1 up_from 53 up_thru 27 down_at 52 last_clean_interval [13,48) [v2:10.0.0.141:6800/335450708,v1:10.0.0.141:6801/335450708] [v2:10.0.0.141:6802/335450708,v1:10.0.0.141:6803/335450708] exists,up 212eab65-f6f2-41c7-9d58-2f75f86d84b2
osd.2 up   in  weight 1 up_from 54 up_thru 0 down_at 53 last_clean_interval [18,48) [v2:10.0.0.142:6800/163080901,v1:10.0.0.142:6801/163080901] [v2:10.0.0.142:6802/163080901,v1:10.0.0.142:6803/163080901] exists,up e0ffb1a9-ca9b-45a1-a95f-42aca94e8f47
osd.3 up   in  weight 1 up_from 52 up_thru 56 down_at 51 last_clean_interval [25,48) [v2:10.0.0.142:6808/2086272149,v1:10.0.0.142:6809/2086272149] [v2:10.0.0.142:6810/2086272149,v1:10.0.0.142:6811/2086272149] exists,up fda125ef-9776-47d9-baf4-9483966fe183
osd.4 up   in  weight 1 up_from 56 up_thru 0 down_at 55 last_clean_interval [34,48) [v2:10.0.0.143:6808/1331943799,v1:10.0.0.143:6809/1331943799] [v2:10.0.0.143:6810/1331943799,v1:10.0.0.143:6811/1331943799] exists,up a4f27770-20c9-4a75-b0c2-a212ddc7ab3f
osd.5 up   in  weight 1 up_from 56 up_thru 0 down_at 55 last_clean_interval [44,48) [v2:10.0.0.143:6800/3466236845,v1:10.0.0.143:6801/3466236845] [v2:10.0.0.143:6802/3466236845,v1:10.0.0.143:6803/3466236845] exists,up c6f8968f-c425-4539-ba9b-39ff08683170
blocklist 10.0.0.141:0/2920906744 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:6801/1616144223 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:6800/1616144223 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/3338469979 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/287245293 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1238275928 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/4254913043 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/4240352034 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:6801/1457839497 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6800/75389737 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1951266866 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:6800/1457839497 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:0/3710270280 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/2072915682 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6801/75389737 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1341187958 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1879865485 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6800/2392661167 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/1999918034 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:6801/2392661167 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/4277851589 expires 2024-08-21T15:10:24.386665+0000
[root@ceph141 ~]#
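When the dump is long, as here with its many blocklist entries, it can help to pull out only the per-OSD lines. The following is a minimal sketch; the function name osd_dump_states is hypothetical (not part of Ceph), and it assumes the plain-text layout shown above, where each OSD line starts with osd.<ID> followed by its up/down state, its in/out state and "weight <value>".

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise the "osd.N ..." lines of "ceph osd dump".
# Assumes the plain-text layout shown above:
#   osd.<ID> <up|down> <in|out> weight <W> up_from ...
# Prints one line per OSD: "<name> <up|down> <in|out> <weight>".
osd_dump_states() {
  awk '$1 ~ /^osd\.[0-9]+$/ { print $1, $2, $3, $5 }'
}
```

Usage: "ceph osd dump | osd_dump_states". For anything beyond a quick look, "ceph osd dump -f json" avoids depending on column positions.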

3. Show OSD status

[root@ceph141 ~]# ceph osd status
ID  HOST      USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  ceph141  27.8M   199G      0        0       0        0   exists,up  
 1  ceph141  27.2M   299G      0        0       0        0   exists,up  
 2  ceph142  27.2M   199G      0        0       0        0   exists,up  
 3  ceph142  27.8M   299G      0        0       0        0   exists,up  
 4  ceph143  27.2M   299G      0        0       0        0   exists,up  
 5  ceph143  27.8M   199G      0        0       0        0   exists,up  
[root@ceph141 ~]#

4. Show the OSD summary

[root@ceph141 ~]# ceph osd stat
6 osds: 6 up (since 8m), 6 in (since 7h); epoch: e58
[root@ceph141 ~]#
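This one-line summary is convenient as a quick pre-maintenance check. A minimal sketch, assuming the "<N> osds: <U> up ..., <I> in ..." wording shown above (the function name osd_stat_all_up_in is hypothetical): it succeeds only when every OSD is both up and in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if "ceph osd stat" reports all OSDs up and in.
# Assumes the text format shown above, e.g.
#   6 osds: 6 up (since 8m), 6 in (since 7h); epoch: e58
osd_stat_all_up_in() {
  awk '
    NR == 1 {
      total = $1
      for (i = 2; i <= NF; i++) {
        word = $i
        gsub(/[,;]$/, "", word)          # tolerate "up," or "in;" variants
        if (word == "up") up = $(i - 1)
        if (word == "in") inn = $(i - 1)
      }
    }
    END { exit !(NR > 0 && up != "" && inn != "" && total == up && total == inn) }'
}
```

Usage: "ceph osd stat | osd_stat_all_up_in && echo 'all OSDs are up and in'".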

5. Show how OSDs are placed across hosts (CRUSH tree)

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]#
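Because "ceph osd tree" groups OSDs under their host bucket, it is also the quickest way to find which OSD IDs live on a node you plan to remove. A minimal sketch, assuming the simple root → host → osd hierarchy shown above (no rack or datacenter buckets); osds_on_host is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the OSD names listed under host bucket $1
# in "ceph osd tree" (or "ceph osd crush tree") output read from stdin.
# Assumes a flat root -> host -> osd hierarchy as in the output above.
osds_on_host() {
  awk -v want="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "root") { cur = ""; next }               # left any host bucket
        if ($i == "host") { cur = $(i + 1); next }         # entering a host bucket
        if ($i ~ /^osd\.[0-9]+$/) { if (cur == want) print $i; next }
      }
    }'
}
```

Usage: "ceph osd tree | osds_on_host ceph143".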

6. Show OSD latency statistics

[root@ceph141 ~]# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  5                   0                  0
  4                   0                  0
  3                   0                  0
  2                   0                  0
  1                   0                  0
  0                   0                  0
[root@ceph141 ~]#

7. Show per-OSD utilization

[root@ceph141 ~]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  0.19530   1.00000  200 GiB   28 MiB  1.1 MiB   4 KiB   27 MiB  200 GiB  0.01  1.26    1      up
 1    hdd  0.29300   1.00000  300 GiB   27 MiB  572 KiB   4 KiB   27 MiB  300 GiB  0.01  0.82    0      up
 2    hdd  0.19530   1.00000  200 GiB   27 MiB  572 KiB   4 KiB   27 MiB  200 GiB  0.01  1.24    0      up
 3    hdd  0.29300   1.00000  300 GiB   28 MiB  1.1 MiB   4 KiB   27 MiB  300 GiB  0.01  0.84    1      up
 4    hdd  0.29300   1.00000  300 GiB   27 MiB  572 KiB   4 KiB   27 MiB  300 GiB  0.01  0.83    0      up
 5    hdd  0.19530   1.00000  200 GiB   28 MiB  1.1 MiB   4 KiB   27 MiB  200 GiB  0.01  1.26    1      up
                       TOTAL  1.5 TiB  165 MiB  5.0 MiB  26 KiB  160 MiB  1.5 TiB  0.01                   
MIN/MAX VAR: 0.82/1.26  STDDEV: 0.00
[root@ceph141 ~]#
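%USE is the column to watch as the cluster fills up. The sketch below (hypothetical name osds_over_use) lists OSDs whose %USE exceeds a threshold, for example 85 to line up with the nearfull_ratio 0.85 in the ceph osd dump output above. It assumes the text layout shown here, where each size carries a separate unit token and %USE is therefore the fourth column from the right; for scripting, "ceph osd df -f json" is more robust than counting columns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "osd.<ID> <%USE>" for OSDs above threshold $1,
# reading "ceph osd df" text output from stdin.
# Assumes the layout above: %USE VAR PGS STATUS are the last four columns.
osds_over_use() {
  awk -v limit="$1" '
    $1 ~ /^[0-9]+$/ && NF >= 4 {            # OSD rows only (skip header/TOTAL)
      use = $(NF - 3)                       # %USE, 4th column from the right
      if (use + 0 > limit + 0) print "osd." $1, use
    }'
}
```

Usage: "ceph osd df | osds_over_use 85".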

8. Pause client reads and writes to the cluster

[root@ceph141 ~]# ceph -s
...
  services:
    mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 14m)
    mgr: ceph141.gqogmi(active, since 14m), standbys: ceph142.tisapy
    osd: 6 osds: 6 up (since 14m), 6 in (since 7h)

...
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pause
pauserd,pausewr is set
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
...
  services:
    mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 14m)
    mgr: ceph141.gqogmi(active, since 14m), standbys: ceph142.tisapy
    osd: 6 osds: 6 up (since 14m), 6 in (since 7h)
         flags pauserd,pausewr  # Note: the pauserd,pausewr flags now appear here

...

[root@ceph141 ~]#

9. Resume client reads and writes

[root@ceph141 ~]# ceph -s
...

  services:
    mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 16m)
    mgr: ceph141.gqogmi(active, since 16m), standbys: ceph142.tisapy
    osd: 6 osds: 6 up (since 16m), 6 in (since 7h)
         flags pauserd,pausewr

...

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd unpause
pauserd,pausewr is unset
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
...

  services:
    mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 16m)
    mgr: ceph141.gqogmi(active, since 16m), standbys: ceph142.tisapy
    osd: 6 osds: 6 up (since 16m), 6 in (since 7h)

...

[root@ceph141 ~]#
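Forgetting to unpause leaves clients hanging, so a scripted maintenance window may want to check for these flags. A minimal sketch (hypothetical name has_pause_flags) that looks for a "flags" line containing pauserd or pausewr, as in the ceph -s output above; it assumes the "flags ..." line of ceph osd dump would list the same flags, which is worth verifying on your version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the text on stdin ("ceph -s" output, or the
# "flags" line of "ceph osd dump") shows pauserd or pausewr as set.
has_pause_flags() {
  grep -Eq '^[[:space:]]*flags[[:space:]]+.*pause(rd|wr)'
}
```

Usage: "ceph -s | has_pause_flags && echo 'cluster I/O is paused; run: ceph osd unpause'".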

10. Adjusting OSD CRUSH weights

    1. View the default CRUSH weight of each OSD
[root@ceph141 ~]# ceph osd crush tree
ID  CLASS  WEIGHT   TYPE NAME       
-1         1.46489  root default    
-3         0.48830      host ceph141
 0    hdd  0.19530          osd.0   
 1    hdd  0.29300          osd.1   
-5         0.48830      host ceph142
 2    hdd  0.19530          osd.2   
 3    hdd  0.29300          osd.3   
-7         0.48830      host ceph143
 4    hdd  0.29300          osd.4   
 5    hdd  0.19530          osd.5   
[root@ceph141 ~]# 

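    2. Change an OSD's CRUSH weight
[root@ceph141 ~]# ceph osd crush reweight osd.4 0  # A CRUSH weight of 0 means CRUSH no longer places data on this disk (its data migrates to other OSDs); typically used temporarily when decommissioning a node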
reweighted item id 4 name 'osd.4' to 0 in crush map
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd crush tree
ID  CLASS  WEIGHT   TYPE NAME       
-1         1.17189  root default    
-3         0.48830      host ceph141
 0    hdd  0.19530          osd.0   
 1    hdd  0.29300          osd.1   
-5         0.48830      host ceph142
 2    hdd  0.19530          osd.2   
 3    hdd  0.29300          osd.3   
-7         0.19530      host ceph143
 4    hdd        0          osd.4   
 5    hdd  0.19530          osd.5   
[root@ceph141 ~]# 

    3. When testing is finished, restore the original weight
[root@ceph141 ~]# ceph osd crush reweight osd.4 0.29300
reweighted item id 4 name 'osd.4' to 0.293 in crush map
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd crush tree
ID  CLASS  WEIGHT   TYPE NAME       
-1         1.46489  root default    
-3         0.48830      host ceph141
 0    hdd  0.19530          osd.0   
 1    hdd  0.29300          osd.1   
-5         0.48830      host ceph142
 2    hdd  0.19530          osd.2   
 3    hdd  0.29300          osd.3   
-7         0.48830      host ceph143
 4    hdd  0.29300          osd.4   
 5    hdd  0.19530          osd.5   
[root@ceph141 ~]#
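The trees above show that a host bucket's WEIGHT is the sum of its OSDs' CRUSH weights: 0.19530 + 0.29300 = 0.48830 for each host. Once osd.4 is set to 0, ceph143 drops to 0.19530 and the root drops by 0.29300, from 1.46489 to 1.17189. A minimal sketch (hypothetical name host_weight_sums) that recomputes the per-host sums from "ceph osd crush tree" output, assuming the flat root → host → osd layout shown; the displayed weights are rounded, so last-digit differences against Ceph's own totals are possible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<host> <sum of OSD CRUSH weights>" for each host
# bucket in "ceph osd crush tree" (or "ceph osd tree") output read from stdin.
# Assumes the OSD's WEIGHT column sits directly before its osd.N name.
host_weight_sums() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "root") { cur = ""; next }
        if ($i == "host") { cur = $(i + 1); order[++n] = cur; sum[cur] = 0; next }
        if ($i ~ /^osd\.[0-9]+$/) { if (cur != "") sum[cur] += $(i - 1); next }
      }
    }
    END { for (k = 1; k <= n; k++) printf "%s %.5f\n", order[k], sum[order[k]] }'
}
```

Usage: "ceph osd crush tree | host_weight_sums".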

11. Marking OSDs down and up

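Tips:
    - 1. "ceph osd down" only marks the OSD down in the OSD map. Its ceph-osd daemon is still running, notices that it was marked down, and reports itself up again.
    - 2. To keep an OSD down, stop its ceph-osd process, e.g. the systemd unit "ceph-osd@4" on package-based deployments; in a cephadm-managed cluster such as this one, use "ceph orch daemon stop osd.<ID>".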

    1. An OSD that is only marked down comes back up automatically
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd down 4 ; ceph osd tree
marked down osd.4. 
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4       down   1.00000  1.00000  # Note: osd.4 is now down.
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree  
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up   1.00000  1.00000  # But it soon comes back up on its own!
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 


    2. Keeping OSDs down by stopping their daemons
[root@ceph141 ~]# ceph orch daemon stop osd.3  # Stop the osd.3 daemon directly
Scheduled to stop osd.3 on host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch daemon stop osd.5  # Stop the osd.5 daemon directly
Scheduled to stop osd.5 on host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 3    hdd  0.29300          osd.3       down   1.00000  1.00000
 5    hdd  0.19530          osd.5       down   1.00000  1.00000
[root@ceph141 ~]#

12. Marking an OSD out

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd out 4  # Mark OSD 4 out
marked out osd.4. 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up         0  1.00000  # Under the hood this only changes the REWEIGHT value used for data placement, here to 0.
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]#

13. Marking an OSD back in

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up         0  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd in 4  # Mark OSD 4 back in.
marked in osd.4. 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]#
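As the outputs of sections 12 and 13 show, "ceph osd out" leaves the CRUSH WEIGHT untouched and sets REWEIGHT to 0, while "ceph osd in" restores it to 1.00000. A minimal sketch (hypothetical name out_osds) that lists OSDs currently out by reading the REWEIGHT column of "ceph osd tree", assuming the column order shown (NAME, STATUS, REWEIGHT, PRI-AFF).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of OSDs whose REWEIGHT is 0 (i.e. out)
# in "ceph osd tree" output read from stdin.
out_osds() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^osd\.[0-9]+$/) {
          # Columns after NAME are STATUS, REWEIGHT, PRI-AFF.
          if (i + 2 <= NF && $(i + 2) + 0 == 0) print $i
          next
        }
      }
    }'
}
```

Usage: "ceph osd tree | out_osds".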

II. Scaling in: removing an OSD node from a Ceph cluster

1. Basic procedure for removing OSDs

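The recommended procedure for removing OSDs is:
    - 1. Stop the OSD daemons on the target node [optional];
    - 2. Purge the OSD data [optional];
    - 3. Remove the node from the CRUSH map so it no longer holds data [required];
    - 4. Drain the daemons from the node being decommissioned [required];
    - 5. Take the node offline, i.e. remove the host from the cluster [required];
    - 6. On the removed node, release Ceph's hold on the disks [required];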

Tip:
    The first two steps can be skipped.
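The hands-on example that follows runs these steps one command at a time. As a hedged summary, the sketch below (hypothetical function plan_host_removal) only prints the commands used in sections 2.2 to 2.6, in the same order, for a given host and its OSD IDs, so they can be reviewed before anything is run. It executes nothing and does not cover step 6, which happens on the removed node itself. Since "ceph osd purge ... --force" permanently removes the OSD, it may be worth checking "ceph osd safe-to-destroy osd.<ID>" first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the removal commands used in
# sections 2.2-2.6 for host $1 and OSD IDs $2..$N, in walkthrough order.
plan_host_removal() {
  local host="$1"; shift
  local id
  for id in "$@"; do echo "ceph orch daemon stop osd.${id}"; done        # 2.2 stop OSD daemons
  for id in "$@"; do echo "ceph osd purge ${id} --force"; done           # 2.3 purge OSDs
  echo "ceph osd crush rm ${host}"                                       # 2.4 drop host bucket
  echo "ceph orch host drain ${host}"                                    # 2.5 drain daemons
  for id in "$@"; do echo "ceph orch daemon rm osd.${id} --force"; done  # 2.6 remove stopped OSDs
  echo "ceph orch host rm ${host}"                                       # 2.6 remove host
}
```

Usage: "plan_host_removal ceph143 3 5" prints nine commands; after review they can be run one by one.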

2. Hands-on example: removing OSDs

2.1 Check the cluster state before removal

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
-7         0.48830      host ceph143                           
 3    hdd  0.29300          osd.3         up   1.00000  1.00000
 5    hdd  0.19530          osd.5         up   1.00000  1.00000
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph device ls
DEVICE                                 HOST:DEV     DAEMONS      WEAR  LIFE EXPECTANCY
ATA_VBOX_HARDDISK_VB44d8d962-22b2507e  ceph142:sdb  osd.2                             
ATA_VBOX_HARDDISK_VB586591eb-921dc802  ceph143:sdc  osd.5                             
ATA_VBOX_HARDDISK_VB7b0f012c-688b1185  ceph142:sdc  osd.4                             
ATA_VBOX_HARDDISK_VB7ddbae3f-13ea8edd  ceph143:sda  mon.ceph143                       
ATA_VBOX_HARDDISK_VB7f99f134-d6f80b2c  ceph141:sdb  osd.0                             
ATA_VBOX_HARDDISK_VB8587e457-f6eca36a  ceph141:sdc  osd.1                             
ATA_VBOX_HARDDISK_VBab58677d-fb9dc89f  ceph143:sdb  osd.3                             
ATA_VBOX_HARDDISK_VBbcff97b3-3bc2fb47  ceph141:sda  mon.ceph141                       
ATA_VBOX_HARDDISK_VBe309cee1-15dd71d4  ceph142:sda  mon.ceph142                       
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
  cluster:
    id:     c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4
    health: HEALTH_WARN
            clock skew detected on mon.ceph142

  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 2m)
    mgr: ceph141.fuztcs(active, since 111s), standbys: ceph142.vdsfzv
    osd: 6 osds: 6 up (since 115s), 6 in (since 2h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   565 MiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     1 active+clean

[root@ceph141 ~]#

2.2 Stop all OSD daemons on the node being decommissioned

[root@ceph141 ~]# ceph orch daemon stop osd.3
Scheduled to stop osd.3 on host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch daemon stop osd.5
Scheduled to stop osd.5 on host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
-7         0.48830      host ceph143  # After stopping, both OSDs on this host show as down
 3    hdd  0.29300          osd.3       down   1.00000  1.00000
 5    hdd  0.19530          osd.5       down   1.00000  1.00000
[root@ceph141 ~]#

2.3 Purge the OSD data and configuration

[root@ceph141 ~]# ceph osd purge 3 --force
purged osd.3
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd purge 5 --force
purged osd.5
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.97659  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
-7               0      host ceph143                           
[root@ceph141 ~]#

2.4 After all of the host's OSDs are removed, delete the host from the CRUSH map

[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.97659  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
-7               0      host ceph143                           
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd crush rm ceph143
removed item id -7 name 'ceph143' from crush map
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.97659  root default                               
-3         0.48830      host ceph141                           
 0    hdd  0.19530          osd.0         up   1.00000  1.00000
 1    hdd  0.29300          osd.1         up   1.00000  1.00000
-5         0.48830      host ceph142                           
 2    hdd  0.19530          osd.2         up   1.00000  1.00000
 4    hdd  0.29300          osd.4         up   1.00000  1.00000
[root@ceph141 ~]#
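As far as I know, "ceph osd crush rm" refuses to remove a host bucket that still contains OSDs, which is why the OSDs were purged first in 2.3. A minimal pre-check sketch (hypothetical name host_bucket_is_empty), assuming the tree layout shown: it succeeds only if the host bucket exists and has no OSDs under it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if host bucket $1 exists in "ceph osd tree"
# output (stdin) and has no OSDs under it; otherwise exit 1.
host_bucket_is_empty() {
  awk -v want="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "root") { cur = ""; next }
        if ($i == "host") { cur = $(i + 1); if (cur == want) found = 1; next }
        if ($i ~ /^osd\.[0-9]+$/) { if (cur == want) count++; next }
      }
    }
    END { exit !(found && count == 0) }'
}
```

Usage: "ceph osd tree | host_bucket_is_empty ceph143 && ceph osd crush rm ceph143".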

2.5 Automatically drain the service daemons from the decommissioned node

[root@ceph141 ~]# ceph orch host ls
HOST     ADDR        LABELS  STATUS  
ceph141  10.0.0.141  _admin          
ceph142  10.0.0.142                  
ceph143  10.0.0.143                  
3 hosts in cluster
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type                 id             
-------------------- ---------------
node-exporter        ceph143        
ceph-exporter        ceph143        
osd                  5              
osd                  3              
mon                  ceph143        
crash                ceph143        
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host drain ceph143  # The 2 stopped OSDs cannot be drained automatically
Scheduled to remove the following daemons from host 'ceph143'
type                 id             
-------------------- ---------------
osd                  5              
osd                  3              
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host ls
HOST     ADDR        LABELS                         STATUS  
ceph141  10.0.0.141  _admin                                 
ceph142  10.0.0.142                                         
ceph143  10.0.0.143  _no_schedule,_no_conf_keyring          
3 hosts in cluster
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph -s
  cluster:
    id:     c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum ceph141,ceph142 (age 39s)
    mgr: ceph141.fuztcs(active, since 13m), standbys: ceph142.vdsfzv
    osd: 4 osds: 4 up (since 9m), 4 in (since 2h); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   111 MiB used, 1000 GiB / 1000 GiB avail
    pgs:     2/6 objects misplaced (33.333%)
             1 active+clean+remapped

[root@ceph141 ~]#
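Re-running "ceph orch host drain" is used above as a progress check: its table lists the daemons still scheduled for removal, and in this walkthrough the two stopped OSDs stay listed until they are removed by hand in 2.6. A small sketch (hypothetical name pending_drain_daemons) that extracts the "type id" rows from that table, assuming the layout shown; an empty result corresponds to the final state reached in 2.6.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<type> <id>" for each daemon row after the
# dashed separator in "ceph orch host drain <host>" output read from stdin.
pending_drain_daemons() {
  awk '
    /^-+[ \t]+-+[ \t]*$/ { in_table = 1; next }   # the dashed separator line
    in_table && NF >= 2 { print $1, $2 }'
}
```

Usage: "ceph orch host drain ceph143 | pending_drain_daemons".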

2.6 Remove the node

    1. Manually remove the stopped OSD daemons
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type                 id             
-------------------- ---------------
osd                  5              
osd                  3              
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch daemon rm osd.3 --force
Removed osd.3 from host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch daemon rm osd.5 --force
Removed osd.5 from host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type                 id             
-------------------- ---------------
[root@ceph141 ~]# 
[root@ceph141 ~]# 

    2. Remove the host
[root@ceph141 ~]# ceph orch host ls
HOST     ADDR        LABELS                         STATUS  
ceph141  10.0.0.141  _admin                                 
ceph142  10.0.0.142                                         
ceph143  10.0.0.143  _no_schedule,_no_conf_keyring          
3 hosts in cluster
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host rm ceph143
Removed  host 'ceph143'
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph orch host ls
HOST     ADDR        LABELS  STATUS  
ceph141  10.0.0.141  _admin          
ceph142  10.0.0.142                  
2 hosts in cluster
[root@ceph141 ~]# 


    3. List the devices again: the ceph143 node no longer appears
[root@ceph141 ~]# ceph device ls
DEVICE                                 HOST:DEV     DAEMONS      WEAR  LIFE EXPECTANCY
ATA_VBOX_HARDDISK_VB44d8d962-22b2507e  ceph142:sdb  osd.2                             
ATA_VBOX_HARDDISK_VB7b0f012c-688b1185  ceph142:sdc  osd.4                             
ATA_VBOX_HARDDISK_VB7f99f134-d6f80b2c  ceph141:sdb  osd.0                             
ATA_VBOX_HARDDISK_VB8587e457-f6eca36a  ceph141:sdc  osd.1                             
ATA_VBOX_HARDDISK_VBbcff97b3-3bc2fb47  ceph141:sda  mon.ceph141                       
ATA_VBOX_HARDDISK_VBe309cee1-15dd71d4  ceph142:sda  mon.ceph142                       
[root@ceph141 ~]#

2.7 Release Ceph's hold on the disks on the removed node

    1. View the block devices on the removed node
[root@ceph143 ~]# lsblk 
NAME                                                                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
...
sdb                                                                           8:16   0  300G  0 disk 
└─ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512
                                                                            253:0    0  300G  0 lvm  
sdc                                                                           8:32   0  200G  0 disk 
└─ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4
                                                                            253:1    0  200G  0 lvm  
...
[root@ceph143 ~]# 

    2. Map each local OSD ID to its disk (the OSD's fsid appears in the LV name)
[root@ceph143 ~]# cat /var/lib/ceph/c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4/osd.3/fsid 
0065bf70-6947-4b17-86ed-c1d902120512
[root@ceph143 ~]# 
[root@ceph143 ~]# cat /var/lib/ceph/c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4/osd.5/fsid 
60ac4910-12bc-4f21-9d89-2cfd48ed0cb4
[root@ceph143 ~]# 

    3. List the device-mapper devices held by Ceph
[root@ceph143 ~]# dmsetup status
ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4: 0 419422208 linear 
ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512: 0 629137408 linear 
ubuntu--vg-ubuntu--lv: 0 50323456 linear 
[root@ceph143 ~]# 

    4. Remove the device-mapper mappings to release the disks
[root@ceph143 ~]# dmsetup remove ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4
[root@ceph143 ~]# 
[root@ceph143 ~]# dmsetup remove ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512
[root@ceph143 ~]# 

    5. Check the local disks again to confirm they have been released
[root@ceph143 ~]# lsblk 
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
...
sdb                         8:16   0  300G  0 disk 
sdc                         8:32   0  200G  0 disk 
...  
[root@ceph143 ~]#
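Steps 2 to 4 rely on the logical volume name containing the OSD's fsid: LVM builds the device-mapper name as <VG>-<LV> and doubles every hyphen inside the VG and LV names. A minimal sketch (hypothetical name dm_name_for_osd_fsid) that performs this lookup on "dmsetup status" output in the format shown above. Note that "dmsetup remove" only drops the kernel mapping; the LVM metadata stays on the disk until it is wiped, which matters if the disk is reused later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given an OSD fsid (from
# /var/lib/ceph/<cluster-fsid>/osd.<ID>/fsid), print the matching
# device-mapper name from "dmsetup status" output read on stdin.
# LVM doubles "-" inside VG/LV names, so the fsid is escaped the same way.
dm_name_for_osd_fsid() {
  local escaped="${1//-/--}"
  awk -F': ' -v pat="osd--block--${escaped}" 'index($1, pat) > 0 { print $1 }'
}
```

Usage on ceph143: dmsetup remove "$(dmsetup status | dm_name_for_osd_fsid "$(cat /var/lib/ceph/c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4/osd.3/fsid)")"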

2.8 Reflections on the OSD node removal procedure

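The experiment above shows that removing a node is simple, and it can be simpler still: steps 2.2 and 2.3 (stopping and purging the OSDs) can be skipped.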

Take a VM snapshot before removing a node; if you have not taken one, follow along with my classroom demonstration instead.

Recommended reading:
    https://docs.redhat.com/zh_hans/documentation/red_hat_ceph_storage/4/html/operations_guide/removing-a-ceph-osd-node_ops
    https://docs.redhat.com/zh_hans/documentation/red_hat_ceph_storage/4/html-single/operations_guide/index#replacing-a-bluestore-database-disk-using-the-command-line-interface_ops

III. Scaling out: adding OSD nodes to a Ceph cluster

1. Basic procedure for adding OSDs

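The procedure for adding OSDs is:
    - 1. Check whether the device on the OSD node is already in use;
    - 2. Wipe (zap) or reformat the device [optional];
    - 3. Add the OSD to the cluster;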

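Tip:
    When an OSD joins the cluster, a unique fsid is generated for that OSD and stored on its host; in a cephadm deployment the file is "/var/lib/ceph/${CLUSTER_FSID}/osd.${OSD_ID}/fsid" (compare the paths used in section 2.7).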
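Steps 1 to 3 can be sketched as follows for a cephadm-managed cluster like the one above. disk_has_holders (hypothetical) reads "lsblk -rn -o NAME,PKNAME" output, which lists each device with its parent kernel device name, and reports whether anything, such as a leftover ceph LV, sits on top of the disk. plan_osd_add (hypothetical) only prints the zap and add commands for review. "ceph orch device zap <host> <path> --force" and "ceph orch daemon add osd <host>:<path>" are cephadm orchestrator commands, and "ceph orch device ls" also shows whether cephadm considers a device available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if any device lists disk $1 (e.g. "sdb") as its
# parent, reading "lsblk -rn -o NAME,PKNAME" output on stdin.
disk_has_holders() {
  awk -v dev="$1" '$2 == dev { found = 1 } END { exit !found }'
}

# Hypothetical helper: print (do not run) the commands to wipe device $2 on
# host $1 and create an OSD on it with the cephadm orchestrator.
plan_osd_add() {
  local host="$1" dev="$2"
  echo "ceph orch device zap ${host} ${dev} --force"   # step 2: destroys all data on the device
  echo "ceph orch daemon add osd ${host}:${dev}"       # step 3: create the OSD
}
```

Usage: on the OSD node, "lsblk -rn -o NAME,PKNAME | disk_has_holders sdb && echo 'sdb is still in use'"; on the admin node, "plan_osd_add ceph143 /dev/sdb".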

2. Hands-on example: scaling out

Omitted here; see my earlier notes.

Recommended reading:
    https://www.cnblogs.com/yinzhengjie/p/18370686#5ceph集群添加或移除主机