How to Safely Remove OSDs from a Ceph (Luminous) Cluster

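At work I needed to take one storage server out of a Ceph cluster so it could be used for something else. The cluster would still have enough capacity without that server, so the operation was feasible. But the cluster had been running for a long time and every server held a lot of data, so removing one without losing any data did not look trivial.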

1. OSD layout

First, look at how the OSDs are laid out:

$ ceph osd tree
ID CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF 
-1       265.25757 root default                           
-5       132.62878     host osd7                          
24   hdd   5.52620         osd.24     up  1.00000 1.00000 
25   hdd   5.52620         osd.25     up  1.00000 1.00000 
26   hdd   5.52620         osd.26     up  1.00000 1.00000 
27   hdd   5.52620         osd.27     up  1.00000 1.00000 
28   hdd   5.52620         osd.28     up  1.00000 1.00000 
29   hdd   5.52620         osd.29     up  1.00000 1.00000 
30   hdd   5.52620         osd.30     up  1.00000 1.00000 
31   hdd   5.52620         osd.31     up  1.00000 1.00000 
32   hdd   5.52620         osd.32     up  1.00000 1.00000 
33   hdd   5.52620         osd.33     up  1.00000 1.00000 
34   hdd   5.52620         osd.34     up  1.00000 1.00000 
35   hdd   5.52620         osd.35     up  1.00000 1.00000 
36   hdd   5.52620         osd.36     up  1.00000 1.00000 
37   hdd   5.52620         osd.37     up  1.00000 1.00000 
38   hdd   5.52620         osd.38     up  1.00000 1.00000 
39   hdd   5.52620         osd.39     up  1.00000 1.00000 
40   hdd   5.52620         osd.40     up  1.00000 1.00000 
41   hdd   5.52620         osd.41     up  1.00000 1.00000 
42   hdd   5.52620         osd.42     up  1.00000 1.00000 
43   hdd   5.52620         osd.43     up  1.00000 1.00000 
44   hdd   5.52620         osd.44     up  1.00000 1.00000 
45   hdd   5.52620         osd.45     up  1.00000 1.00000 
46   hdd   5.52620         osd.46     up  1.00000 1.00000 
47   hdd   5.52620         osd.47     up  1.00000 1.00000 
-3       132.62878     host osd8                          
 0   hdd   5.52620         osd.0      up  1.00000 1.00000 
 1   hdd   5.52620         osd.1      up  1.00000 1.00000 
 2   hdd   5.52620         osd.2      up  1.00000 1.00000 
 3   hdd   5.52620         osd.3      up  1.00000 1.00000 
 4   hdd   5.52620         osd.4      up  1.00000 1.00000 
 5   hdd   5.52620         osd.5      up  1.00000 1.00000 
 6   hdd   5.52620         osd.6      up  1.00000 1.00000 
 7   hdd   5.52620         osd.7      up  1.00000 1.00000 
 8   hdd   5.52620         osd.8      up  1.00000 1.00000 
 9   hdd   5.52620         osd.9      up  1.00000 1.00000 
10   hdd   5.52620         osd.10     up  1.00000 1.00000 
11   hdd   5.52620         osd.11     up  1.00000 1.00000 
12   hdd   5.52620         osd.12     up  1.00000 1.00000 
13   hdd   5.52620         osd.13     up  1.00000 1.00000 
14   hdd   5.52620         osd.14     up  1.00000 1.00000 
15   hdd   5.52620         osd.15     up  1.00000 1.00000 
16   hdd   5.52620         osd.16     up  1.00000 1.00000 
17   hdd   5.52620         osd.17     up  1.00000 1.00000 
18   hdd   5.52620         osd.18     up  1.00000 1.00000 
19   hdd   5.52620         osd.19     up  1.00000 1.00000 
20   hdd   5.52620         osd.20     up  1.00000 1.00000 
21   hdd   5.52620         osd.21     up  1.00000 1.00000 
22   hdd   5.52620         osd.22     up  1.00000 1.00000 
23   hdd   5.52620         osd.23     up  1.00000 1.00000 

There are two servers and 48 OSDs in total. Removing osd8 means deleting all 24 OSDs that live on it.
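To collect those 24 IDs without reading them off by hand, the tree output can be filtered by host. The following is only a minimal sketch and not part of the original procedure: osds_under_host is a hypothetical helper name, and the awk field positions assume the plain-text column layout shown above (bucket rows without a CLASS value, OSD rows with one).

```shell
# Print the IDs of all OSDs listed under one host in `ceph osd tree` text.
# Reads the tree on stdin; $1 is the host name (for example osd8).
# Intended use on a live cluster:  ceph osd tree | osds_under_host osd8
osds_under_host() {
    awk -v host="$1" '
        # Bucket rows (root, host, ...) have negative IDs: ID WEIGHT TYPE NAME
        $1 ~ /^-[0-9]+$/ { in_host = ($3 == "host" && $4 == host); next }
        # OSD rows: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF
        in_host && $4 ~ /^osd\.[0-9]+$/ { print $1 }
    '
}
```

Because the helper only reads text on stdin, it can be tried against a saved copy of the tree output before being pointed at a real cluster.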

2. Removing a single OSD

Taking osd.0 as an example, here is the procedure for removing one OSD:

2.1 Mark the OSD out

First, mark the OSD as out.

$ ceph osd out 0
marked out osd.0. 

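At this stage Ceph automatically migrates the data held by the out OSD to the other healthy OSDs, so after running the command, use ceph -w to follow the migration. It is finished once no new output appears; in the run below, the last line reports that the OBJECT_MISPLACED health check has been cleared.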

$ ceph -w  
  cluster:
    id:     063ed8d6-fc89-4fcb-8811-ff23915983e7
    health: HEALTH_ERR
            12408/606262 objects misplaced (2.047%)
            6 scrub errors
            Reduced data availability: 2 pgs peering
            Possible data damage: 5 pgs inconsistent
            application not enabled on 7 pool(s)
 
  services:
    mon: 3 daemons, quorum dell1,dell2,dell3
    mgr: dell1(active)
    mds: cephfs-1/1/1 up  {0=dell1=up:active}, 2 up:standby
    osd: 48 osds: 48 up, 47 in; 44 remapped pgs
    rgw: 3 daemons active
 
  data:
    pools:   22 pools, 1816 pgs
    objects: 296k objects, 963 GB
    usage:   5222 GB used, 254 TB / 259 TB avail
    pgs:     0.220% pgs not active
             12408/606262 objects misplaced (2.047%)
             1763 active+clean
             29   active+remapped+backfill_wait
             14   active+remapped+backfilling
             5    active+clean+inconsistent
             3    peering
             1    active+recovery_wait
             1    activating+remapped
 
  io:
    client:   59450 kB/s rd, 4419 MB/s wr, 1095 op/s rd, 2848 op/s wr
    recovery: 253 MB/s, 210 keys/s, 123 objects/s
 

2018-07-05 14:21:07.867104 mon.dell1 [WRN] Health check failed: Degraded data redundancy: 7/605732 objects degraded (0.001%), 1 pg degraded (PG_DEGRADED)
2018-07-05 14:21:12.252395 mon.dell1 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 7/605732 objects degraded (0.001%), 1 pg degraded)
2018-07-05 14:21:13.510741 mon.dell1 [WRN] Health check update: 12269/606262 objects misplaced (2.024%) (OBJECT_MISPLACED)
2018-07-05 14:21:13.510797 mon.dell1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 2 pgs peering)
2018-07-05 14:21:19.488864 mon.dell1 [WRN] Health check update: 11553/606262 objects misplaced (1.906%) (OBJECT_MISPLACED)
2018-07-05 14:21:25.502619 mon.dell1 [WRN] Health check update: 10504/606262 objects misplaced (1.733%) (OBJECT_MISPLACED)
2018-07-05 14:21:31.745600 mon.dell1 [WRN] Health check update: 10091/606262 objects misplaced (1.664%) (OBJECT_MISPLACED)
2018-07-05 14:21:36.779666 mon.dell1 [WRN] Health check update: 9309/606262 objects misplaced (1.535%) (OBJECT_MISPLACED)
2018-07-05 14:21:41.779947 mon.dell1 [WRN] Health check update: 8580/606262 objects misplaced (1.415%) (OBJECT_MISPLACED)
2018-07-05 14:21:46.816584 mon.dell1 [WRN] Health check update: 8215/606262 objects misplaced (1.355%) (OBJECT_MISPLACED)
2018-07-05 14:21:51.817014 mon.dell1 [WRN] Health check update: 7331/606262 objects misplaced (1.209%) (OBJECT_MISPLACED)
2018-07-05 14:21:56.817406 mon.dell1 [WRN] Health check update: 6929/606262 objects misplaced (1.143%) (OBJECT_MISPLACED)
2018-07-05 14:22:01.817820 mon.dell1 [WRN] Health check update: 6426/606262 objects misplaced (1.060%) (OBJECT_MISPLACED)
2018-07-05 14:22:06.818188 mon.dell1 [WRN] Health check update: 5787/606262 objects misplaced (0.955%) (OBJECT_MISPLACED)
2018-07-05 14:22:11.818606 mon.dell1 [WRN] Health check update: 5429/606262 objects misplaced (0.895%) (OBJECT_MISPLACED)
2018-07-05 14:22:16.818981 mon.dell1 [WRN] Health check update: 5165/606262 objects misplaced (0.852%) (OBJECT_MISPLACED)
2018-07-05 14:22:20.303513 osd.35 [ERR] 13.2ad missing primary copy of 13:b56abc11:::d9593962-fa39-406f-bc35-7e4fcac1be9f.44307.2__shadow_121_1530008810116503747%2fplatform-cms.rar.2~vf7AEOlGNYM1ggI6IhV-iu22oDDXcvS.5_1:head, will try copies on 0
2018-07-05 14:22:21.819353 mon.dell1 [WRN] Health check update: 4866/606262 objects misplaced (0.803%) (OBJECT_MISPLACED)
2018-07-05 14:22:26.819657 mon.dell1 [WRN] Health check update: 4586/606262 objects misplaced (0.756%) (OBJECT_MISPLACED)
2018-07-05 14:22:31.819983 mon.dell1 [WRN] Health check update: 4323/606262 objects misplaced (0.713%) (OBJECT_MISPLACED)
2018-07-05 14:22:36.820335 mon.dell1 [WRN] Health check update: 4113/606262 objects misplaced (0.678%) (OBJECT_MISPLACED)
2018-07-05 14:22:41.820676 mon.dell1 [WRN] Health check update: 3949/606262 objects misplaced (0.651%) (OBJECT_MISPLACED)
2018-07-05 14:22:46.821040 mon.dell1 [WRN] Health check update: 3788/606262 objects misplaced (0.625%) (OBJECT_MISPLACED)
2018-07-05 14:22:51.821395 mon.dell1 [WRN] Health check update: 3665/606262 objects misplaced (0.605%) (OBJECT_MISPLACED)
2018-07-05 14:22:56.821692 mon.dell1 [WRN] Health check update: 3440/606262 objects misplaced (0.567%) (OBJECT_MISPLACED)
2018-07-05 14:23:01.821999 mon.dell1 [WRN] Health check update: 3170/606266 objects misplaced (0.523%) (OBJECT_MISPLACED)
2018-07-05 14:23:06.822355 mon.dell1 [WRN] Health check update: 2956/606266 objects misplaced (0.488%) (OBJECT_MISPLACED)
2018-07-05 14:23:11.822752 mon.dell1 [WRN] Health check update: 2747/606270 objects misplaced (0.453%) (OBJECT_MISPLACED)
2018-07-05 14:23:16.823168 mon.dell1 [WRN] Health check update: 2615/606270 objects misplaced (0.431%) (OBJECT_MISPLACED)
2018-07-05 14:23:21.823523 mon.dell1 [WRN] Health check update: 2512/606270 objects misplaced (0.414%) (OBJECT_MISPLACED)
2018-07-05 14:23:26.823878 mon.dell1 [WRN] Health check update: 2409/606270 objects misplaced (0.397%) (OBJECT_MISPLACED)
2018-07-05 14:23:31.824214 mon.dell1 [WRN] Health check update: 2299/606270 objects misplaced (0.379%) (OBJECT_MISPLACED)
2018-07-05 14:23:36.824596 mon.dell1 [WRN] Health check update: 2194/606270 objects misplaced (0.362%) (OBJECT_MISPLACED)
2018-07-05 14:23:41.825037 mon.dell1 [WRN] Health check update: 2101/606270 objects misplaced (0.347%) (OBJECT_MISPLACED)
2018-07-05 14:23:46.825390 mon.dell1 [WRN] Health check update: 1939/606270 objects misplaced (0.320%) (OBJECT_MISPLACED)
2018-07-05 14:23:51.825725 mon.dell1 [WRN] Health check update: 1777/606270 objects misplaced (0.293%) (OBJECT_MISPLACED)
2018-07-05 14:23:56.826087 mon.dell1 [WRN] Health check update: 1612/606270 objects misplaced (0.266%) (OBJECT_MISPLACED)
2018-07-05 14:24:01.826439 mon.dell1 [WRN] Health check update: 1444/606270 objects misplaced (0.238%) (OBJECT_MISPLACED)
2018-07-05 14:24:06.826755 mon.dell1 [WRN] Health check update: 1315/606270 objects misplaced (0.217%) (OBJECT_MISPLACED)
2018-07-05 14:24:11.828343 mon.dell1 [WRN] Health check update: 1264/606270 objects misplaced (0.208%) (OBJECT_MISPLACED)
2018-07-05 14:24:16.828638 mon.dell1 [WRN] Health check update: 1214/606270 objects misplaced (0.200%) (OBJECT_MISPLACED)
2018-07-05 14:24:21.886644 mon.dell1 [WRN] Health check update: 1161/606270 objects misplaced (0.191%) (OBJECT_MISPLACED)
2018-07-05 14:24:26.887027 mon.dell1 [WRN] Health check update: 1110/606270 objects misplaced (0.183%) (OBJECT_MISPLACED)
2018-07-05 14:24:32.287725 mon.dell1 [WRN] Health check update: 1069/606270 objects misplaced (0.176%) (OBJECT_MISPLACED)
2018-07-05 14:24:39.839578 mon.dell1 [WRN] Health check update: 960/606270 objects misplaced (0.158%) (OBJECT_MISPLACED)
2018-07-05 14:24:45.851276 mon.dell1 [WRN] Health check update: 905/606272 objects misplaced (0.149%) (OBJECT_MISPLACED)
2018-07-05 14:24:51.911053 mon.dell1 [WRN] Health check update: 849/606272 objects misplaced (0.140%) (OBJECT_MISPLACED)
2018-07-05 14:24:57.960803 mon.dell1 [WRN] Health check update: 784/606272 objects misplaced (0.129%) (OBJECT_MISPLACED)
2018-07-05 14:25:05.887641 mon.dell1 [WRN] Health check update: 688/606272 objects misplaced (0.113%) (OBJECT_MISPLACED)
2018-07-05 14:25:11.945922 mon.dell1 [WRN] Health check update: 631/606272 objects misplaced (0.104%) (OBJECT_MISPLACED)
2018-07-05 14:25:16.946267 mon.dell1 [WRN] Health check update: 570/606272 objects misplaced (0.094%) (OBJECT_MISPLACED)
2018-07-05 14:25:21.993994 mon.dell1 [WRN] Health check update: 528/606272 objects misplaced (0.087%) (OBJECT_MISPLACED)
2018-07-05 14:25:26.994417 mon.dell1 [WRN] Health check update: 468/606272 objects misplaced (0.077%) (OBJECT_MISPLACED)
2018-07-05 14:25:31.994789 mon.dell1 [WRN] Health check update: 411/606272 objects misplaced (0.068%) (OBJECT_MISPLACED)
2018-07-05 14:25:36.995192 mon.dell1 [WRN] Health check update: 353/606272 objects misplaced (0.058%) (OBJECT_MISPLACED)
2018-07-05 14:25:42.009567 mon.dell1 [WRN] Health check update: 293/606272 objects misplaced (0.048%) (OBJECT_MISPLACED)
2018-07-05 14:25:47.009879 mon.dell1 [WRN] Health check update: 241/606272 objects misplaced (0.040%) (OBJECT_MISPLACED)
2018-07-05 14:25:52.010822 mon.dell1 [WRN] Health check update: 187/606272 objects misplaced (0.031%) (OBJECT_MISPLACED)
2018-07-05 14:25:57.011182 mon.dell1 [WRN] Health check update: 133/606272 objects misplaced (0.022%) (OBJECT_MISPLACED)
2018-07-05 14:26:02.035637 mon.dell1 [WRN] Health check update: 78/606272 objects misplaced (0.013%) (OBJECT_MISPLACED)
2018-07-05 14:26:07.035965 mon.dell1 [WRN] Health check update: 22/606272 objects misplaced (0.004%) (OBJECT_MISPLACED)
2018-07-05 14:26:12.011546 mon.dell1 [INF] Health check cleared: OBJECT_MISPLACED (was: 22/606272 objects misplaced (0.004%))
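Rather than watching the log by eye, the wait can be scripted. This is a rough sketch under assumptions: wait_for_rebalance is a hypothetical helper, and it relies on ceph health detail printing "objects misplaced" or "objects degraded" while data is still moving, as the health messages in the log above do.

```shell
# Block until `ceph health detail` no longer mentions misplaced or degraded objects.
# $1 is the polling interval in seconds (defaults to 10).
# Assumption: it is called after the cluster has started reporting the
# misplaced objects; if called too early it may return immediately.
wait_for_rebalance() {
    interval="${1:-10}"
    while ceph health detail | grep -Eq 'objects (misplaced|degraded)'; do
        sleep "$interval"
    done
}
```

The loop says nothing about scrub errors or inconsistent PGs, so the health output still needs to be checked afterwards.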

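2.2 PG repair

A finished migration does not mean everything is fine, though. The command below shows that after the migration five PGs are in an abnormal state and need to be repaired.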

$ ceph health detail
HEALTH_ERR 6 scrub errors; Possible data damage: 5 pgs inconsistent
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 5 pgs inconsistent
    pg 13.cd is active+clean+inconsistent, acting [20,35]
    pg 13.244 is active+clean+inconsistent, acting [35,22]
    pg 13.270 is active+clean+inconsistent, acting [35,14]
    pg 13.308 is active+clean+inconsistent, acting [35,17]
    pg 13.34f is active+clean+inconsistent, acting [11,35]

Run the repair command to fix them. If a PG still is not healthy afterwards, run a scrub on it so that its consistency is checked again.

$ ceph pg repair 13.cd
$ ceph pg scrub 13.cd
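The two commands above handle one PG. To cover all five, the PG IDs can be pulled out of the health output and repaired in a loop. A minimal sketch, assuming the row format shown in the ceph health detail output above; inconsistent_pgs and repair_inconsistent_pgs are hypothetical helper names.

```shell
# Print the IDs of PGs that `ceph health detail` lists as inconsistent.
# Reads the health text on stdin; the rows of interest look like:
#     pg 13.cd is active+clean+inconsistent, acting [20,35]
inconsistent_pgs() {
    awk '$1 == "pg" && $4 ~ /inconsistent/ { print $2 }'
}

# Ask Ceph to repair each inconsistent PG in turn.
repair_inconsistent_pgs() {
    ceph health detail | inconsistent_pgs | while read -r pg; do
        # </dev/null keeps the command from consuming the loop's input.
        ceph pg repair "$pg" </dev/null
    done
}
```

Repairs run asynchronously, so the health output has to be checked again later to see whether the errors have cleared.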

2.3 Stop the OSD process

The data migration is now complete, but the OSD process is still running.

 0   hdd   5.52620         osd.0      up        0 1.00000

Next, log in to the OSD server and stop the process.

$ ssh osd8
$ sudo systemctl stop ceph-osd@0

The OSD process now shows as down.

 0   hdd   5.52620         osd.0    down        0 1.00000 
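When this is scripted, the same check can be made by reading the STATUS column instead of eyeballing the tree. A small sketch, assuming the column layout of the ceph osd tree rows shown above; osd_status is a hypothetical helper name.

```shell
# Print the STATUS column (for example up or down) of one OSD.
# Reads `ceph osd tree` text on stdin; $1 is the numeric OSD id.
# Intended use on a live cluster:  ceph osd tree | osd_status 0
osd_status() {
    awk -v name="osd.$1" '$4 == name { print $5 }'
}
```

An empty result means the OSD does not appear in the tree at all.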

2.4 Delete the OSD

Finally, run the purge command to remove the OSD from the CRUSH map completely. With that, the removal of a single OSD is finally done.

$ ceph osd purge 0 --yes-i-really-mean-it
purged osd.0
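Since the same four steps have to be repeated for each of the 24 OSDs, they could be wrapped in one function. What follows is only a sketch of the sequence described in section 2, not a tested tool: remove_osd and POLL_INTERVAL are hypothetical names, passwordless ssh and sudo on the OSD host are assumed, and the function deliberately stops instead of repairing PGs itself.

```shell
# Sketch: run the steps from section 2 for one OSD.
# Arguments: OSD id and the host that runs it, for example: remove_osd 0 osd8
remove_osd() {
    id="$1"
    host="$2"

    # 2.1 mark the OSD out so Ceph starts moving its data elsewhere
    ceph osd out "$id" || return 1

    # Wait while the cluster still reports misplaced or degraded objects.
    # Assumption: the report is already visible by the time this poll runs.
    while ceph health detail | grep -Eq 'objects (misplaced|degraded)'; do
        sleep "${POLL_INTERVAL:-10}"
    done

    # 2.2 stop here if PGs are inconsistent, so they can be repaired first
    if ceph health detail | grep -q 'inconsistent'; then
        echo "inconsistent PGs present; repair them before removing osd.$id" >&2
        return 1
    fi

    # 2.3 stop the daemon on its host, then 2.4 purge the OSD
    ssh "$host" sudo systemctl stop "ceph-osd@$id" || return 1
    ceph osd purge "$id" --yes-i-really-mean-it
}
```

Removing one OSD at a time and waiting for the cluster to settle in between keeps the amount of data in flight small, at the cost of a longer overall run.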

One last thing: if /etc/ceph/ceph.conf contains an entry for this OSD, remember to delete it as well.
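A quick way to find such an entry is to search the file for the OSD's section header. This is a sketch that assumes the entry takes the form of an [osd.N] section; conf_osd_section is a hypothetical helper name, and other kinds of per-OSD settings would not be found this way.

```shell
# Print the line number and text of the [osd.N] section header, if any.
# $1 is the path to the config file, $2 is the numeric OSD id.
# Returns non-zero when the file has no such section.
# Intended use:  conf_osd_section /etc/ceph/ceph.conf 0
conf_osd_section() {
    grep -En "^[[:space:]]*\[osd\.$2\][[:space:]]*$" "$1"
}
```

The closing bracket in the pattern keeps a search for osd.1 from matching [osd.10].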
