Manually Upgrading Ceph from Jewel to Luminous
Test environment
Node IP      | Roles
------------ | -----------
192.168.1.10 | mon,osd,rgw
192.168.1.11 | mon,osd,rgw
192.168.1.12 | mon,osd,rgw
Preparation
1. Configure a yum repository for Luminous
# cat ceph-luminous.repo
[ceph]
name=x86_64
baseurl=https://mirrors.aliyun.com/ceph/rpm-luminous/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=noarch
baseurl=https://mirrors.aliyun.com/ceph/rpm-luminous/el7/noarch/
gpgcheck=0
[ceph-aarch64]
name=aarch64
baseurl=https://mirrors.aliyun.com/ceph/rpm-luminous/el7/aarch64/
gpgcheck=0
[ceph-SRPMS]
name=SRPMS
baseurl=https://mirrors.aliyun.com/ceph/rpm-luminous/el7/SRPMS/
gpgcheck=0
Copy the Luminous repo file to every node and remove the old Jewel repo:
# ansible node -m copy -a 'src=ceph-luminous.repo dest=/etc/yum.repos.d/ceph-luminous.repo'
# ansible node -m file -a 'name=/etc/yum.repos.d/ceph-jewel.repo state=absent'
2. Set sortbitwise
If this flag is not set, data loss can occur during the upgrade.
# ceph osd set sortbitwise
3. Set noout
This prevents data rebalancing while daemons restart during the upgrade; unset it once the upgrade is complete.
# ceph osd set noout
With the flags in place, the cluster status looks like this:
# ceph -s
cluster 0d5eced9-8baa-48be-83ef-64a7ef3a8301
health HEALTH_WARN
noout flag(s) set
monmap e1: 3 mons at {node1=192.168.1.10:6789/0,node2=192.168.1.11:6789/0,node3=192.168.1.12:6789/0}
election epoch 26, quorum 0,1,2 node1,node2,node3
osdmap e87: 9 osds: 9 up, 9 in
flags noout,sortbitwise,require_jewel_osds
pgmap v267: 112 pgs, 7 pools, 3084 bytes data, 173 objects
983 MB used, 133 GB / 134 GB avail
112 active+clean
4. Luminous requires an explicit option before pools may be deleted. Add "mon allow pool delete = true" to the ceph config file on every mon node:
# ansible node -m shell -a 'echo "mon allow pool delete = true" >> /etc/ceph/ceph.conf'
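Appending the option to ceph.conf only takes effect once the mon daemons are restarted, which happens later in the upgrade. After that restart, a sketch of how to confirm the running value via the admin socket (the mon name `node1` is this cluster's; adjust as needed):

```shell
# Query the live value of the option from the mon's admin socket;
# it should report "true" once the restarted mon has re-read ceph.conf.
ceph daemon mon.node1 config get mon_allow_pool_delete
```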
Performing the upgrade
1. Confirm the ceph package versions currently installed in the cluster:
# ansible node -m shell -a 'rpm -qa | grep ceph'
[WARNING]: Consider using yum, dnf or zypper module rather than running rpm
node1 | SUCCESS | rc=0 >>
ceph-selinux-10.2.11-0.el7.x86_64
ceph-10.2.11-0.el7.x86_64
ceph-deploy-1.5.39-0.noarch
libcephfs1-10.2.11-0.el7.x86_64
python-cephfs-10.2.11-0.el7.x86_64
ceph-base-10.2.11-0.el7.x86_64
ceph-mon-10.2.11-0.el7.x86_64
ceph-osd-10.2.11-0.el7.x86_64
ceph-radosgw-10.2.11-0.el7.x86_64
ceph-common-10.2.11-0.el7.x86_64
ceph-mds-10.2.11-0.el7.x86_64
node3 | SUCCESS | rc=0 >>
ceph-mon-10.2.11-0.el7.x86_64
ceph-radosgw-10.2.11-0.el7.x86_64
ceph-common-10.2.11-0.el7.x86_64
libcephfs1-10.2.11-0.el7.x86_64
python-cephfs-10.2.11-0.el7.x86_64
ceph-selinux-10.2.11-0.el7.x86_64
ceph-mds-10.2.11-0.el7.x86_64
ceph-10.2.11-0.el7.x86_64
ceph-base-10.2.11-0.el7.x86_64
ceph-osd-10.2.11-0.el7.x86_64
node2 | SUCCESS | rc=0 >>
ceph-mds-10.2.11-0.el7.x86_64
python-cephfs-10.2.11-0.el7.x86_64
ceph-base-10.2.11-0.el7.x86_64
ceph-mon-10.2.11-0.el7.x86_64
ceph-osd-10.2.11-0.el7.x86_64
ceph-radosgw-10.2.11-0.el7.x86_64
ceph-common-10.2.11-0.el7.x86_64
ceph-selinux-10.2.11-0.el7.x86_64
ceph-10.2.11-0.el7.x86_64
libcephfs1-10.2.11-0.el7.x86_64
2. Confirm the ceph version the cluster daemons are currently running:
# ansible node -m shell -a 'for i in `ls /var/run/ceph/ | grep "ceph-mon.*asok"` ; do ceph --admin-daemon /var/run/ceph/$i --version ; done'
node1 | SUCCESS | rc=0 >>
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
node2 | SUCCESS | rc=0 >>
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
node3 | SUCCESS | rc=0 >>
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
3. Upgrade the packages:
# ansible node -m yum -a 'name=ceph state=latest'
4. After the upgrade completes, check the package versions now installed on each node:
# ansible node -m shell -a 'rpm -qa | grep ceph'
[WARNING]: Consider using yum, dnf or zypper module rather than running rpm
node2 | SUCCESS | rc=0 >>
ceph-base-12.2.10-0.el7.x86_64
ceph-osd-12.2.10-0.el7.x86_64
python-cephfs-12.2.10-0.el7.x86_64
ceph-common-12.2.10-0.el7.x86_64
ceph-selinux-12.2.10-0.el7.x86_64
ceph-mon-12.2.10-0.el7.x86_64
ceph-mds-12.2.10-0.el7.x86_64
ceph-radosgw-12.2.10-0.el7.x86_64
libcephfs2-12.2.10-0.el7.x86_64
ceph-mgr-12.2.10-0.el7.x86_64
ceph-12.2.10-0.el7.x86_64
node1 | SUCCESS | rc=0 >>
ceph-base-12.2.10-0.el7.x86_64
ceph-osd-12.2.10-0.el7.x86_64
ceph-deploy-1.5.39-0.noarch
python-cephfs-12.2.10-0.el7.x86_64
ceph-common-12.2.10-0.el7.x86_64
ceph-selinux-12.2.10-0.el7.x86_64
ceph-mon-12.2.10-0.el7.x86_64
ceph-mds-12.2.10-0.el7.x86_64
ceph-radosgw-12.2.10-0.el7.x86_64
libcephfs2-12.2.10-0.el7.x86_64
ceph-mgr-12.2.10-0.el7.x86_64
ceph-12.2.10-0.el7.x86_64
node3 | SUCCESS | rc=0 >>
python-cephfs-12.2.10-0.el7.x86_64
ceph-common-12.2.10-0.el7.x86_64
ceph-mon-12.2.10-0.el7.x86_64
ceph-radosgw-12.2.10-0.el7.x86_64
libcephfs2-12.2.10-0.el7.x86_64
ceph-base-12.2.10-0.el7.x86_64
ceph-mgr-12.2.10-0.el7.x86_64
ceph-osd-12.2.10-0.el7.x86_64
ceph-12.2.10-0.el7.x86_64
ceph-selinux-12.2.10-0.el7.x86_64
ceph-mds-12.2.10-0.el7.x86_64
5. Restart all mon, osd, and rgw daemons, node by node.
On node1:
# systemctl restart ceph-mon@node1
# systemctl restart ceph-osd@{0,1,2}
# systemctl restart ceph-radosgw@rgw.node1
On node2:
# systemctl restart ceph-mon@node2
# systemctl restart ceph-osd@{3,4,5}
# systemctl restart ceph-radosgw@rgw.node2
On node3:
# systemctl restart ceph-mon@node3
# systemctl restart ceph-osd@{6,7,8}
# systemctl restart ceph-radosgw@rgw.node3
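After the restarts, every daemon should be running the new binaries. A quick way to check, assuming the mons have already been upgraded: Luminous adds a `ceph versions` command that summarizes the version each daemon type is running.

```shell
# Summarize the running version of every daemon in the cluster;
# all entries should now report 12.2.x.
ceph versions

# The Jewel-era per-daemon check still works as well:
ceph tell osd.* version
```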
6. Adjust require_osd_release
At this point the cluster status shows:
# ceph -s
cluster:
id: 0d5eced9-8baa-48be-83ef-64a7ef3a8301
health: HEALTH_WARN
noout flag(s) set
all OSDs are running luminous or later but require_osd_release < luminous
no active mgr
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: no daemons active
osd: 9 osds: 9 up, 9 in
flags noout
data:
pools: 7 pools, 112 pgs
objects: 189 objects, 3.01KiB
usage: 986MiB used, 134GiB / 135GiB avail
pgs: 112 active+clean
require_osd_release must be raised manually:
# ceph osd require-osd-release luminous
7. Unset noout:
# ceph osd unset noout
Check the cluster status again:
# ceph -s
cluster:
id: 0d5eced9-8baa-48be-83ef-64a7ef3a8301
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: no daemons active
osd: 9 osds: 9 up, 9 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0B
usage: 0B used, 0B / 0B avail
pgs:
8. Configure mgr
Luminous requires a running ceph-mgr daemon; without one the cluster stays in HEALTH_WARN and reports no usage statistics, as seen above.
1) Generate a key:
# ceph auth get-or-create mgr.node1 mon 'allow *' osd 'allow *'
[mgr.node1]
key = AQC0IA9c9X31IhAAdQRm3zR5r/nl3b7+WOwZjQ==
2) Create the data directory:
# mkdir /var/lib/ceph/mgr/ceph-node1/
3) Write the key into the mgr keyring:
# ceph auth get mgr.node1 -o /var/lib/ceph/mgr/ceph-node1/keyring
exported keyring for mgr.node1
4) Enable the service at boot:
# systemctl enable ceph-mgr@node1
Created symlink from /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@node1.service to /usr/lib/systemd/system/ceph-mgr@.service.
5) Start mgr:
# systemctl start ceph-mgr@node1
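The same five steps, condensed for a standby node (shown for node2; run the equivalent with `node3` on the third node):

```shell
# Create the mgr key, data directory, and keyring, then enable and
# start the daemon -- identical to the node1 procedure above.
ceph auth get-or-create mgr.node2 mon 'allow *' osd 'allow *'
mkdir /var/lib/ceph/mgr/ceph-node2/
ceph auth get mgr.node2 -o /var/lib/ceph/mgr/ceph-node2/keyring
systemctl enable ceph-mgr@node2
systemctl start ceph-mgr@node2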
6) Configure mgr on the other mon nodes in the same way, then check the cluster status again:
# ceph -s
cluster:
id: 0d5eced9-8baa-48be-83ef-64a7ef3a8301
health: HEALTH_OK
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node1(active), standbys: node2, node3
osd: 9 osds: 9 up, 9 in
rgw: 3 daemons active
data:
pools: 7 pools, 112 pgs
objects: 189 objects, 3.01KiB
usage: 986MiB used, 134GiB / 135GiB avail
pgs: 112 active+clean
7) Enable the mgr dashboard module. The dashboard provides a web interface for monitoring the cluster:
# ceph mgr module enable dashboard
# ceph mgr module ls
{
"enabled_modules": [
"balancer",
"dashboard",
"restful",
"status"
],
"disabled_modules": [
"influx",
"localpool",
"prometheus",
"selftest",
"zabbix"
]
}
# ceph mgr services
{
"dashboard": "http://node1:7000/"
}
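By default the Luminous dashboard binds to all addresses on port 7000. If you want it reachable on a specific address or port, a sketch based on the Luminous dashboard documentation (the IP below is this cluster's node1; substitute your own, and re-enable the module to apply the change):

```shell
# Bind the dashboard to a specific address and port via config-key,
# then restart the module so the new settings take effect.
ceph config-key set mgr/dashboard/server_addr 192.168.1.10
ceph config-key set mgr/dashboard/server_port 7000
ceph mgr module disable dashboard
ceph mgr module enable dashboard
```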
8) Access the dashboard in a browser at the URL reported by ceph mgr services.
Upgrading with ceph-deploy
If the cluster was deployed with ceph-deploy, the packages can be upgraded through it as well. The other steps are the same as above, so they are not repeated here; the upgrade commands are:
# ceph-deploy install --release luminous node1 node2 node3
# ceph-deploy --overwrite-conf mgr create node1 node2 node3