Installing Rook 1.6 (as of 2021-11-15 the latest release is 1.7)
Rook official documentation: https://rook.github.io/docs/rook/v1.6/
1. Preparation
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux (the Linux security subsystem)
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
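To confirm the change took effect, getenforce should now print Permissive (or Disabled after a reboot):
# Verify the SELinux mode
getenforce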
Set the hostnames: give each of the three virtual machines the name you want; in this article they are obptest2, obptest3, and obptest4. Run each command below on the corresponding machine.
hostnamectl set-hostname obptest2
hostnamectl set-hostname obptest3
hostnamectl set-hostname obptest4
Configure passwordless SSH login
Run the following on all three nodes to configure the hosts file, replacing the IPs and hostnames with those of your own servers:
cat >> /etc/hosts <<EOF
10.169.136.38 obptest2
10.169.136.39 obptest3
10.169.136.40 obptest4
EOF
Configure passwordless login from the primary node obptest2 to obptest3 and obptest4. Run the commands below on obptest2 (the other two machines need the same passwordless setup).
ssh-keygen
# Copy the key to obptest3 and obptest4
ssh-copy-id obptest3
ssh-copy-id obptest4
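To verify the passwordless setup, the following should print the remote hostnames without asking for a password:
# Verify passwordless login from obptest2
ssh obptest3 hostname
ssh obptest4 hostname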
Pull the required images with Docker
# Main images
docker pull ceph/ceph:v15.2.13
docker pull rook/ceph:v1.6.10
# Other required images
docker pull registry.aliyuncs.com/it00021hot/cephcsi:v3.3.1
docker pull registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.2.0
docker pull registry.aliyuncs.com/it00021hot/csi-resizer:v1.2.0
docker pull registry.aliyuncs.com/it00021hot/csi-provisioner:v2.2.2
docker pull registry.aliyuncs.com/it00021hot/csi-snapshotter:v4.1.1
docker pull registry.aliyuncs.com/it00021hot/csi-attacher:v3.2.1
Retag the images for the private registry
docker tag ceph/ceph:v15.2.13 10.169.136.38:10082/public/ceph/ceph:v15.2.13
docker tag rook/ceph:v1.6.10 10.169.136.38:10082/public/rook/ceph:v1.6.10
docker tag registry.aliyuncs.com/it00021hot/cephcsi:v3.3.1 10.169.136.38:10082/public/cephcsi/cephcsi:v3.3.1
docker tag registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.2.0 10.169.136.38:10082/public/sig-storage/csi-node-driver-registrar:v2.2.0
docker tag registry.aliyuncs.com/it00021hot/csi-resizer:v1.2.0 10.169.136.38:10082/public/sig-storage/csi-resizer:v1.2.0
docker tag registry.aliyuncs.com/it00021hot/csi-provisioner:v2.2.2 10.169.136.38:10082/public/sig-storage/csi-provisioner:v2.2.2
docker tag registry.aliyuncs.com/it00021hot/csi-snapshotter:v4.1.1 10.169.136.38:10082/public/sig-storage/csi-snapshotter:v4.1.1
docker tag registry.aliyuncs.com/it00021hot/csi-attacher:v3.2.1 10.169.136.38:10082/public/sig-storage/csi-attacher:v3.2.1
Push the images to Harbor so that machines on the internal network can use them (if Docker lacks push permission, configure /root/.docker/config.json yourself).
docker push 10.169.136.38:10082/public/ceph/ceph:v15.2.13
docker push 10.169.136.38:10082/public/rook/ceph:v1.6.10
docker push 10.169.136.38:10082/public/cephcsi/cephcsi:v3.3.1
docker push 10.169.136.38:10082/public/sig-storage/csi-node-driver-registrar:v2.2.0
docker push 10.169.136.38:10082/public/sig-storage/csi-resizer:v1.2.0
docker push 10.169.136.38:10082/public/sig-storage/csi-provisioner:v2.2.2
docker push 10.169.136.38:10082/public/sig-storage/csi-snapshotter:v4.1.1
docker push 10.169.136.38:10082/public/sig-storage/csi-attacher:v3.2.1
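If the push is rejected, the registry address and credentials usually need configuring first. A minimal sketch, assuming the Harbor instance at 10.169.136.38:10082 is served over plain HTTP (the username and password below are placeholders):
# Merge this key into /etc/docker/daemon.json on every node, then restart Docker:
#   { "insecure-registries": ["10.169.136.38:10082"] }
systemctl restart docker
# Log in so pushes are authorized (this writes /root/.docker/config.json)
docker login 10.169.136.38:10082 -u <username> -p <password>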
The production servers need empty disks prepared in advance.
# Check whether the disks already carry a filesystem or mount point
lsblk -f
Look for empty partitions along the lines of sdb2, sdb3, and sdb4.
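A hypothetical lsblk -f listing with usable empty partitions might look like this (device names and layout will differ on your hardware; partitions with an empty FSTYPE column are candidates):
NAME     FSTYPE  LABEL  UUID  MOUNTPOINT
sda
├─sda1   xfs            ...   /boot
└─sda2   xfs            ...   /
sdb
├─sdb2
├─sdb3
└─sdb4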
2. Installation
Fetch the Rook source with git:
git clone https://github.com/rook/rook.git
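The clone defaults to the development branch; to match the v1.6.10 images used in this article, it is safer to check out the corresponding release tag (assuming v1.6.10 is the tag matching the images above):
cd rook
git checkout v1.6.10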
Edit the rook/cluster/examples/kubernetes/ceph/operator.yaml file:
# Override the default CSI image addresses
ROOK_CSI_CEPH_IMAGE: "10.169.136.38:10082/public/cephcsi/cephcsi:v3.3.1"
ROOK_CSI_REGISTRAR_IMAGE: "10.169.136.38:10082/public/sig-storage/csi-node-driver-registrar:v2.2.0"
ROOK_CSI_RESIZER_IMAGE: "10.169.136.38:10082/public/sig-storage/csi-resizer:v1.2.0"
ROOK_CSI_PROVISIONER_IMAGE: "10.169.136.38:10082/public/sig-storage/csi-provisioner:v2.2.2"
ROOK_CSI_SNAPSHOTTER_IMAGE: "10.169.136.38:10082/public/sig-storage/csi-snapshotter:v4.1.1"
ROOK_CSI_ATTACHER_IMAGE: "10.169.136.38:10082/public/sig-storage/csi-attacher:v3.2.1"
# Override the main operator image address
image: 10.169.136.38:10082/public/rook/ceph:v1.6.10
On the k8s master node, create a directory to hold the yaml files, and upload the three files crds.yaml, common.yaml, and operator.yaml into it.
mkdir -p /app/rook-yaml
cd /app/rook-yaml
step 1 --> Install the operator module
Run the following commands to install the operator:
cd /app/rook-yaml
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
Before continuing, verify that rook-ceph-operator is in the "Running" state:
kubectl get pod -n rook-ceph -o wide
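Alternatively, instead of polling, you can block until the operator pod reports Ready (assuming the default app=rook-ceph-operator pod label):
kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s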
step 2 --> Install the Ceph cluster
Edit the rook/cluster/examples/kubernetes/ceph/cluster.yaml file:
# Override the default image
image: 10.169.136.38:10082/public/ceph/ceph:v15.2.13
# Change the path on the host where config files and data are mapped
dataDirHostPath: /app/rook-ceph/
# serve the dashboard using SSL (whether the dashboard uses HTTPS)
ssl: false
# Configure taint tolerations
placement:
  all:
    # nodeAffinity:
    #   requiredDuringSchedulingIgnoredDuringExecution:
    #     nodeSelectorTerms:
    #     - matchExpressions:
    #       - key: role
    #         operator: In
    #         values:
    #         - storage-node
    # podAffinity:
    # podAntiAffinity:
    # topologySpreadConstraints:
    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: "NoSchedule"
The settings below can be adjusted as needed; skip this part if you don't need it.
# How many mon daemons to create
mon:
  # Set the number of mons to be started. Must be an odd number, and is generally recommended to be 3.
  count: 3
# How many mgr daemons to create (running multiple mgr daemons forces the use of DNS name mapping)
mgr:
  # When higher availability of the mgr is needed, increase the count to 2.
  # In that case, one mgr will be active and one in standby. When Ceph updates which
  # mgr is active, Rook will update the mgr services to match the active mgr.
  count: 1
# Which disks on which nodes to use
storage: # cluster level storage configuration and selection
  # whether to use all nodes (the cluster components are deployed on every node automatically)
  useAllNodes: true
  # whether to use all devices (all unformatted disks on the target nodes are auto-detected and used)
  useAllDevices: true
  #deviceFilter:
  config:
    # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
    # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
    # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
    # journalSizeMB: "1024" # uncomment if the disks are 20 GB or smaller
    # osdsPerDevice: "1" # this value can be overridden at the node or device level
    # encryptedDevice: "true" # the default value for this option is "false"
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  # Per-node configuration: the node name can be an IP or a hostname; device names only support whole disks and partitions, not LVM or loop virtual devices.
  # nodes:
  #   - name: "172.17.4.201"
  #     devices: # specific devices to use for storage can be specified for each node
  #       - name: "sdb"
  #       - name: "nvme01" # multiple osds can be created on high performance devices
  #         config:
  #           osdsPerDevice: "5"
  #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
  #     config: # configuration can be specified at the node level which overrides the cluster level config
  #   - name: "172.17.4.301"
  #     deviceFilter: "^sd."
Run the following command to install the cluster:
kubectl create -f cluster.yaml
Check with the command below: once the mgr, mon, and osd pods are all up, the installation has completed normally:
kubectl get pod -n rook-ceph -o wide
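For orientation, a healthy install typically shows pods along these lines (names, suffixes, and counts are illustrative only):
rook-ceph-mgr-a-xxxxx            1/1   Running
rook-ceph-mon-a-xxxxx            1/1   Running
rook-ceph-mon-b-xxxxx            1/1   Running
rook-ceph-mon-c-xxxxx            1/1   Running
rook-ceph-osd-0-xxxxx            1/1   Running
rook-ceph-osd-prepare-xxxxx      0/1   Completed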
step 3 --> Install the toolbox
Edit the toolbox.yaml file:
# Override the image
image: 10.169.136.38:10082/public/rook/ceph:v1.6.10
Install it and check whether the installation succeeded:
kubectl create -f toolbox.yaml
# Check whether the toolbox was installed successfully
kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
Use the commands below to enter the toolbox shell:
# Enter the toolbox pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- sh
# Check the Ceph cluster status
ceph status
# Check the Ceph OSDs
ceph osd status
# Check the Ceph mons
ceph mon stat
# Leave the toolbox shell
exit
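As a rough reference, ceph status on a healthy three-node cluster reports something along these lines (the layout below is illustrative, not literal output):
  cluster:
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum a,b,c
    mgr: a(active)
    osd: 3 osds: 3 up, 3 in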
step 4 --> Create the shared filesystem
Edit filesystem.yaml if needed (the defaults also work) and upload it to the server.
The replica settings depend on how many machines you have: if there are only 2 machines but 3 replicas are configured, the cluster will report a health warning.
Recommended replica count: >=2 for non-production, >=3 for production.
replicated:
  size: 2
Run the command to create the filesystem:
kubectl create -f filesystem.yaml
# To confirm the filesystem is configured, wait for the mds pods to start
kubectl -n rook-ceph get pod -l app=rook-ceph-mds
The generated config file contains the corresponding IDs and port, which you can use to mount the shared filesystem locally.
# View the generated config file
[root@obptest2 rook-ceph]# cat /app/rook-ceph/rook-ceph/rook-ceph.config
# Try mounting it
[root@obptest2 rook-ceph]# mount -t ceph 10.96.87.195:6789,10.96.183.53:6789,10.96.22.18:6789:/ /rook_mnt/ -o name=admin,secret=AQDlLo5hFTuLEhAA+yiK3lN7GpP/BNEJcIkjvw==,mds_namespace=myfs
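Note that the mount target must already exist; if the mount fails because /rook_mnt is missing, create it and retry, then verify the mount is live:
mkdir -p /rook_mnt
df -h /rook_mnt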
step 5 --> Configure and log in to the Ceph Dashboard
By default the dashboard listens on port 7000 for HTTP and port 8443 for HTTPS; the port has to be exposed externally before the dashboard can be used.
# Install the HTTP dashboard service, which exposes an external port
kubectl create -f dashboard-external-http.yaml
# List the dashboard services to find the auto-generated external port
kubectl -n rook-ceph get service
# Get the login password; open the mapped port on a cluster node in the browser and log in as admin
Ciphertext=$(kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}")
Pass=$(echo ${Ciphertext}|base64 --decode)
echo ${Pass}
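The same retrieval also works as a single pipeline, which is handy in scripts:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode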
3. Troubleshooting
If you run into problems during installation, find the pod name and inspect the container logs with the logs command; if the logs don't reveal the cause, describe the pod to examine its state and events.
# List all pods in the namespace
kubectl get pod -n rook-ceph -o wide
# View a pod's logs
kubectl -n rook-ceph logs rook-ceph-osd-prepare-obptest2--1-p5sgn
# View a pod's details
kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-obptest2--1-p5sgn
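Namespace events are another useful signal when a pod never starts at all (scheduling failures, image-pull errors, and the like):
# Recent events in the namespace, oldest first
kubectl -n rook-ceph get events --sort-by='.lastTimestamp'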
Uninstalling rook-ceph
# Uninstall using the same manifests that were used for installation
cd /app/rook-yaml
kubectl delete -f toolbox.yaml
kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml
# If some resources cannot be deleted, add the force-delete flags --grace-period=0 --force
kubectl delete -f cluster.yaml --grace-period=0 --force
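If the namespace or the CephCluster resource hangs in Terminating, a common cause is a leftover finalizer; the Rook teardown docs suggest clearing it. A minimal sketch:
# Remove the finalizer so deletion can proceed
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type merge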
If a cluster has already been installed once, on reinstall you will find that the OSDs cannot be auto-discovered and the discovered-OSD pods will not start. The reason is that the previous cluster's information was written to the disks, and that information has to be wiped first.
Create a cleanup.sh script, adjusting the disks to wipe to match your actual environment.
#!/usr/bin/env bash
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
# Wipe the partition tables and related metadata from the disks
sgdisk --zap-all /dev/sdb2
sgdisk --zap-all /dev/sdb3
sgdisk --zap-all /dev/sdb4
sgdisk --zap-all /dev/sdb5
# Clean hdds with dd
# Erase the filesystem signatures from the disks.
# ** These commands are dangerous; double-check the targets before running them **
dd if=/dev/zero of="/dev/sdb2" bs=1M count=100 oflag=direct,dsync
dd if=/dev/zero of="/dev/sdb3" bs=1M count=100 oflag=direct,dsync
dd if=/dev/zero of="/dev/sdb4" bs=1M count=100 oflag=direct,dsync
dd if=/dev/zero of="/dev/sdb5" bs=1M count=100 oflag=direct,dsync
# Clean disks such as ssd with blkdiscard instead of dd
blkdiscard /dev/sdb2
blkdiscard /dev/sdb3
blkdiscard /dev/sdb4
blkdiscard /dev/sdb5
# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev and /dev/mapper (unnecessary clutter)
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*
# Inform the OS of partition table changes
partprobe /dev/sdb2
partprobe /dev/sdb3
partprobe /dev/sdb4
partprobe /dev/sdb5
# This removes the files the pods persisted to the host (the dataDirHostPath set earlier)
rm -rf /app/rook-ceph/
# This is data that the pods generate by default.
rm -rf /var/lib/rook/
Run the script only once you have confirmed a wipe is needed and the script itself is correct.
The cleanup script is dangerous: wiping the wrong disk can destroy the system. Execute it only after careful verification.
# Run the cleanup script
bash cleanup.sh
References:
Rook official documentation: https://rook.github.io/docs/rook/v1.6/
Deploying a Ceph storage cluster on Kubernetes with Rook: https://www.cnblogs.com/bugutian/p/13092189.html
Rook 1.5.1 Ceph deployment hands-on notes: https://blog.csdn.net/vic_qxz/article/details/119513077