一. Pool overview
1. Pool types
- replicated pools
Replicated pools store each object as multiple copies; by default the data is kept as 3 replicas.
The replica count can of course be changed when the pool is created (or adjusted later).
- erasure-coded pools
Instead of storing 3 full copies, erasure-coded pools rely on erasure coding to provide data redundancy while saving storage space.
Tip:
As a rough analogy, replicated pools are to erasure-coded pools what "RAID 1" is to "RAID 5".
In production, "replicated pools" are chosen in most cases; erasure-coded pools are used far more cautiously.
Reference:
https://docs.ceph.com/en/latest/rados/operations/pools/#pools
2. Formula for the PG count
number of OSDs x 100 / pool replica count (osd_pool_default_size, default 3) ----> round to a power of 2 to get the PG count.
Our environment:
ceph141 has 3 OSD devices:
[root@ceph141 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
..
sdb 8:16 0 200G 0 disk
└─ceph--72aef53e--0a69--4aa5--8be3--9239bc333ec2-osd--block--23387ffb--9b97--4eef--8b77--22b728069b1e
253:2 0 200G 0 lvm
sdc 8:32 0 300G 0 disk
└─ceph--313a6cda--6b9d--4796--9668--8d1da63cd1b4-osd--block--7107cd5e--5a71--46cd--94fe--ab7cc8c779b9
253:3 0 300G 0 lvm
sdd 8:48 0 500G 0 disk
└─ceph--84a0e3f4--7fae--446e--89ad--1bb22b6940ab-osd--block--6cfeedfc--e870--4ab6--bb04--1085c674a9ab
253:4 0 500G 0 lvm
...
[root@ceph141 ~]#
ceph142 has 3 OSD devices:
[root@ceph142 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
sdb 8:16 0 200G 0 disk
└─ceph--223b39c1--89c2--440d--9dce--930450aaad7d-osd--block--183644a5--5af8--4387--b995--51ad8419ba82
253:2 0 200G 0 lvm
sdc 8:32 0 300G 0 disk
└─ceph--72aafaac--5151--49f4--aa4a--b0216f1a33b7-osd--block--674f0f7b--cf54--4813--a486--f92a6d6fe30f
253:3 0 300G 0 lvm
sdd 8:48 0 500G 0 disk
└─ceph--c019c813--5e99--41d2--923b--6c68bc6a87c7-osd--block--636e7599--9338--4b57--989b--d04d1d951322
253:4 0 500G 0 lvm
..
[root@ceph142 ~]#
ceph143 has 2 OSD devices:
[root@ceph143 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
sdb 8:16 0 200G 0 disk
└─ceph--2f9b8018--7242--4eae--9b89--454c56222d72-osd--block--e2ee73ae--c94e--4bb8--a0c4--ab24f7654237
253:2 0 200G 0 lvm
sdc 8:32 0 300G 0 disk
└─ceph--a72237c7--f9ec--4228--a3f3--1b4d5625fb62-osd--block--04eb39e9--1dc6--4446--930c--1c2434674b1e
253:3 0 300G 0 lvm
...
[root@ceph143 ~]#
To sum up, our environment is:
Total OSD devices: 8
Default pool replica count: 3
So the PG count that suits us is:
100 * 8 / 3 ≈ 267
2 to the 8th power is 256.
2 to the 9th power is 512.
512 is clearly far above 267, while 256 is the closest power of 2, so the recommended pg_num is 256.
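A quick sanity check of the same formula in bash (a minimal sketch; the two variables are taken from the environment above and should be adjusted for your own cluster):
    osds=8                                  # total number of OSDs in the cluster
    size=3                                  # osd_pool_default_size (replica count)
    target=$(( osds * 100 / size ))         # 8 * 100 / 3 = 266 (integer division)
    pg=1
    while (( pg * 2 <= target )); do        # round down to the nearest power of 2
        pg=$(( pg * 2 ))
    done
    echo "target=${target}, suggested pg_num=${pg}"   # prints: target=266, suggested pg_num=256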
References:
https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/#pool-pg-and-crush-config-reference
https://docs.ceph.com/en/nautilus/rados/configuration/pool-pg-config-ref/
二. Basic pool management
1. Creating pools
1. Create a replicated pool
Syntax:
ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
[crush-rule-name] [expected-num-objects]
Example:
[root@ceph141 ~]# ceph osd pool create yinzhengjie 128 128 replicated
pool 'yinzhengjie' created
[root@ceph141 ~]#
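Note (an assumption about recent Ceph releases, not shown in the transcript above): a freshly created pool is normally expected to be tagged with the application that will use it, otherwise the cluster may raise a POOL_APP_NOT_ENABLED health warning. A minimal sketch, assuming the pool will be used for RBD:
    ceph osd pool application enable yinzhengjie rbd    # tag the pool for RBD use
    ceph osd pool application get yinzhengjie           # verify the application tag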
2. Create an erasure-coded pool
Syntax:
ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects] [--autoscale-mode=<on,off,warn>]
Example:
[root@ceph141 ~]# ceph osd pool create jasonyin 128 128 erasure
pool 'jasonyin' created
[root@ceph141 ~]#
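When no erasure-code-profile is given, the built-in "default" profile is used, which is why the pool above shows "erasure size 3" (k=2 data chunks plus m=1 coding chunk) in "ceph osd pool ls detail". A sketch of inspecting profiles and defining a custom one (the profile name, pool name and k/m values are illustrative assumptions):
    ceph osd erasure-code-profile ls                    # list existing profiles
    ceph osd erasure-code-profile get default           # show k, m and plugin of the default profile
    # In this 3-node lab, crush-failure-domain=osd is used because k+m=6 exceeds the host count:
    ceph osd erasure-code-profile set yinzhengjie-ec k=4 m=2 crush-failure-domain=osd
    ceph osd pool create jasonyin-ec 32 32 erasure yinzhengjie-ec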
Tip: the difference between pg_num and pgp_num:
- pg_num:
The number of PGs created for the pool.
When PG autoscaling is enabled, the cluster can recommend or automatically adjust each pool's PG count based on the expected cluster utilization and the expected pool utilization.
Reference:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#autoscaling-placement-groups
- pgp_num:
The total number of PGs considered for placement purposes. It should equal pg_num, except transiently while pg_num is being increased or decreased.
Reference:
https://docs.ceph.com/en/latest/rados/operations/pools/#creating-a-pool
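To compare the two values on an existing pool (a short sketch using the pool created above):
    ceph osd pool get yinzhengjie pg_num
    ceph osd pool get yinzhengjie pgp_num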
2. Viewing pools
1. List pool names
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]#
2. List pools with detailed information
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 32 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
3. List pool names together with their pool IDs
[root@ceph141 ~]# ceph osd lspools
1 yinzhengjie
2 jasonyin
[root@ceph141 ~]#
4. View pool space usage
[root@ceph141 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
jasonyin 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
yinzhengjie 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
total_objects 0
total_used 7.0 GiB
total_avail 1.9 TiB
total_space 2.0 TiB
[root@ceph141 ~]#
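"ceph df" (and "ceph df detail") reports a similar per-pool usage breakdown from the cluster's perspective and can be used as a cross-check:
    ceph df
    ceph df detail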
5. View I/O statistics for a specific pool
[root@ceph141 ~]# ceph osd pool stats yinzhengjie
pool yinzhengjie id 1
nothing is going on
[root@ceph141 ~]#
6. Pool information also shows up in the OSD map dump
[root@ceph141 ~]# ceph osd dump | grep pool
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd dump
epoch 512
fsid 5821e29c-326d-434d-a5b6-c492527eeaad
created 2024-01-31 17:46:11.238910
modified 2024-02-01 11:02:20.375752
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 16
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release nautilus
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
max_osd 7
osd.0 up in weight 1 up_from 5 up_thru 372 down_at 0 last_clean_interval [0,0) [v2:10.0.0.141:6800/2833,v1:10.0.0.141:6801/2833] [v2:10.0.0.141:6802/2833,v1:10.0.0.141:6803/2833] exists,up 2e6612cc-fa0e-403b-9ea0-3023e6c536c6
osd.1 up in weight 1 up_from 9 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.141:6808/3299,v1:10.0.0.141:6809/3299] [v2:10.0.0.141:6810/3299,v1:10.0.0.141:6811/3299] exists,up ee7ad091-20a7-4600-a94a-9c0281f8e79f
osd.2 up in weight 1 up_from 13 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6800/18107,v1:10.0.0.142:6801/18107] [v2:10.0.0.142:6802/18107,v1:10.0.0.142:6803/18107] exists,up 66310a40-46eb-4e47-8706-4ebc455c161d
osd.3 up in weight 1 up_from 17 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6808/18572,v1:10.0.0.142:6809/18572] [v2:10.0.0.142:6810/18572,v1:10.0.0.142:6811/18572] exists,up 3003810f-42ee-4a6d-bd5c-8878b9f2a307
osd.4 up in weight 1 up_from 21 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6816/19035,v1:10.0.0.142:6817/19035] [v2:10.0.0.142:6818/19035,v1:10.0.0.142:6819/19035] exists,up 0f234c3b-a0b9-4912-a351-f0d39ae93834
osd.5 up in weight 1 up_from 25 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.143:6800/12844,v1:10.0.0.143:6801/12844] [v2:10.0.0.143:6802/12844,v1:10.0.0.143:6803/12844] exists,up 4c34a506-2fa0-47ad-9f01-1080d389dcd3
osd.6 up in weight 1 up_from 29 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.143:6808/13302,v1:10.0.0.143:6809/13302] [v2:10.0.0.143:6810/13302,v1:10.0.0.143:6811/13302] exists,up 4a6082bc-ba84-41f3-94d9-daff6942517f
[root@ceph141 ~]#
References:
https://docs.ceph.com/en/nautilus/rados/operations/pools/
https://docs.ceph.com/en/latest/rados/operations/pools/#list-pools
https://docs.ceph.com/en/latest/rados/operations/pools/#showing-pool-statistics
https://docs.ceph.com/en/nautilus/rados/operations/pools/#get-the-number-of-object-replicas
3. Modifying pool settings
1. Get a specific pool attribute
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 32 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie size
size: 3
[root@ceph141 ~]#
2. Set a specific pool attribute
[root@ceph141 ~]# ceph osd pool set yinzhengjie size 1
set pool 1 size to 1
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie size
size: 1
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 37 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
3. Disable PG autoscaling
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_autoscale_mode
pg_autoscale_mode: warn
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool set yinzhengjie pg_autoscale_mode off
set pool 1 pg_autoscale_mode to off
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_autoscale_mode
pg_autoscale_mode: off
[root@ceph141 ~]#
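To change the default mode for pools created afterwards (rather than per pool), the osd_pool_default_pg_autoscale_mode option can be set cluster-wide; a sketch, assuming a release with the centralized config database (Mimic and later):
    ceph config set global osd_pool_default_pg_autoscale_mode off
    ceph config get mon osd_pool_default_pg_autoscale_mode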
4. Change the PG count
[root@ceph141 ~]# ceph osd pool set yinzhengjie pg_num 16
set pool 1 pg_num to 16
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool ls detail    # Note: until the target PG count is reached, the pool also shows "pg_num_target" and "pgp_num_target" attributes.
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 25 pgp_num 24 pg_num_target 16 pgp_num_target 16 last_change 470 lfor 0/470/468 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 509 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_num
pg_num: 16
[root@ceph141 ~]#
5. Verify the change via the OSD dump
[root@ceph141 ~]# ceph osd pool set yinzhengjie size 3
set pool 1 size to 3
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd dump | grep 'replicated size'
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
[root@ceph141 ~]#
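Other per-pool settings follow the same get/set pattern; for example, renaming a pool or putting quotas on it (a sketch, the new name and quota values are illustrative):
    ceph osd pool rename yinzhengjie yinzhengjie-new            # rename the pool (illustrative new name)
    ceph osd pool rename yinzhengjie-new yinzhengjie            # rename it back
    ceph osd pool set-quota yinzhengjie max_objects 10000       # cap the number of objects
    ceph osd pool set-quota yinzhengjie max_bytes 10737418240   # cap client-written bytes (10 GiB)
    ceph osd pool set-quota yinzhengjie max_objects 0           # a value of 0 removes the quota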
References:
https://docs.ceph.com/en/latest/rados/operations/pools/#getting-pool-values
https://docs.ceph.com/en/latest/rados/operations/pools/#setting-pool-values
https://docs.ceph.com/en/nautilus/rados/operations/placement-groups/
4. Two protection mechanisms against pool deletion
Tip:
Once a pool is deleted, all data in it is removed and cannot be recovered.
For safety, Ceph therefore protects pools with two mechanisms: "nodelete" and "mon_allow_pool_delete".
- nodelete:
A per-pool flag; once it is set to true the pool cannot be deleted. The default value is false.
- mon_allow_pool_delete:
A monitor option that tells all mon daemons whether deleting pools is allowed at all.
In production, for safety, it is recommended to set each pool's nodelete attribute to "true" and keep mon_allow_pool_delete at false.
Examples:
1. nodelete example
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete
nodelete: false
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get jasonyin nodelete
nodelete: false
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool set yinzhengjie nodelete true
set pool 1 nodelete to true
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete
nodelete: true
[root@ceph141 ~]#
2. mon_allow_pool_delete example
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]#
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=true
mon.ceph141: injectargs:mon_allow_pool_delete = 'true'
mon.ceph142: injectargs:mon_allow_pool_delete = 'true'
mon.ceph143: injectargs:mon_allow_pool_delete = 'true'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool delete jasonyin jasonyin --yes-i-really-really-mean-it
pool 'jasonyin' removed
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
[root@ceph141 ~]#
3. To delete a pool, nodelete must be false AND mon_allow_pool_delete must be true.
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=false
mon.ceph141: injectargs:mon_allow_pool_delete = 'false'
mon.ceph142: injectargs:mon_allow_pool_delete = 'false'
mon.ceph143: injectargs:mon_allow_pool_delete = 'false'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool set yinzhengjie nodelete false
set pool 1 nodelete to false
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete
nodelete: false
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph141 ~]#
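Note that injectargs only changes the running monitors; the setting reverts after a mon restart. On releases with the centralized config database (Mimic and later), something like the following can be used to persist it (a sketch, not shown in the transcript above):
    ceph config set mon mon_allow_pool_delete true    # persists across mon restarts
    ceph config get mon mon_allow_pool_delete
    # ...delete the pool, then lock things down again:
    ceph config set mon mon_allow_pool_delete false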