前面已经部署好了monitor, 接下来要部署4台osd节点.
osd节点需要注意osd数据目录, 建议挂载到独立的硬盘, journal建议放到ssd中.
如果一个主机有多个硬盘并且没有使用lvm来管理条带的话, 建议启动多个osd daemon, 分别挂载到他们的数据目录.
生成osd UUID, 每个osd进程对应一个uuid(如果单台主机要运行多个osd daemon, 那么也需要多个uuid) :
从mon节点拷贝key文件和ceph.conf到osd节点, 没有KEY无法通讯.
其中一台osd, 可以看到拷贝过来的ceph.conf和key文件.
创建osd, 返回osd numberid.
初始化osd 数据目录
注册OSD认证key
添加Ceph Node到CRUSH map.
添加ceph node到default root.
添加osd到crush map.
启动osd服务, 使用CentOS的话, 需要添加sysvinit文件到 /var/lib/ceph/osd/{cluster-name}-{osd-num}/目录.
添加sysvinit文件, 启动服务.
这种方式支持指定数据目录, journal文件.
启动第二台osd后, ceph 集群正常了, HEALTH_OK.
查看osd tree.
4台osd添加结束, 会有一个告警 too few pgs per osd (16 < min 20) .
修改pool对应的pg_num和pgp_num.
同时还需要调整pgp_num
需要注意, pg_num只能增加, 不能缩小.
ceph osd数据目录的结构如下(包含journal, 数据等) :
Once you have your initial monitor(s) running, you should add OSDs. Your cluster cannot reach an active + clean state until you have enough OSDs to handle the number of copies of an object (e.g., osd pool default size = 2 requires at least two OSDs). After bootstrapping your monitor, your cluster has a default CRUSH map; however, the CRUSH map doesn’t have any Ceph OSD Daemons mapped to a Ceph Node.
生成osd UUID, 每个osd进程对应一个uuid(如果单台主机要运行多个osd daemon, 那么也需要多个uuid) :
[root@osd1 ~]# uuidgen
854777b2-c188-4509-9df4-02f57bd17e12
[root@osd2 ceph]# uuidgen
d13959a4-a3fc-4589-91cd-7104fcd8dbe9
[root@osd3 ceph]# uuidgen
d28bdb68-e8ee-4ca8-be5d-7e86438e7663
[root@osd4 ~]# uuidgen
c2e86146-fedc-4486-8d2f-ead6f473e841
从mon节点拷贝key文件和ceph.conf到osd节点, 没有KEY无法通讯.
[root@mon1 ceph]# cd /etc/ceph
[root@mon1 ceph]# ll
total 8
-rw------- 1 root root 137 Nov 28 16:39 ceph.client.admin.keyring
-rw-r--r-- 1 root root 529 Dec 1 17:18 ceph.conf
[root@mon1 ceph]# scp ceph.conf ceph.client.admin.keyring 172.17.0.5:/etc/ceph
[root@mon1 ceph]# scp ceph.conf ceph.client.admin.keyring 172.17.0.6:/etc/ceph
[root@mon1 ceph]# scp ceph.conf ceph.client.admin.keyring 172.17.0.7:/etc/ceph
[root@mon1 ceph]# scp ceph.conf ceph.client.admin.keyring 172.17.0.8:/etc/ceph
其中一台osd, 可以看到拷贝过来的ceph.conf和key文件.
[root@osd2 ceph]# cd /etc/ceph
[root@osd2 ceph]# ll
total 12
-rw------- 1 root root 137 Dec 1 17:35 ceph.client.admin.keyring
-rw-r--r-- 1 root root 529 Dec 1 17:35 ceph.conf
-rwxr-xr-x 1 root root 92 Oct 29 18:35 rbdmap
创建osd, 返回osd numberid.
[root@osd1 ceph]# ceph osd create 854777b2-c188-4509-9df4-02f57bd17e12
0
返回0表示该osd-number=0
[root@osd2 ceph]# ceph osd create d13959a4-a3fc-4589-91cd-7104fcd8dbe9
1
返回0表示该osd-number=1
[root@osd3 ~]# ceph osd create d28bdb68-e8ee-4ca8-be5d-7e86438e7663
2
[root@osd4 ~]# ceph osd create c2e86146-fedc-4486-8d2f-ead6f473e841
3
创建osd数据目录, 注意替换osd-number :
使用外部目录, 然后使用软链接.
Create the default directory on your new OSD.
ssh {new-osd-host}
sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
[root@osd1 ceph]# mkdir /data01/ceph/osd/ceph-0
[root@osd2 ceph]# mkdir /data01/ceph/osd/ceph-1
[root@osd3 ceph]# mkdir /data01/ceph/osd/ceph-2
[root@osd4 ceph]# mkdir /data01/ceph/osd/ceph-3
[root@osd1 ceph]# ln -s /data01/ceph/osd/ceph-0 /var/lib/ceph/osd/
[root@osd2 ceph]# ln -s /data01/ceph/osd/ceph-1 /var/lib/ceph/osd/
[root@osd3 ceph]# ln -s /data01/ceph/osd/ceph-2 /var/lib/ceph/osd/
[root@osd4 ceph]# ln -s /data01/ceph/osd/ceph-3 /var/lib/ceph/osd/
性能相关 :
如果OSD数据目录要挂载其他块设备, osd number即ceph osd create返回的值.
最好在初始化数据目录前完成挂载, 因为可能要读取文件系统信息.
If the OSD is for a drive other than the OS drive, prepare it for use with Ceph, and mount it to the directory you just created:
ssh {new-osd-host}
sudo mkfs -t {fstype} /dev/{hdd}
sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
初始化osd 数据目录
Initialize the OSD data directory.
ssh {new-osd-host}
sudo ceph-osd -i {osd-num} --mkfs --mkkey --osd-uuid [{uuid}] --cluster {cluster_name}
[root@osd1 ceph]# ceph-osd -i 0 --mkfs --mkjournal --mkkey --osd-uuid 854777b2-c188-4509-9df4-02f57bd17e12 --cluster ceph --osd-data=/data01/ceph/osd/ceph-0 --osd-journal=/data01/ceph/osd/ceph-0/journal
2014-12-02 08:52:13.438014 7f909cad9880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2014-12-02 08:52:13.669785 7f909cad9880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2014-12-02 08:52:13.671436 7f909cad9880 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-12-02 08:52:13.942519 7f909cad9880 -1 created object store /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal for osd.0 fsid f649b128-963c-4802-ae17-5a76f36c4c76
2014-12-02 08:52:13.942645 7f909cad9880 -1 auth: error reading file: /var/lib/ceph/osd/ceph-0/keyring: can't open /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2014-12-02 08:52:13.942748 7f909cad9880 -1 created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
[root@osd2 ceph]# ceph-osd -i 1 --mkfs --mkjournal --mkkey --osd-uuid d13959a4-a3fc-4589-91cd-7104fcd8dbe9 --cluster ceph --osd-data=/data01/ceph/osd/ceph-1 --osd-journal=/data01/ceph/osd/ceph-1/journal
[root@osd3 ceph]# ceph-osd -i 2 --mkfs --mkjournal --mkkey --osd-uuid d28bdb68-e8ee-4ca8-be5d-7e86438e7663 --cluster ceph --osd-data=/data01/ceph/osd/ceph-2 --osd-journal=/data01/ceph/osd/ceph-2/journal
[root@osd4 ceph]# ceph-osd -i 3 --mkfs --mkjournal --mkkey --osd-uuid c2e86146-fedc-4486-8d2f-ead6f473e841 --cluster ceph --osd-data=/data01/ceph/osd/ceph-3 --osd-journal=/data01/ceph/osd/ceph-3/journal
注册OSD认证key
Register the OSD authentication key. The value of ceph for ceph-{osd-num} in the path is the $cluster-$id. If your cluster name differs from ceph, use your cluster name instead.:
sudo ceph auth add osd.{osd-num} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
[root@osd1 ~]# ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /data01/ceph/osd/ceph-0/keyring
added key for osd.0
[root@osd2 ~]# ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i /data01/ceph/osd/ceph-1/keyring
added key for osd.1
[root@osd3 ~]# ceph auth add osd.2 osd 'allow *' mon 'allow profile osd' -i /data01/ceph/osd/ceph-2/keyring
added key for osd.2
[root@osd4 ~]# ceph auth add osd.3 osd 'allow *' mon 'allow profile osd' -i /data01/ceph/osd/ceph-3/keyring
added key for osd.3
添加Ceph Node到CRUSH map.
Add your Ceph Node to the CRUSH map.
ceph osd crush add-bucket {hostname} host
For example:
[root@osd1 ~]# ceph osd crush add-bucket osd1 host
added bucket osd1 type host to crush map
[root@osd2 ~]# ceph osd crush add-bucket osd2 host
added bucket osd2 type host to crush map
[root@osd3 ~]# ceph osd crush add-bucket osd3 host
added bucket osd3 type host to crush map
[root@osd4 ~]# ceph osd crush add-bucket osd4 host
added bucket osd4 type host to crush map
添加ceph node到default root.
Place the Ceph Node under the root default.
[root@osd1 ~]# ceph osd crush move osd1 root=default
moved item id -2 name 'osd1' to location {root=default} in crush map
[root@osd2 ~]# ceph osd crush move osd2 root=default
moved item id -3 name 'osd2' to location {root=default} in crush map
[root@osd3 ~]# ceph osd crush move osd3 root=default
moved item id -4 name 'osd3' to location {root=default} in crush map
[root@osd4 ~]# ceph osd crush move osd4 root=default
moved item id -5 name 'osd4' to location {root=default} in crush map
添加osd到crush map.
Add the OSD to the CRUSH map so that it can begin receiving data. You may also decompile the CRUSH map, add the OSD to the device list, add the host as a bucket (if it's not already in the CRUSH map), add the device as an item in the host, assign it a weight, recompile it and set it.
ceph osd crush add osd.{osd_num}|{id-or-name} {weight} [{bucket-type}={bucket-name} ...]
For example:
[root@osd1 ~]# ceph osd crush add osd.0 1.0 host=osd1
add item id 0 name 'osd.0' weight 1 at location {host=osd1} to crush map
[root@osd2 ~]# ceph osd crush add osd.1 1.0 host=osd2
add item id 1 name 'osd.1' weight 1 at location {host=osd2} to crush map
[root@osd3 ~]# ceph osd crush add osd.2 1.0 host=osd3
[root@osd4 ~]# ceph osd crush add osd.3 1.0 host=osd4
启动osd服务, 使用CentOS的话, 需要添加sysvinit文件到 /var/lib/ceph/osd/{cluster-name}-{osd-num}/目录.
After you add an OSD to Ceph, the OSD is in your configuration. However, it is not yet running. The OSD is down and in. You must start your new OSD before it can begin receiving data.
For Debian/CentOS/RHEL, use sysvinit:
sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/sysvinit
sudo /etc/init.d/ceph start osd.1
添加sysvinit文件, 启动服务.
[root@osd1 ceph]# touch /var/lib/ceph/osd/ceph-0/sysvinit
[root@osd1 ceph]# service ceph start osd.0
=== osd.0 ===
create-or-move updated item name 'osd.0' weight 0.01 at location {host=osd1,root=default} to crush map
Starting Ceph osd.0 on osd1...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@osd2 ~]# touch /var/lib/ceph/osd/ceph-1/sysvinit
[root@osd2 ~]# service ceph start osd.1
=== osd.1 ===
create-or-move updated item name 'osd.1' weight 0.01 at location {host=osd2,root=default} to crush map
Starting Ceph osd.1 on osd2...
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
[root@osd3 ~]# touch /var/lib/ceph/osd/ceph-2/sysvinit
[root@osd3 ~]# service ceph start osd.2
[root@osd4 ~]# touch /var/lib/ceph/osd/ceph-3/sysvinit
[root@osd4 ~]# service ceph start osd.3
部署osd结束.
或者你可以直接使用ceph-osd命令启动osd
例如
/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
这种方式支持指定数据目录, journal文件.
[root@osd2 ~]# ceph-osd -h
--conf/-c FILE read configuration from the given configuration file
--id/-i ID set ID portion of my name
--name/-n TYPE.ID set name
--cluster NAME set cluster name (default: ceph)
--version show version and quit
-d run in foreground, log to stderr.
-f run in foreground, log to usual location.
--debug_ms N set message debug level (e.g. 1)
2014-12-09 17:08:50.233420 7ff77aff6880 -1 usage: ceph-osd -i osdid [--osd-data=path] [--osd-journal=path] [--mkfs] [--mkjournal] [--convert-filestore]
2014-12-09 17:08:50.233423 7ff77aff6880 -1 --debug_osd N set debug level (e.g. 10)
启动第二台osd后, ceph 集群正常了, HEALTH_OK.
[root@osd2 ~]# ceph -s
cluster f649b128-963c-4802-ae17-5a76f36c4c76
health HEALTH_OK
monmap e1: 3 mons at {mon1=172.17.0.2:6789/0,mon2=172.17.0.3:6789/0,mon3=172.17.0.4:6789/0}, election epoch 12, quorum 0,1,2 mon1,mon2,mon3
osdmap e13: 2 osds: 2 up, 2 in
pgmap v24: 64 pgs, 1 pools, 0 bytes data, 0 objects
3213 MB used, 15632 MB / 19902 MB avail
64 active+clean
查看osd tree.
[root@osd2 ~]# ceph osd tree
# id weight type name up/down reweight
-1 2 root default
-2 1 host osd1
0 1 osd.0 up 1
-3 1 host osd2
1 1 osd.1 up 1
4台osd添加结束, 会有一个告警 too few pgs per osd (16 < min 20) .
[root@osd4 ~]# ceph -s
cluster f649b128-963c-4802-ae17-5a76f36c4c76
health HEALTH_WARN too few pgs per osd (16 < min 20)
monmap e1: 3 mons at {mon1=172.17.0.2:6789/0,mon2=172.17.0.3:6789/0,mon3=172.17.0.4:6789/0}, election epoch 12, quorum 0,1,2 mon1,mon2,mon3
osdmap e27: 4 osds: 4 up, 4 in
pgmap v59: 64 pgs, 1 pools, 0 bytes data, 0 objects
6432 MB used, 31260 MB / 39805 MB avail
64 active+clean
[root@osd4 ~]# ceph osd tree
# id weight type name up/down reweight
-1 4 root default
-2 1 host osd1
0 1 osd.0 up 1
-3 1 host osd2
1 1 osd.1 up 1
-4 1 host osd3
2 1 osd.2 up 1
-5 1 host osd4
3 1 osd.3 up 1
解决办法, 需要修改
pg_num
,
pgp_num
.
先要获取pool name, 如下, 返回pool name : rbd.
[root@mon1 ~]# ceph osd pool stats
pool rbd id 0
nothing is going on
修改pool对应的pg_num和pgp_num.
[root@mon1 ~]# ceph osd pool set rbd pg_num 128
[root@mon1 ~]# ceph -s
cluster f649b128-963c-4802-ae17-5a76f36c4c76
health HEALTH_WARN pool rbd pg_num 128 > pgp_num 64
monmap e1: 3 mons at {mon1=172.17.0.2:6789/0,mon2=172.17.0.3:6789/0,mon3=172.17.0.4:6789/0}, election epoch 12, quorum 0,1,2 mon1,mon2,mon3
osdmap e29: 4 osds: 4 up, 4 in
pgmap v117: 128 pgs, 1 pools, 0 bytes data, 0 objects
6434 MB used, 31258 MB / 39805 MB avail
128 active+clean
同时还需要调整pgp_num
[root@mon1 ~]# ceph osd pool set rbd pgp_num 128
set pool 0 pgp_num to 128
[root@mon1 ~]# ceph -s
cluster f649b128-963c-4802-ae17-5a76f36c4c76
health HEALTH_OK
monmap e1: 3 mons at {mon1=172.17.0.2:6789/0,mon2=172.17.0.3:6789/0,mon3=172.17.0.4:6789/0}, election epoch 12, quorum 0,1,2 mon1,mon2,mon3
osdmap e32: 4 osds: 4 up, 4 in
pgmap v130: 128 pgs, 1 pools, 0 bytes data, 0 objects
6434 MB used, 31258 MB / 39805 MB avail
128 active+clean
需要注意, pg_num只能增加, 不能缩小.
[root@mon1 ~]# ceph osd pool set rbd pg_num 64
Error EEXIST: specified pg_num 64 <= current 128
ceph osd数据目录的结构如下(包含journal, 数据等) :
[root@osd1 ceph-0]# pwd
/var/lib/ceph/osd/ceph-0
[root@osd1 ceph-0]# ll
total 1048616
-rw-r--r-- 1 root root 37 Dec 2 08:52 ceph_fsid
drwxr-xr-x 68 root root 4096 Dec 2 09:00 current
-rw-r--r-- 1 root root 37 Dec 2 08:52 fsid
-rw-r--r-- 1 root root 1073741824 Dec 2 09:00 journal
-rw------- 1 root root 56 Dec 2 08:52 keyring
-rw-r--r-- 1 root root 21 Dec 2 08:52 magic
-rw-r--r-- 1 root root 6 Dec 2 08:52 ready
-rw-r--r-- 1 root root 4 Dec 2 08:52 store_version
-rw-r--r-- 1 root root 53 Dec 2 08:52 superblock
-rw-r--r-- 1 root root 0 Dec 2 08:59 sysvinit
-rw-r--r-- 1 root root 2 Dec 2 08:52 whoami
[参考]