In Pacemaker, the CIB carries a version made up of admin_epoch, epoch and num_updates. When a node joins the cluster, the cluster compares these versions and takes the configuration with the highest version as the unified configuration for the whole cluster.
Of these three fields, admin_epoch normally never changes; epoch is incremented on every "configuration" change (and num_updates is reset to 0 at the same time); num_updates is incremented on every "status" change. "Configuration" means the persistent content under the configuration node of the CIB, including cluster properties, forever node attributes, resource attributes and so on. "Status" means dynamic things such as reboot node attributes, whether a node is alive, and whether a resource is started.
"Status" can usually be re-collected through monitor operations (unless the RA script is badly designed), but an error in the "configuration" can bring the cluster down, so we care more about changes to epoch and about what happens to the cluster configuration when a node joins. This is especially true for RA scripts that support master/slave architectures and modify the configuration dynamically (for example mysql's mysql_REPL_INFO and pgsql's pgsql-data-status); once the configuration becomes inconsistent, the cluster may fail.
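Throughout the tests below these fields are read with cibadmin. As a minimal sketch (assuming only the stock cibadmin and grep tools), the version tuple can be pulled out of the CIB header like this:

```bash
# Print the (admin_epoch, epoch, num_updates) version tuple of the local CIB.
# These attributes live on the top-level <cib> element, which is the first
# line of the XML returned by cibadmin --query.
cibadmin --query | head -n 1 | \
    grep -o -E '(admin_epoch|epoch|num_updates)="[0-9]+"'
```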
1. What the manual says
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#idm140225199219024
3.2. Configuration Version
When a node joins the cluster, the cluster will perform a check to see who has the best configuration based on the fields below. It then asks the node with the highest (admin_epoch, epoch, num_updates) tuple to replace the configuration on all the nodes - which makes setting them, and setting them correctly, very important.
Table 3.1. Configuration Version Properties
Field | Description |
---|---|
admin_epoch | Never set this value to zero, in such cases the cluster cannot tell the difference between your configuration and the "empty" one used when nothing is found on disk. |
epoch | Incremented on every configuration change; num_updates is reset to 0 at the same time. |
num_updates | Incremented on every status change. |
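The warning about admin_epoch matters when you want a rejoining node's stale on-disk configuration to always lose the version comparison. A minimal, hedged sketch of raising it by hand (I believe cibadmin --modify accepts an XML fragment targeting the top-level cib element; verify on your Pacemaker version before relying on it):

```bash
# Raise admin_epoch on the running cluster so that any older configuration a
# rejoining node might carry on disk can never win the version check.
cibadmin --modify --xml-text '<cib admin_epoch="3"/>'

# Verify the new value.
cibadmin --query | head -n 1 | grep -o 'admin_epoch="[0-9]*"'
```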
2. Hands-on verification
2.1 Environment
Three machines: srdsdevapp69, srdsdevapp71 and srdsdevapp73
OS: CentOS 6.3
Pacemaker: 1.1.14-1.el6 (Build: 70404b0)
Corosync: 1.4.1-7.el6
2.2 Basic verification
0. Initially, epoch="48304" and num_updates="4"
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48304" num_updates="4" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:22:56 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
1. Updating the cluster configuration increments epoch by 1 and resets num_updates to 0
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo1 -v "1"
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
cib epoch="48305" num_updates="0" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:24:15 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
2. Updating an attribute to the same value it already has does not change epoch
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo1 -v "1"
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
cib epoch="48305" num_updates="0" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:24:15 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
3. Updating a node attribute with lifetime forever also increments epoch by 1
- [root@srdsdevapp69 mysql_ha]# crm_attribute -N `hostname` -l forever -n foo2 -v 2
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48306" num_updates="0" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:31:18 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
4. Updating a node attribute with lifetime reboot increments num_updates by 1
- [root@srdsdevapp69 mysql_ha]# crm_attribute -N `hostname` -l reboot -n foo3 -v 2
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48306" num_updates="1" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:31:18 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
2.3 Partition verification
1. Manually isolate srdsdevapp69 from the other two nodes to create a partition; the DC (Designated Controller) before the partition is srdsdevapp73
[root@srdsdevapp69 mysql_ha]# iptables -A INPUT -j DROP -s srdsdevapp71
[root@srdsdevapp69 mysql_ha]# iptables -A OUTPUT -j DROP -s srdsdevapp71
[root@srdsdevapp69 mysql_ha]# iptables -A INPUT -j DROP -s srdsdevapp73
[root@srdsdevapp69 mysql_ha]# iptables -A OUTPUT -j DROP -s srdsdevapp73
The epoch did not change on either partition (still 48306), but srdsdevapp69 made itself the DC of its own partition.
Partition 1 (srdsdevapp69): does not have QUORUM
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48306" num_updates="5" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:31:18 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="0" dc-uuid="srdsdevapp69" update-user="root">
Partition 2 (srdsdevapp71, srdsdevapp73): has QUORUM
- [root@srdsdevapp71 ~]# cibadmin -Q |grep epoch
- cib epoch="48306" num_updates="4" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:31:18 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
2. Make 2 configuration updates on srdsdevapp69, increasing its epoch by 2
- [root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo4 -v "1"
- [root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo5 -v "1"
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48308" num_updates="0" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:41:57 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp69" update-client="crm_attribute" have-quorum="0" dc-uuid="srdsdevapp69" update-user="root">
3. Make 1 configuration update on srdsdevapp71, increasing its epoch by 1
- [root@srdsdevapp71 ~]# crm_attribute --type crm_config -s set1 --name foo6 -v "1"
- [root@srdsdevapp71 ~]# cibadmin -Q |grep epoch
- cib epoch="48307" num_updates="0" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:42:25 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp71" update-client="crm_attribute" have-quorum="1" dc-uuid="srdsdevapp73" update-user="root">
4. Restore the network and check the cluster configuration again
- [root@srdsdevapp69 mysql_ha]# iptables -F
- [root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
- cib epoch="48308" num_updates="12" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:45:12 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp71" update-client="crmd" have-quorum="1" dc-uuid="srdsdevapp73" update-user="hacluster">
- [root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo5 -q
- 1
- [root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo4 -q
- 1
- [root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo6 -q
- Error performing operation: No such device or address
The cluster adopted the configuration of the srdsdevapp69 partition because its version was higher, so the update made in the srdsdevapp71/srdsdevapp73 partition (foo6) was lost.
This test reveals a problem: the configuration of the partition that holds QUORUM can be overwritten by the configuration of a partition that does not hold QUORUM. If you develop your own RA, this is something to watch out for.
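One way an RA author can reduce this risk is to refuse to write persistent cluster configuration while the local partition has no quorum. A minimal sketch (the helper name and the attribute name repl_info are only examples for illustration, not code from any real RA):

```bash
# Only write a persistent cluster attribute when this partition has quorum.
# crm_node -q prints 1 if the local partition currently holds quorum, 0 otherwise.
set_repl_info() {
    local value="$1"
    if [ "$(crm_node -q)" != "1" ]; then
        echo "no quorum, refusing to update repl_info" >&2
        return 1
    fi
    crm_attribute --type crm_config --name repl_info --update "$value"
}
```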
2.4 Partition verification 2
In the previous test, the pre-partition DC ended up in the partition that held QUORUM. Now let's try the scenario where the pre-partition DC is in the partition without QUORUM.
1. Manually isolate the DC (srdsdevapp73) from the other two nodes to create a partition
- [root@srdsdevapp73 ~]# iptables -A INPUT -j DROP -s srdsdevapp69
- [root@srdsdevapp73 ~]# iptables -A OUTPUT -j DROP -s srdsdevapp69
- [root@srdsdevapp73 ~]# iptables -A INPUT -j DROP -s srdsdevapp71
- [root@srdsdevapp73 ~]# iptables -A OUTPUT -j DROP -s srdsdevapp71
The epoch on srdsdevapp73 did not change
- [root@srdsdevapp73 ~]# cibadmin -Q |grep epoch
- cib epoch="48308" num_updates="17" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:45:12 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp71" update-client="crmd" have-quorum="0" dc-uuid="srdsdevapp73" update-user="hacluster">
But on the other partition (srdsdevapp69, srdsdevapp71) the epoch was incremented by 1
- [root@srdsdevapp69 ~]# cibadmin -Q |grep epoch
- cib epoch="48309" num_updates="6" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:49:39 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp71" update-client="crmd" have-quorum="1" dc-uuid="srdsdevapp71" update-user="hacluster">
After the network was restored, the cluster adopted the configuration with the higher version, and the DC is still the pre-partition DC (srdsdevapp73)
- [root@srdsdevapp73 ~]# iptables -F
- [root@srdsdevapp73 ~]# cibadmin -Q |grep epoch
- cib epoch="48309" num_updates="16" admin_epoch="2" validate-with="pacemaker-1.2" cib-last-written="Thu Mar 31 18:56:58 2016" crm_feature_set="3.0.10" update-origin="srdsdevapp71" update-client="crmd" have-quorum="1" dc-uuid="srdsdevapp73" update-user="hacluster">
From this test we can see:
- DC negotiation increments epoch by 1
- After a partition heals, Pacemaker tends to keep the pre-partition DC as the new DC
3. Summary
Pacemaker's behavior:
- A CIB configuration change increments epoch by 1
- DC negotiation increments epoch by 1
- After a partition heals, Pacemaker adopts the configuration with the higher version as the cluster configuration
- After a partition heals, Pacemaker tends to keep the pre-partition DC as the new DC
Notes for RA development:
- Avoid modifying the cluster configuration dynamically whenever possible
- If that cannot be avoided, avoid using multiple dynamic cluster configuration parameters; for example, concatenate several parameters into a single one (this is what mysql's mysql_REPL_INFO does)
- Check crm_attribute for errors and retry on failure (this is what pgsql does); see the sketch after this list
- Avoid modifying the cluster configuration in the resource stop path (demote, stop) when quorum has been lost
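As an illustration of the second and third points, here is a minimal sketch that packs several replication values into one attribute and retries crm_attribute on failure; the attribute name repl_info, the field layout, and the retry count are assumptions for illustration, not the actual mysql or pgsql RA code:

```bash
# Pack several dynamic values into a single cluster attribute (the idea behind
# mysql's mysql_REPL_INFO) and retry transient CIB update failures (the idea
# behind pgsql's error handling).
update_repl_info() {
    local master="$1" file="$2" pos="$3"
    local value="${master}|${file}|${pos}"
    local i
    for i in 1 2 3; do
        if crm_attribute --type crm_config --name repl_info --update "$value"; then
            return 0
        fi
        sleep 1   # brief pause before retrying a failed CIB update
    done
    return 1
}
```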