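Pacemaker is the CRM (Cluster Resource Manager), the resource-management layer of a high-availability cluster. It runs as a service and can be started on its own; when using corosync 1.4, corosync can also be configured to start pacemaker. Pacemaker can be configured from any node by installing crmsh, pcs or one of several GUI tools. Since Red Hat 6.4, crmsh is no longer shipped officially and pcs is the default. crmsh is an openSUSE open-source project, so if you still want to use crm you need to download it from an openSUSE repository.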
crmsh repository
crmsh depends on pssh, which is also available from the EPEL repository. The crmsh repository is:
Repository for CentOS 7: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-7)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7//repodata/repomd.xml.key
enabled=1
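You can also script the repository setup instead of downloading the .repo file by hand. The minimal sketch below writes the same definition shown above. The helper name write_crmsh_repo and the output file name are our own choices. The yum commands in the trailing comment assume a CentOS 7 host and root access.

```shell
# Hypothetical helper: write the crmsh repo definition shown above into
# directory $1 (normally /etc/yum.repos.d). The file name is our choice.
write_crmsh_repo() {
    base="http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7"
    cat > "$1/network_ha-clustering_Stable.repo" <<EOF
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-7)
type=rpm-md
baseurl=$base/
gpgcheck=1
gpgkey=$base//repodata/repomd.xml.key
enabled=1
EOF
}
# As root on CentOS 7 (pssh is also available from EPEL: yum install -y epel-release):
#   write_crmsh_repo /etc/yum.repos.d
#   yum install -y crmsh pssh
```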
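CRM command help
Since Pacemaker 1.0, Pacemaker has included an integrated, scriptable cluster shell. It hides the cumbersome XML configuration and lets you queue up several changes and commit them atomically, after checking that they are valid.
The main tool for monitoring cluster status is crm_mon, which gives the same result as crm status. It can run in several modes and has many output options. To learn about any Pacemaker tool, use its --help option or its man page. Both are generated from the tool itself, so they always stay in sync with each other and with the tool. In addition, the --version option shows the Pacemaker version and the supported cluster stack.
The Pacemaker version installed for this experiment: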
[root@node-1 ~]# crm_mon --version
Pacemaker 1.1.15-11.el7_3.4
Written by Andrew Beekhof
crm_mon --help
crm_mon - Provides a summary of cluster's current state.
Outputs varying levels of detail in a number of different formats.
Usage: crm_mon mode [options]
Options:
-?, --help This text
-$, --version Version information
-V, --verbose Increase debug output
-Q, --quiet Display only essential output
Modes:
-h, --as-html=value Write cluster status to the named html file
-X, --as-xml Write cluster status as xml to stdout. This will enable one-shot mode.
-w, --web-cgi Web mode with output suitable for cgi
-s, --simple-status Display the cluster status once as a simple one line output (suitable for nagios)
Display Options:
-n, --group-by-node Group resources by node
-r, --inactive Display inactive resources
-f, --failcounts Display resource fail counts
-o, --operations Display resource operation history
-t, --timing-details Display resource operation history with timing details
-c, --tickets Display cluster tickets
-W, --watch-fencing Listen for fencing events. For use with --external-agent, --mail-to and/or --snmp-traps where supported
-L, --neg-locations[=value] Display negative location constraints [optionally filtered by id prefix]
-A, --show-node-attributes Display node attributes
-D, --hide-headers Hide all headers
-R, --show-detail Show more details (node IDs, individual clone instances)
-b, --brief Brief output
-j, --pending Display pending state if 'record-pending' is enabled
Additional Options:
-i, --interval=value Update frequency in seconds
-1, --one-shot Display the cluster status once on the console and exit
-N, --disable-ncurses Disable the use of ncurses
-d, --daemonize Run in the background as a daemon
-p, --pid-file=value (Advanced) Daemon pid file location
-E, --external-agent=value A program to run when resource operations take place.
-e, --external-recipient=value A recipient for your program (assuming you want the program to send something to someone).
Examples:
Display the cluster status on the console with updates as they occur:
# crm_mon
Display the cluster status on the console just once then exit:
# crm_mon -1
Display your cluster status, group resources by node, and include inactive resources in the list:
# crm_mon --group-by-node --inactive
Start crm_mon as a background daemon and have it write the cluster status to an HTML file:
# crm_mon --daemonize --as-html /path/to/docroot/filename.html
Start crm_mon and export the current cluster status as xml to stdout, then exit.:
# crm_mon --as-xml
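If the SNMP or email options do not appear in the option list, Pacemaker was built without support for them. Ask whoever provides your distribution, or build Pacemaker yourself.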
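crm_mon's -X/--as-xml mode is convenient for scripts. The sketch below lists the nodes that are reported offline. It assumes, based on Pacemaker 1.1.x output, that each <node .../> element under <nodes> sits on one line and carries name="..." and online="true|false" attributes. A real XML parser such as xmllint is more robust. The helper name offline_nodes is hypothetical.

```shell
# Hypothetical helper: read `crm_mon --as-xml` output on stdin and print the
# name of every node whose <node .../> element has online="false".
# Assumes one <node> element per line (as in Pacemaker 1.1.x output).
offline_nodes() {
    grep '<node ' |
        grep 'online="false"' |
        sed -n 's/.*name="\([^"]*\)".*/\1/p'
}
# Usage on a cluster node:
#   crm_mon --as-xml | offline_nodes
```

Elements such as <node name="..."/> inside resource entries carry no online attribute, so the second grep filters them out.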
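CRM commands in detail
crm works in two modes:
1. Batch mode: type the crm command directly at the shell prompt (e.g. crm status)
2. Interactive mode (crm(live)#): enter the crmsh shell and run commands there
II. Command reference
P.S.: The command reference below is compiled from detailed posts online that permit reposting, keeping the explanations I found most complete. It is worth copying into your own notes for future reference.
Top-level subcommands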
[root@node-1 corosync]# crm
crm(live)# help
This is crm shell, a Pacemaker command line interface.
Available commands:
cib manage shadow CIBs //CIB sandbox (shadow CIB)
resource resources management //control resource state (start, stop, migrate, ...)
configure CRM cluster configuration //edit the cluster configuration
node nodes management //cluster node management
options user preferences //user preferences
history CRM cluster history //examine cluster history (logs, transitions)
site Geo-cluster support
ra resource agents information center //resource agents: everything related to resource agents is under this subcommand
status show cluster status //show the current cluster status
help,? show help (help topics for list of topics)//show the commands available at the current level
end,cd,up go back one level //go back one level toward crm(live)
quit,bye,exit exit the program //exit the crm(live) interactive mode
resource: resource state control subcommand
crm(live)resource# help
Available commands:
status show status of resources //show resource status
start start a resource //start a resource
stop stop a resource //stop a resource
restart restart a resource //restart a resource
promote promote a master-slave resource //promote a master/slave resource
demote demote a master-slave resource //demote a master/slave resource
manage put a resource into managed mode //put a resource under cluster management
unmanage put a resource into unmanaged mode //take a resource out of cluster management
migrate migrate a resource to another node //move a resource to another node
unmigrate unmigrate a resource to another node
param manage a parameter of a resource //manage resource parameters
secret manage sensitive parameters //manage sensitive parameters
meta manage a meta attribute //manage meta attributes
utilization manage a utilization attribute
failcount manage failcounts //manage fail counts
cleanup cleanup resource status //clean up resource status
refresh refresh CIB from the LRM status //refresh the CIB (Cluster Information Base) from the LRM (Local Resource Manager) status
reprobe probe for resources not started by the CRM //probe for resources not started by the CRM
trace start RA tracing //enable resource agent (RA) tracing
untrace stop RA tracing //disable resource agent (RA) tracing
help show help (help topics for list of topics) //show help
end go back one level //go back one level (crm(live)#)
quit exit the program //exit the interactive program
configure: resource definition subcommand
crm(live)configure# help
Available commands:
node define a cluster node //define a cluster node
primitive define a resource //define a resource
monitor add monitor operation to a primitive //add a monitor operation to a resource (e.g. timeout, action on failure)
group define a group //define a group (combine several resources into one)
clone define a clone //define a clone (you can set the total number of clones and how many may run on each node)
ms define a master-slave resource //define a master/slave resource (only one node runs the master instance; the others run as slaves on standby)
rsc_template define a resource template //define a resource template
location a location preference //location constraint: which node a resource prefers (it runs on the node with the highest score)
colocation colocate resources //colocation constraint (how strongly several resources should run together)
order order resources //order in which resources start
rsc_ticket resources ticket dependency
property set a cluster property //set a cluster property
rsc_defaults set resource defaults //set resource defaults (e.g. stickiness)
fencing_topology node fencing order //node fencing order
role define role access rights //define role access rights
user define user access rights //define user access rights
op_defaults set resource operations defaults //set defaults for resource operations
schema set or display current CIB RNG schema
show display CIB objects //show CIB objects
edit edit CIB objects //edit CIB objects (in a vim-style editor)
filter filter CIB objects //filter CIB objects
delete delete CIB objects //delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object //rename a CIB object
modgroup modify group //modify a resource group
refresh refresh from CIB //re-read the CIB
erase erase the CIB //erase the CIB
ptest show cluster actions if changes were committed
rsctest test resources as currently configured
cib CIB shadow management
cibstatus CIB status management and editing //CIB status management and editing
template edit and import a configuration from a template //edit or import a configuration template
commit commit the changes to the CIB //commit the changes to the CIB
verify verify the CIB with crm_verify //verify the CIB syntax
upgrade upgrade the CIB to version 1.0 //upgrade the CIB to version 1.0
save save the CIB to a file //export the current CIB to a file (saved in the directory you were in before starting crm)
load import the CIB from a file //load the CIB from a file
graph generate a directed graph
xml raw xml
help show help (help topics for list of topics) //show help
end go back one level //go back to the top level (crm(live)#)
quit exit the program //exit crm interactive mode
node: node management subcommand
crm(live)# node
crm(live)node# help
Node management and status commands.
Available commands:
status show nodes status as XML //show node status as XML
show show node //show node status in plain command-line format
standby put node into standby //put a node into standby (simulates it going offline); standby must be followed by the node's cluster name (e.g. its FQDN)
online set node online //bring a node back online
maintenance put node into maintenance mode //put a node into maintenance mode
ready put node into ready mode //put a node into ready mode
fence fence node //fence a node
clearstate Clear node state //clear node state
delete delete node //delete a node
attribute manage attributes
utilization manage utilization attributes
status-attr manage status attributes
help show help (help topics for list of topics)
end go back one level //go back one level
quit exit the program //exit
ra: resource agent subcommand
crm(live)# ra
crm(live)ra# help
Available commands:
classes list classes and providers //list resource agent classes and providers
list list RA for a class (and provider)//list the agents in a class
meta show meta data for a RA //show an agent's available parameters (e.g. meta ocf:heartbeat:IPaddr2)
providers show providers for a RA and a class
help show help (help topics for list of topics)
end go back one level
quit exit the program
Examples
Listing resource agent classes
crm(live)ra# classes //resource agent classes available on this system
lsb
ocf / heartbeat pacemaker
service
stonith
Listing OCF resource agents
crm(live)ra# list ocf //list OCF resource agents
CTDB ClusterMon Delay Dummy Filesystem HealthCPU
IPaddr IPaddr2 IPsrcaddr HealthCPU HealthSMART HealthSMART
LVM MailTo Route SendArp Squid
Stateful SysInfo SystemHealth VirtualDomain Xinetd
apache conntrackd controld db2 dhcpd
ethmonitor exportfs iSCSILogicalUnit mysql named
nfsnotify nfsserver pgsql ping pingd
postfix remote rsyncd symlink tomcat
Listing LSB resource agents (init scripts)
crm(live)ra# list lsb //list LSB resource agents (the scripts in /etc/init.d)
NetworkManager abrt-ccpp abrt-oops abrtd acpid
atd auditd autofs blk-availability certmonger
corosync corosync-notifyd cpuspeed crond cups
dnsmasq firstboot haldaemon halt hsqldb
ip6tables iptables irqbalance jexec kdump
killall lvm2-lvmetad lvm2-monitor mdmonitor messagebus
mysqld netconsole netfs network nfs
nfslock nginx nmb ntpd ntpdate
oddjobd openfire pacemaker php-fpm portreserve
postfix psacct quota_nld rdisc redis
restorecond rngd rpcbind rpcgssd rpcidmapd
rpcsvcgssd rsyslog sandbox saslauthd single
slapd smartd smb snmpd snmptrapd
spice-vdagentd sshd sssd sysstat udev-post
vsftpd wdaemon winbind wpa_supplicant ypbind
Viewing a resource agent's parameters
crm(live)ra# meta ocf:heartbeat:IPaddr //show the parameters of the IPaddr agent
Manages virtual IPv4 and IPv6 addresses (Linux specific version) (ocf:heartbeat:IPaddr)
This Linux-specific resource manages IP alias IP addresses.
It can add an IP alias, or remove one.
In addition, it can implement Cluster Alias IP functionality
if invoked as a clone resource.
If used as a clone, you should explicitly set clone-node-max >= 2,
and/or clone-max < number of nodes. In case of node failure,
clone instances need to be re-allocated on surviving nodes.
This would not be possible if there is already an instance on those nodes,
and clone-node-max=1 (which is the default).
Parameters (*: required, []: default):
ip* (string): IPv4 or IPv6 address
The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)
example IPv4 "192.168.1.1".
example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".
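Changing cluster properties
crm(live)configure# property stonith-enabled=false //disable STONITH (only acceptable in a test setup without fencing devices)
crm(live)configure# property no-quorum-policy=ignore //keep running resources when quorum is lost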
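Defining a resource
/**
* primitive: the command that defines a resource
* webip: the name given to the resource
* ocf:heartbeat:IPaddr: the resource agent's class, provider and agent name
* op monitor: monitor webip
* interval: time between checks
* timeout: timeout for each check
* on-fail=restart: restart the resource if the monitor fails
*/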
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.10.130 op monitor interval=30s timeout=20s on-fail=restart
crm(live)configure# primitive nginx_res lsb:nginx //the scripts in /etc/init.d/* belong to the lsb class
The parameters after params can be looked up with the meta command; they differ from one resource agent to another.
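An lsb: resource such as lsb:nginx only works reliably if its init script follows the LSB exit codes. For the status action, Pacemaker expects 0 when the service is running and 3 when it is stopped. The sketch below is a quick check to run by hand with the service started and then stopped. The helper name lsb_status_state is hypothetical, and it only covers the status action.

```shell
# Hypothetical helper: run "<script> status" and report what the exit code
# means under the LSB convention (0 = running, 3 = stopped).
lsb_status_state() {
    rc=0
    "$1" status >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo running ;;
        3) echo stopped ;;
        *) echo "unexpected ($rc)" ;;  # check the script before handing it to the cluster
    esac
}
# Example: lsb_status_state /etc/init.d/nginx
#   run once with nginx started (expect "running") and once stopped (expect "stopped")
```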
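Defining a colocation constraint
/**
* Define a colocation constraint
* colocation: the colocation constraint command
* nginx_web: constraint name
* inf: score, i.e. how strongly the resources must run together (inf means always together; a number can also be used)
* nginx_res webip: resource names; nginx_res is placed on the node where webip runs
*/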
crm(live)configure# colocation nginx_web inf: nginx_res webip
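Defining the resource start order
/**
* Define the resource start order
* order: the ordering constraint command
* nginx_after_ip: constraint ID
* mandatory: the kind (three kinds exist: Mandatory = required, Optional = advisory, Serialize = the actions never run at the same time)
* webip nginx_res: resource names; the order matters: webip starts first, then nginx_res
*/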
crm(live)configure# order nginx_after_ip mandatory: webip nginx_res
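Defining node preference (location score)
/**
* Define node preference
* location: the location constraint command
* webip_and_webnfs_and_webserver: constraint name
* webip 500: node1: resource webip gets a score of 500 on node1
*/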
crm(live)configure# location webip_and_webnfs_and_webserver webip 500: node1
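Defining resource defaults
/**
* Define resource defaults
* rsc_defaults: default options for all resources
* resource-stickiness=100 gives every resource in the cluster a default stickiness, i.e. a score for staying on the node where it is currently running.
* For example, I defined three resources, webip, webnfs and webserver, each with a stickiness of 100, so together they add up to 300.
* Earlier, the location constraint gave webip a score of 500 on node1. So when node1 goes down and later comes back online, 500 > 300 and the resources move back to node1.
*/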
crm(live)configure# rsc_defaults resource-stickiness=100
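Here is the score comparison described above as a worked calculation. It assumes that webip, webnfs and webserver are kept together (for example colocated or grouped) so that their stickiness adds up, and it ignores any other constraints:

```latex
\begin{aligned}
\text{score}(\text{node1}) &= 500 && \text{(location constraint on webip)}\\
\text{score}(\text{current node}) &= 3 \times 100 = 300 && \text{(resource-stickiness)}\\
500 &> 300 \;\Rightarrow\; \text{the resources move back to node1}
\end{aligned}
```

With resource-stickiness=200 the current node would score 3 × 200 = 600 > 500, so the resources would stay where they are after node1 rejoins.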
Deleting a resource or node
crm(live)configure# delete nginx_res //delete a resource, constraint, order, group, etc.
crm(live)node# delete node1 //delete a node
Other configure commands
crm(live)configure# verify //verify
crm(live)configure# commit //commit
crm(live)configure# show //show
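The interactive configure steps above can also be run in batch mode. This minimal sketch assumes that crmsh's `crm -f FILE` option reads commands from a file and processes them as if they were typed at the crm(live)# prompt, so `configure` switches level and `commit` applies everything at the end. write_web_cluster_cfg is a hypothetical helper. The resource names, the VIP 192.168.10.130 and node1 come from the examples above. `Mandatory` is the spelling of the order kind used in the crmsh documentation.

```shell
# Hypothetical helper: write this article's configure steps into one crm
# command file ($1) so they can be reviewed and applied in one go.
write_web_cluster_cfg() {
    cat > "$1" <<'EOF'
configure
property stonith-enabled=false
property no-quorum-policy=ignore
primitive webip ocf:heartbeat:IPaddr params ip=192.168.10.130 op monitor interval=30s timeout=20s on-fail=restart
primitive nginx_res lsb:nginx
colocation nginx_web inf: nginx_res webip
order nginx_after_ip Mandatory: webip nginx_res
location webip_and_webnfs_and_webserver webip 500: node1
rsc_defaults resource-stickiness=100
verify
commit
EOF
}
# On one cluster node (batch mode):
#   write_web_cluster_cfg /tmp/web.crm && crm -f /tmp/web.crm
```

Run it on a single node only. The CIB is replicated to the rest of the cluster, so the other nodes pick up the change automatically.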
Putting a node in standby and back online
crm(live)node# standby node1 //put the node in standby
crm(live)node# online node1 //bring the node back online