---- Outline of this article
- What corosync and pacemaker are
- Common high-availability cluster stacks
- Installing corosync and pacemaker
- Notes on the pacemaker resource manager (CRM) commands
- Worked example
一、What corosync and pacemaker are
corosync provides the messaging service in a high-availability environment. It sits at the bottom of the HA cluster stack (the Message Layer) and its job is to carry heartbeat and membership information between the nodes.
pacemaker is an open-source cluster resource manager (CRM). It sits at the resource-management / resource-agent (RA) layer of the HA stack. It cannot pass heartbeat information itself; to communicate with the other nodes it relies on the underlying messaging layer to deliver its messages. It is usually combined with corosync in one of two ways:
- pacemaker runs as a corosync plugin;
- pacemaker runs as a standalone daemon.
Notes:
Because early corosync releases had no voting capability, the total number of nodes in the cluster should be odd and greater than 2.
corosync 1.0 itself had no vote (quorum) functionality; votequorum was introduced with corosync 2.0 (see the sketch below).
cman(DC) + corosync: if you want to use pacemaker and also cman, cman can only be used as a corosync plugin.
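As a hedged illustration only (this is corosync 2.x syntax, not part of the corosync 1.x plugin stack used in this article): with votequorum the quorum service is enabled in corosync.conf roughly like this, and the two_node option relaxes the odd-node-count requirement for two-node clusters:

quorum {
    provider: corosync_votequorum   # enable the votequorum service
    two_node: 1                     # special-case a two-node cluster
}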
二、Common high-availability cluster stacks
- heartbeat + crm
- cman + rgmanager
- cman + pacemaker
- corosync + pacemaker (pacemaker acting as the resource manager)
三、Installing corosync and pacemaker

# yum install -y corosync
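The article only shows the corosync install; presumably pacemaker (and the crmsh shell used in section four) were installed in a similar way. A hedged sketch:

# yum install -y pacemaker
# crmsh is not in the CentOS 6 base repositories; it usually comes from a
# third-party repository (for example the opensuse network:ha-clustering repo):
# yum install -y crmsh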
Its configuration files live under /etc/corosync/; the template is corosync.conf.example.
# Please read the corosync.conf.5 manual page
compatibility: whitetank        # compatible with releases prior to 0.8
totem {
    version: 2                  # version of the totem protocol
    secauth: off                # whether secure authentication is enabled; best turned on
    threads: 0                  # number of parallel threads used when secauth is enabled
    interface {
        ringnumber: 0           # ring number; with multiple NICs this prevents heartbeat traffic from looping back
        bindnetaddr: 192.168.1.1   # network address of the network the node sits on
        mcastaddr: 226.94.1.1   # multicast address
        mcastport: 5405         # multicast port
        ttl: 1                  # heartbeat packets travel only one hop, avoiding multicast loops
    }
}
# totem defines how the cluster nodes talk to each other; totem is the (versioned) protocol corosync uses between nodes
logging {
    fileline: off
    to_stderr: no               # do not send log messages to standard error
    to_logfile: yes             # write a log file
    to_syslog: yes              # also log via syslog (ends up in /var/log/messages)
    logfile: /var/log/cluster/corosync.log   # log file location
    debug: off                  # keep debug off unless troubleshooting; it is very verbose and costs a lot of disk IO
    timestamp: on               # timestamp each log entry
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
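One hedged note on bindnetaddr: it should be the network address of the interface corosync binds to, not a host address. For the 192.168.1.0/24 lab network used later in this article it would presumably read:

bindnetaddr: 192.168.1.0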
If you want pacemaker to run as a corosync plugin, add the following to corosync.conf:
service {
    ver:  0
    name: pacemaker
}
# corosync will start pacemaker automatically once it is running
aisexec {
    user:  root
    group: root
}
# the identity the AIS functionality runs as; root is the default, so the aisexec section can be omitted
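The authkey file copied in the next step is not generated anywhere in the article; presumably it was created beforehand with corosync-keygen. A hedged sketch of that preparation step:

# cd /etc/corosync
# cp corosync.conf.example corosync.conf   # then edit it as shown above
# corosync-keygen                          # reads /dev/random and writes /etc/corosync/authkey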
#scp -p authkey corosync.conf 192.168.1.111:/etc/corosync/
Step 2: start corosync on both nodes.
[root@essun corosync]# ssh essun.node2.com 'service corosync start'
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@essun corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
Check the log to confirm that corosync started correctly (do this on every node):
#tail -40 /var/log/cluster/corosync.log
Apr 25 23:12:01 [2811] essun.node3.com crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_replace: Digest matched on replace from essun.node2.com: cb225a22df77f4f0bfbf7bd73c7d4160
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_replace: Replaced 0.4.1 with 0.4.1 from essun.node2.com
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_replace operation for section 'all': OK (rc=0, origin=essun.node2.com/crmd/24, version=0.4.1)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='essun.node3.com']/transient_attributes to master (origin=local/crmd/9)
Apr 25 23:12:01 [2811] essun.node3.com crmd: info: do_log: FSA: Input I_NOT_DC from do_cl_join_finalize_respond() received in state S_PENDING
Apr 25 23:12:01 [2811] essun.node3.com crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Apr 25 23:12:01 [2809] essun.node3.com attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: write_cib_contents: Wrote version 0.3.0 of the CIB to disk (digest: 02ededba58f5938f53dd45f5bd06f577)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section nodes: OK (rc=0, origin=essun.node2.com/crmd/26, version=0.5.1)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.4.1 -> 0.5.1 from local not applied to 0.3.1: current "epoch" is less than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed, requesting a full refresh (-207)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/crmd/29, version=0.5.2)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/crmd/31, version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/4, version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.5.1 -> 0.5.2 from local not applied to 0.5.3: current "num_updates" is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/5, version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.5.2 -> 0.5.3 from local not applied to 0.5.3: current "num_updates" is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/6, version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section cib: OK (rc=0, origin=essun.node2.com/crmd/34, version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='essun.node3.com']//transient_attributes//nvpair[@name='probe_complete']: No such device or address (rc=-6, origin=local/attrd/2, version=0.5.4)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/3, version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_perform_update: Sent update 4: probe_complete=true
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/attrd/4, version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/4)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='essun.node3.com']//transient_attributes//nvpair[@name='probe_complete']: No such device or address (rc=-6, origin=local/attrd/5, version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/6, version=0.5.5)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_perform_update: Sent update 7: probe_complete=true
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/7)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.dnz3rc (digest: /var/lib/pacemaker/cib/cib.dOgpug)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/attrd/4, version=0.5.6)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-2.raw
Apr 25 23:12:02 [2806] essun.node3.com cib: info: write_cib_contents: Wrote version 0.5.0 of the CIB to disk (digest: 420e9390e2cb813eebbdf3bb73416dd2)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.kgClFd (digest: /var/lib/pacemaker/cib/cib.gQtyTi)
Apr 25 23:12:14 [2806] essun.node3.com cib: info: crm_client_new: Connecting 0x1d8dc80 for uid=0 gid=0 pid=2828 id=2dfaa45a-28c4-4c7e-9613-603fb1217e12
Apr 25 23:12:14 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/cibadmin/2, version=0.5.6)
Apr 25 23:12:14 [2806] essun.node3.com cib: info: crm_client_destroy: Destroying 0 events
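Besides reading through the full log, a quick sanity check (standard corosync/pacemaker tooling, not part of the original article) is to grep the log for errors and ask corosync about its rings. A hedged sketch:

# grep -i -e "corosync cluster engine" -e error /var/log/cluster/corosync.log
# corosync-cfgtool -s        # prints the ring status; a healthy ring reports no faults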
Once everything looks normal, you can use the crm status command to view the current cluster node information.
[root@essun corosync]# crm status
Last updated: Fri Apr 25 23:18:11 2014
Last change: Fri Apr 25 23:12:01 2014 via crmd on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ essun.node2.com essun.node3.com ]
Two nodes are currently online: node2 and node3.
四、Notes on the pacemaker resource manager (CRM) commands
Enter crmsh and run the commands interactively.
Overview of the crm commands:
- Top-level subcommands
[root@essun corosync]# crm
crm(live)# help

This is crm shell, a Pacemaker command line interface.

Available commands:

    cib              manage shadow CIBs                     # shadow CIB "sandbox"
    resource         resources management                   # control of the defined resources
    configure        CRM cluster configuration              # edit the cluster configuration
    node             nodes management                       # cluster node management
    options          user preferences                       # user preferences
    history          CRM cluster history
    site             Geo-cluster support
    ra               resource agents information center     # resource agent subcommand (everything related to RAs lives here)
    status           show cluster status                    # show the current cluster status
    help,?           show help (help topics for list of topics)   # list the commands available at the current level
    end,cd,up        go back one level                      # return to the level above, crm(live)#
    quit,bye,exit    exit the program                       # leave the interactive crm shell
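The same subcommands can also be run non-interactively straight from the shell, which is convenient in scripts. A minimal sketch:

# crm status               # same as typing "status" inside the interactive shell
# crm configure show       # dump the current cluster configuration
# crm ra classes           # list the resource agent classes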
- The resource subcommand
  All resource state is controlled here.
crm(live)resource# help

Available commands:

    status           show status of resources               # show resource status
    start            start a resource                       # start a resource
    stop             stop a resource                        # stop a resource
    restart          restart a resource                     # restart a resource
    promote          promote a master-slave resource        # promote a master-slave resource
    demote           demote a master-slave resource         # demote a master-slave resource
    manage           put a resource into managed mode
    unmanage         put a resource into unmanaged mode
    migrate          migrate a resource to another node     # move a resource to another node
    unmigrate        unmigrate a resource to another node
    param            manage a parameter of a resource       # manage a resource parameter
    secret           manage sensitive parameters            # manage sensitive parameters
    meta             manage a meta attribute                # manage a meta attribute
    utilization      manage a utilization attribute
    failcount        manage failcounts                      # manage the fail counters
    cleanup          cleanup resource status                # clean up resource status
    refresh          refresh CIB from the LRM status        # refresh the CIB (cluster information base) from the LRM (local resource manager) status
    reprobe          probe for resources not started by the CRM   # probe for resources not started by the CRM
    trace            start RA tracing                       # enable resource agent (RA) tracing
    untrace          stop RA tracing                        # disable resource agent (RA) tracing
    help             show help (help topics for list of topics)   # show help
    end              go back one level                      # return to crm(live)#
    quit             exit the program                       # leave the interactive shell
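A few hedged usage examples (they reference the webip resource that is only defined later in this article):

crm(live)resource# status webip      # show the state of a single resource
crm(live)resource# stop webip        # stop it
crm(live)resource# start webip       # start it again
crm(live)resource# cleanup webip     # clear its status/failcount after a failure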
- The configure subcommand
  All resources are defined under this subcommand.
crm(live)configure# help

Available commands:

    node             define a cluster node                  # define a cluster node
    primitive        define a resource                      # define a resource
    monitor          add monitor operation to a primitive   # add a monitor operation to a primitive (e.g. timeouts, what to do on a failed start)
    group            define a group                         # define a group (ties several resources together)
    clone            define a clone                         # define a clone (you can set the total number of clones and how many may run per node)
    ms               define a master-slave resource         # define a master-slave resource (only one node runs the master instance; the others act as standby slaves)
    rsc_template     define a resource template             # define a resource template
    location         a location preference                  # location constraint: which node a resource prefers (with otherwise equal scores, it runs where its preference is highest)
    colocation       colocate resources                     # colocation constraint: how strongly resources want to run together
    order            order resources                        # the order in which resources are started
    rsc_ticket       resources ticket dependency
    property         set a cluster property                 # set a cluster property
    rsc_defaults     set resource defaults                  # set resource defaults (e.g. stickiness)
    fencing_topology node fencing order                     # node fencing order
    role             define role access rights              # define a role's access rights
    user             define user access rights              # define a user's access rights
    op_defaults      set resource operations defaults       # set defaults for resource operations
    schema           set or display current CIB RNG schema
    show             display CIB objects                    # display CIB objects
    edit             edit CIB objects                       # edit CIB objects (in a vim-style editor)
    filter           filter CIB objects                     # filter CIB objects
    delete           delete CIB objects                     # delete CIB objects
    default-timeouts set timeouts for operations to minimums from the meta-data
    rename           rename a CIB object                    # rename a CIB object
    modgroup         modify group                           # modify a resource group
    refresh          refresh from CIB                       # re-read the CIB
    erase            erase the CIB                          # wipe the CIB
    ptest            show cluster actions if changes were committed
    rsctest          test resources as currently configured
    cib              CIB shadow management
    cibstatus        CIB status management and editing
    template         edit and import a configuration from a template
    commit           commit the changes to the CIB          # write the pending changes into the CIB
    verify           verify the CIB with crm_verify         # syntax-check the CIB
    upgrade          upgrade the CIB to version 1.0
    save             save the CIB to a file                 # export the current CIB to a file (saved in the directory you were in before starting crm)
    load             import the CIB from a file             # load the CIB from a file
    graph            generate a directed graph
    xml              raw xml
    help             show help (help topics for list of topics)   # show help
    end              go back one level                      # return to crm(live)#
    quit             exit the program                       # leave the interactive crm shell
- The node subcommand
  Node management and status commands.
crm(live)resource# cd ..
crm(live)# node
crm(live)node# help

Node management and status commands.

Available commands:

    status           show nodes status as XML               # show node status in XML format
    show             show node                              # show node status on the command line
    standby          put node into standby                  # simulate taking a node offline (standby must be followed by the node's FQDN)
    online           set node online                        # bring a node back online
    maintenance      put node into maintenance mode
    ready            put node into ready mode
    fence            fence node                             # fence the node
    clearstate       Clear node state                       # clear a node's state information
    delete           delete node                            # delete a node
    attribute        manage attributes
    utilization      manage utilization attributes
    status-attr      manage status attributes
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program
- The ra subcommand
  Everything related to resource agent classes lives here.
crm(live)node# cd ..
crm(live)# ra
crm(live)ra# help

Available commands:

    classes          list classes and providers             # list the resource agent classes
    list             list RA for a class (and provider)     # list the resource agents a class provides
    meta             show meta data for a RA                # show an RA's available parameters (e.g. meta ocf:heartbeat:IPaddr2)
    providers        show providers for a RA and a class
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program
Note:
The words these commands use are all simple, but I have still annotated the ones that are used most often. Having just learned them I remember them clearly right now, but one day one of these commands may draw a blank, and that would be painful. (Never overestimate your memory; sooner or later it will betray you!)
五、Worked example
Notes:
- Prerequisites for a high-availability cluster:
  - time synchronization between the nodes
  - passwordless SSH login between the nodes
  - hostname resolution
- This demo only illustrates the commands; it is not a production configuration.
1、Environment
OS: CentOS 6.5 x86_64
Nodes:
essun.node2.com  192.168.1.111
essun.node3.com  192.168.1.108
Software and resources needed on the nodes:
one virtual IP: 192.168.1.100
Install httpd on both nodes, add a default test page, and once it has been tested disable the service from starting at boot.
An NFS export will be mounted; the NFS server is 192.168.1.110. (A sketch of these preparations is shown below.)
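A hedged sketch of the preparations above, using the hostnames and IPs from this article; the exact commands are assumptions, to be run on both nodes unless noted, and they presume the NFS server 192.168.1.110 already exports /share:

# ntpdate pool.ntp.org                      # sync the clock (or point at your own NTP server)
# cat >> /etc/hosts << EOF
192.168.1.111  essun.node2.com
192.168.1.108  essun.node3.com
EOF
# ssh-keygen -t rsa                         # then copy the key to the other node
# ssh-copy-id root@essun.node3.com          # on node2 (and the reverse on node3)
# yum install -y httpd
# echo "node-local test page" > /var/www/html/index.html   # hypothetical placeholder page
# chkconfig httpd off                       # the cluster, not init, must control httpd
# service httpd stop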
2、Defining resources
- Disable stonith-enabled (if you are not sure which parameters exist, press Tab twice to complete the command; cd .. returns to the previous level)
crm(live)configure# property stonith-enabled=false    # pretend failed nodes have already been powered off safely; do not use STONITH to arbitrate
crm(live)configure# verify                            # no output here means the operation is valid
crm(live)configure# commit                            # now the change can be committed
crm(live)configure# show                              # show the committed, in-effect configuration
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
- Ignore the quorum rule (no-quorum-policy), since this is a two-node cluster
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
- Define a virtual IP
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
ip: the name of the parameter accepted by the IPaddr resource agent.
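To see which parameters IPaddr itself accepts, you can inspect it the same way the Filesystem agent is inspected below:

crm(live)ra# meta ocf:heartbeat:IPaddr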
- Define a filesystem mount
First go into the ra subcommand to find the resource agent that handles filesystems.
crm(live)configure ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)configure ra# list ocf
CTDB           ClusterMon     Delay          Dummy          Filesystem     HealthCPU
HealthSMART    IPaddr         IPaddr2        IPsrcaddr      LVM            MailTo
Route          SendArp        Squid          Stateful       SysInfo        SystemHealth
VirtualDomain  Xinetd         apache         conntrackd     controld       dhcpd
ethmonitor     exportfs       mysql          named          nfsserver      pgsql
ping           pingd          postfix        remote         rsyncd         symlink
crm(live)configure ra# providers Filesystem
heartbeat
This shows that the Filesystem resource agent is provided by ocf:heartbeat.
Now look at the parameters this resource agent accepts:
crm(live)configure ra# meta ocf:heartbeat:Filesystem
Manages filesystem mounts (ocf:heartbeat:Filesystem)

Resource script for Filesystem. It manages a Filesystem on a
shared storage medium.

The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:

10: read first 16 blocks of the device (raw read)
    This doesn't exercise the filesystem at all, but the device on
    which the filesystem lives. This is noop for non-block devices
    such as NFS, SMBFS, or bind mounts.

20: test if a status file can be written and read
    The status file must be writable by root. This is not always the
    case with an NFS mount, as NFS exports usually have the
    "root_squash" option set. In such a setup, you must either use
    read-only monitoring (depth=10), export with "no_root_squash" on
    your NFS server, or grant world write permissions on the
    directory where the status file is to be placed.

Parameters (* denotes required, [] the default):

device* (string): block device
    The name of block device for the filesystem, or -U, -L options for
    mount, or NFS mount specification.

directory* (string): mount point
    The mount point for the filesystem.

fstype* (string): filesystem type
    The type of filesystem to be mounted.
........... (output truncated) .......
Parameters marked with * are required. Now the resource can be defined:
crm(live)configure# primitive webnfs ocf:heartbeat:Filesystem params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" op monitor interval=60s timeout=60s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Notes:
primitive                              # the command that defines a resource
webnfs                                 # the resource ID
ocf:heartbeat:Filesystem               # the resource agent (RA)
params device="192.168.1.110:/share"   # the exported (shared) directory
directory="/var/www/html"              # the mount point
fstype="nfs"                           # the filesystem type
op monitor                             # add a monitor operation for webnfs
interval=60s                           # how often to monitor
timeout=60s                            # monitor timeout
op start timeout=60s                   # start timeout
op stop timeout=60s                    # stop timeout
Define the web server resource:
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Tie the resources together in a group so that they run on the same node:
crm(live)configure# group webservice webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
group webservice webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Look at the now-active resources from a different angle:
crm(live)configure# cd ..
crm(live)# status
Last updated: Sat Apr 26 01:51:45 2014
Last change: Sat Apr 26 01:49:54 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started essun.node2.com
     webnfs     (ocf::heartbeat:Filesystem):  Started essun.node2.com
     webserver  (lsb:httpd):                  Started essun.node2.com
The output above shows that all of the resources are on node2, i.e. on 192.168.1.111. Use curl to access it and check the result:
[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.111
来自于NFS文件系统
Now simulate a failure of node2 and see whether the resources migrate:
crm(live)node# standby essun.node2.com
crm(live)# status
Last updated: Sat Apr 26 02:05:24 2014
Last change: Sat Apr 26 02:04:17 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Node essun.node2.com: standby
Online: [ essun.node3.com ]
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started essun.node3.com
     webnfs     (ocf::heartbeat:Filesystem):  Started essun.node3.com
     webserver  (lsb:httpd):                  Started essun.node3.com
curl once more:
[root@bogon share]# curl http://192.168.1.111
curl: (7) couldn't connect to host
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
来自于NFS文件系统
Notes:
The first curl shows that httpd is no longer running on node2.
The second curl shows that the page is still reachable through the VIP, so the service did not stop when node2 went offline.
The third curl shows that the service is also reachable via node3's own IP, so we can conclude it is now running on node3.
At this point, if node2 comes back online the resources will not move back to it. If you want them to fail back once node2 returns, you can use a location constraint to give node2 a higher score.
Below, the second approach (individual constraints instead of a group) is used to tie the resources down. First delete the group definition: you can run edit under crm configure and remove the group entry from the CIB, or use the commands shown further below.
crm(live)node# online essun.node2.com
crm(live)# status
Last updated: Sat Apr 26 02:20:13 2014
Last change: Sat Apr 26 02:19:29 2014 via crm_attribute on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started essun.node3.com
     webnfs     (ocf::heartbeat:Filesystem):  Started essun.node3.com
     webserver  (lsb:httpd):                  Started essun.node3.com
Sure enough, the resources did not come back. Now watch how I pull them back.
Step 1: remove the group definition. The cleanest way is the edit command, but it can also be done with explicit commands:
crm(live)resource# stop webservice        # the group name
crm(live)configure# delete webservice     # delete the group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Now there is no group definition left, so the "plan" can proceed.
Define a colocation constraint (how strongly the resources want to run together):
crm(live)configure# colocation webserver-with-webnfs-webip inf: webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:
colocation                          # the colocation (placement) constraint command
webserver-with-webnfs-webip         # the constraint name (ID)
inf:                                # the score; inf means the resources must always stay together, a finite number expresses a preference
webip webnfs webserver              # the resource names
Define the resource start order:
crm(live)configure# order ip_before_webnfs_before_webserver mandatory: webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:
order                               # the ordering constraint command
ip_before_webnfs_before_webserver   # the constraint ID
mandatory:                          # the kind of ordering (three kinds: Mandatory, Optional, Serialize)
webip webnfs webserver              # the resource names; the order they are written in matters
Define a location constraint:
crm(live)configure# location webip_and_webnfs_and_webserver webip 500: essun.node2.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:
location                            # the location constraint command
webip_and_webnfs_and_webserver      # the constraint ID
webip                               # the resource the constraint applies to
500:                                # the score given to the node below
essun.node2.com                     # the preferred node
Define default resource properties (stickiness):
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
Notes:
This sets the default stickiness for every resource in the cluster; stickiness only comes into play when deciding whether an already-running resource should move. Here there are three resources, webip, webnfs and webserver, each with a stickiness of 100, which adds up to 300. The location constraint defined earlier gives node2 a score of 500, so when node2 fails and later comes back online, the resources switch back to node2.
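In numbers, with the scores defined above (a simple worked check):

stickiness holding the group on node3:  3 resources x 100 = 300
location preference of webip for node2: 500
500 > 300  =>  once node2 is back online, the colocated resources move back to essun.node2.com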
Finally, check the status: all of the resources are running on node2. Now put node2 into standby to fail it.
crm(live)# status
Last updated: Sat Apr 26 03:14:30 2014
Last change: Sat Apr 26 03:14:19 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
 webip      (ocf::heartbeat:IPaddr):      Started essun.node2.com
 webnfs     (ocf::heartbeat:Filesystem):  Started essun.node2.com
 webserver  (lsb:httpd):                  Started essun.node2.com
crm(live)# node
crm(live)node# standby essun.node2.com
The resources are now running on node3:
crm(live)# status
Last updated: Sat Apr 26 03:18:17 2014
Last change: Sat Apr 26 03:15:20 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Node essun.node2.com: standby
Online: [ essun.node3.com ]
 webip      (ocf::heartbeat:IPaddr):      Started essun.node3.com
 webnfs     (ocf::heartbeat:Filesystem):  Started essun.node3.com
 webserver  (lsb:httpd):                  Started essun.node3.com
curl twice more:
[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
来自于NFS文件系统
[root@bogon share]#
Bring node2 back online and see whether the resources come back:
crm(live)node# online essun.node2.com
crm(live)node# cd ..
crm(live)# status
Last updated: Sat Apr 26 03:21:46 2014
Last change: Sat Apr 26 03:21:36 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
 webip      (ocf::heartbeat:IPaddr):      Started essun.node2.com
 webnfs     (ocf::heartbeat:Filesystem):  Started essun.node2.com
 webserver  (lsb:httpd):                  Started essun.node2.com
curl three more times:
[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
curl: (7) couldn't connect to host
[root@bogon share]# curl http://192.168.1.111
来自于NFS文件系统
[root@bogon share]#
Notes:
192.168.1.100 is the cluster's virtual IP
192.168.1.108 is essun.node3.com
192.168.1.111 is essun.node2.com
This shows that node2 did indeed take the resources back.
======================= This concludes the introduction to the common crmsh commands for corosync + pacemaker ===========
PS:
My English is not great, so some of the annotations may not be entirely accurate. Please bear with me!