Introduction to Common crmsh Commands for corosync + pacemaker

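----Outline

  • What corosync and pacemaker are

  • Common high-availability cluster solutions

  • Installing corosync and pacemaker

  • The pacemaker cluster resource manager (CRM) commands, annotated

  • Worked example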

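I. What corosync and pacemaker are

corosync provides the communication service in a high-availability environment. It sits at the bottom of the HA cluster architecture (the Message Layer) and carries heartbeat messages between the nodes.

pacemaker is an open-source high-availability cluster resource manager (CRM). It sits at the resource-management and resource-agent (RA) level of the HA cluster architecture. It cannot pass heartbeat messages itself, so to talk to the other nodes it relies on the underlying messaging layer to deliver its information. It is usually combined with corosync in one of two ways:

  • pacemaker runs as a corosync plugin;

  • pacemaker runs as a standalone daemon.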

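Notes:

Early versions of corosync have no voting capability of their own, so the total number of nodes in the cluster should be odd and greater than 2.

corosync 1.x has no vote (quorum) handling of its own; corosync 2.0 and later introduced votequorum.

cman(DC)+corosync (if you want to use both pacemaker and cman, cman can only be used as a corosync plugin)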
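
The odd-node recommendation above can be illustrated with a simple majority calculation. This is a minimal sketch, not corosync's or votequorum's implementation: `quorum_threshold` and `has_quorum` are hypothetical helper names, and the model assumes one vote per node and a strict-majority rule.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if a partition holding votes_present votes has a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    # Show what happens when exactly one node is lost in clusters of 2..5 nodes.
    for nodes in (2, 3, 4, 5):
        survivors = nodes - 1
        print(nodes, quorum_threshold(nodes), has_quorum(survivors, nodes))
```

Under this model a two-node cluster that loses one node is left with 1 of 2 votes and no majority, which is why the two-node demo later in this article sets no-quorum-policy=ignore.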

II. Common high-availability cluster solutions

  • heartbeat+crm

  • cman+rgmanager

  • cman+pacemaker

  • corosync+pacemaker (with pacemaker as the resource manager)

III. Installing corosync and pacemaker

# yum install -y corosync pacemaker

The configuration files live under /etc/corosync/; the template is corosync.conf.example.

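# Please read the corosync.conf.5 manual page
# Annotations are kept on their own lines: the corosync.conf(5) man page only
# describes lines that start with '#' as comments.
# Compatible with versions before 0.8
compatibility: whitetank
# totem defines how the cluster nodes communicate with each other. totem is
# the protocol corosync uses between nodes, and the protocol is versioned.
totem {
     # totem protocol version
     version: 2
     # whether authentication is enabled; turning it on is recommended
     secauth: off
     # number of parallel threads used for authentication when secauth is on
     threads: 0
     interface {
         # ring number; with several NICs per host this keeps heartbeat traffic from looping back
         ringnumber: 0
         # network address of the subnet the nodes are on (192.168.1.0 for the /24 used here)
         bindnetaddr: 192.168.1.0
         # multicast address
         mcastaddr: 226.94.1.1
         # multicast port
         mcastport: 5405
         # send heartbeats one hop only, to avoid multicast loops
         ttl: 1
     }
}
logging {
     fileline: off
     # send log messages to stderr (no)
     to_stderr: no
     # write to a log file
     to_logfile: yes
     # also log to syslog; these entries end up in /var/log/messages
     to_syslog: yes
     # log file location
     logfile: /var/log/cluster/corosync.log
     # keep debug off unless troubleshooting; it is very verbose and costs a lot of disk I/O
     debug: off
     # timestamp each log entry
     timestamp: on
     logger_subsys {
         subsys: AMF
         debug: off
     }
}
amf {
     mode: disabled
}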
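
The bindnetaddr annotation describes the value as the network address of the subnet the node sits on. A minimal sketch of deriving that address from a node's IP and netmask follows; `bind_net_addr` is a hypothetical helper, and the 255.255.255.0 netmask is an assumption taken from the ifconfig output shown later in this article for a host on the same subnet.

```python
import ipaddress


def bind_net_addr(node_ip: str, netmask: str) -> str:
    """Return the network address of the subnet that node_ip/netmask belongs to."""
    interface = ipaddress.ip_interface(f"{node_ip}/{netmask}")
    return str(interface.network.network_address)


if __name__ == "__main__":
    # The two cluster nodes used in this article.
    for ip in ("192.168.1.111", "192.168.1.108"):
        print(ip, "->", bind_net_addr(ip, "255.255.255.0"))
```

Both nodes map to the same network address, so the same corosync.conf can be copied to every node unchanged.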

To run pacemaker as a corosync plugin, add the following to corosync.conf:

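# corosync starts pacemaker automatically once it is running
service {
     ver: 0
     name: pacemaker
}
# the identity AIS runs as; the default is root, and the aisexec section may be omitted
aisexec {
     user: root
     group: root
}

Step 1: generate the key file

Generate the key with the corosync-keygen command (the key is random data read from /dev/random). Once it finishes, an authkey file appears in the configuration directory. Copy both files (authkey and corosync.conf) to every cluster node.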

#scp -p authkey corosync.conf 192.168.1.111:/etc/corosync/

Step 2: start corosync

[root@essun corosync] # ssh essun.node2.com 'service corosync start'
Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@essun corosync] # service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]

Check the log to confirm that corosync started correctly (do this on every node):

#tail -40 /var/log/cluster/corosync.log
Apr 25 23:12:01 [2811] essun.node3.com       crmd:     info: update_attrd:  Connecting to attrd... 5 retries remaining
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_replace:   Digest matched on replace from essun.node2.com: cb225a22df77f4f0bfbf7bd73c7d4160
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_replace:   Replaced 0.4.1 with 0.4.1 from essun.node2.com
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_replace operation  for  section  'all' : OK (rc=0, origin=essun.node2.com /crmd/24 , version=0.4.1)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Forwarding cib_delete operation  for  section  //node_state [@ uname = 'essun.node3.com' ] /transient_attributes  to master (origin= local /crmd/9 )
Apr 25 23:12:01 [2811] essun.node3.com       crmd:     info: do_log:    FSA: Input I_NOT_DC from do_cl_join_finalize_respond() received  in  state S_PENDING
Apr 25 23:12:01 [2811] essun.node3.com       crmd:   notice: do_state_transition:   State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Apr 25 23:12:01 [2809] essun.node3.com      attrd:   notice: attrd_local_callback:  Sending full refresh (origin=crmd)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: write_cib_contents:    Wrote version 0.3.0 of the CIB to disk (digest: 02ededba58f5938f53dd45f5bd06f577)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section nodes: OK (rc=0, origin=essun.node2.com /crmd/26 , version=0.5.1)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:     info: cib_process_diff:  Diff 0.4.1 -> 0.5.1 from  local  not applied to 0.3.1: current  "epoch"  is  less  than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:   notice: update_cib_cache_cb:   [cib_diff_notify] Patch aborted: Application of an update  diff  failed, requesting a full refresh (-207)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section status: OK (rc=0, origin=essun.node2.com /crmd/29 , version=0.5.2)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section status: OK (rc=0, origin=essun.node2.com /crmd/31 , version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  'all' : OK (rc=0, origin= local /crmd/4 , version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:     info: cib_process_diff:  Diff 0.5.1 -> 0.5.2 from  local  not applied to 0.5.3: current  "num_updates"  is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:   notice: update_cib_cache_cb:   [cib_diff_notify] Patch aborted: Application of an update  diff  failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  'all' : OK (rc=0, origin= local /crmd/5 , version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:     info: cib_process_diff:  Diff 0.5.2 -> 0.5.3 from  local  not applied to 0.5.3: current  "num_updates"  is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng:   notice: update_cib_cache_cb:   [cib_diff_notify] Patch aborted: Application of an update  diff  failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  'all' : OK (rc=0, origin= local /crmd/6 , version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section cib: OK (rc=0, origin=essun.node2.com /crmd/34 , version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com      attrd:   notice: attrd_trigger_update:  Sending flush  op  to all hosts  for : probe_complete ( true )
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  //cib/status//node_state [@ id = 'essun.node3.com' ] //transient_attributes//nvpair [@name= 'probe_complete' ]: No such device or address (rc=-6, origin= local /attrd/2 , version=0.5.4)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  /cib : OK (rc=0, origin= local /attrd/3 , version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com      attrd:   notice: attrd_perform_update:  Sent update 4: probe_complete= true
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section status: OK (rc=0, origin=essun.node2.com /attrd/4 , version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Forwarding cib_modify operation  for  section status to master (origin= local /attrd/4 )
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  //cib/status//node_state [@ id = 'essun.node3.com' ] //transient_attributes//nvpair [@name= 'probe_complete' ]: No such device or address (rc=-6, origin= local /attrd/5 , version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  /cib : OK (rc=0, origin= local /attrd/6 , version=0.5.5)
Apr 25 23:12:02 [2809] essun.node3.com      attrd:   notice: attrd_perform_update:  Sent update 7: probe_complete= true
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Forwarding cib_modify operation  for  section status to master (origin= local /attrd/7 )
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: retrieveCib:   Reading cluster configuration from:  /var/lib/pacemaker/cib/cib .dnz3rc (digest:  /var/lib/pacemaker/cib/cib .dOgpug)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_apply_diff operation  for  section status: OK (rc=0, origin=essun.node2.com /attrd/4 , version=0.5.6)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: write_cib_contents:    Archived previous version as  /var/lib/pacemaker/cib/cib-2 .raw
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: write_cib_contents:    Wrote version 0.5.0 of the CIB to disk (digest: 420e9390e2cb813eebbdf3bb73416dd2)
Apr 25 23:12:02 [2806] essun.node3.com        cib:     info: retrieveCib:   Reading cluster configuration from:  /var/lib/pacemaker/cib/cib .kgClFd (digest:  /var/lib/pacemaker/cib/cib .gQtyTi)
Apr 25 23:12:14 [2806] essun.node3.com        cib:     info: crm_client_new:    Connecting 0x1d8dc80  for  uid=0 gid=0 pid=2828  id =2dfaa45a-28c4-4c7e-9613-603fb1217e12
Apr 25 23:12:14 [2806] essun.node3.com        cib:     info: cib_process_request:   Completed cib_query operation  for  section  'all' : OK (rc=0, origin= local /cibadmin/2 , version=0.5.6)
Apr 25 23:12:14 [2806] essun.node3.com        cib:     info: crm_client_destroy:    Destroying 0 events

Once everything is running normally, use the crm status command to view the current cluster node information:

[root@essun corosync] # crm status
Last updated: Fri Apr 25 23:18:11 2014
Last change: Fri Apr 25 23:12:01 2014 via crmd on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ essun.node2.com essun.node3.com ]

Two nodes are currently online: node2 and node3.
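
When this check needs to be scripted, the node list can be pulled out of the status text. The sketch below only parses a string; `online_nodes` is a hypothetical helper, the sample is copied from the output above, and the regular expression assumes the "Online: [ ... ]" line format shown there, which other Pacemaker versions may not share.

```python
import re
from typing import List

# Sample text copied from the crm status output shown in this article.
SAMPLE_STATUS = """\
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ essun.node2.com essun.node3.com ]
"""


def online_nodes(status_text: str) -> List[str]:
    """Return the node names listed on the 'Online: [ ... ]' line, if any."""
    match = re.search(r"^Online:\s*\[(.*?)\]", status_text, flags=re.MULTILINE)
    return match.group(1).split() if match else []


if __name__ == "__main__":
    print(online_nodes(SAMPLE_STATUS))
```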

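IV. The pacemaker cluster resource manager (CRM) commands, annotated

1. crm works in two modes

Batch mode: type the command directly on the command line (like the crm status used above).

Interactive mode (crm(live)#): enter crmsh and run commands interactively.

2. The crm commands

  • Top-level subcommands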

[root@essun corosync] # crm
crm(live) # help
This is crm shell, a Pacemaker command line interface.
Available commands:
     cib              manage shadow CIBs  # CIB sandbox
     resource         resources management  # control the state of resources (start, stop, migrate, ...)
     configure        CRM cluster configuration  # edit the cluster configuration
     node             nodes management  # cluster node management
     options          user preferences
     history          CRM cluster history
     site             Geo-cluster support
     ra               resource agents information center  # everything related to resource agents is under here
     status           show cluster status  # show the current cluster status
     help,?           show help (help topics for list of topics)  # list the commands available at this level
     end,cd,up        go back one level  # go back up one level
     quit,bye,exit    exit the program  # leave the crm(live) interactive shell

  • resource subcommand

    • All resource state is controlled here

crm(live)resource # help
Available commands:
         status           show status of resources
         start            start a resource
         stop             stop a resource
         restart          restart a resource
         promote          promote a master-slave resource
         demote           demote a master-slave resource
         manage           put a resource into managed mode
         unmanage         put a resource into unmanaged mode
         migrate          migrate a resource to another node
         unmigrate        unmigrate a resource to another node
         param            manage a parameter of a resource
         secret           manage sensitive parameters
         meta             manage a meta attribute
         utilization      manage a utilization attribute
         failcount        manage failcounts  # the failure counters
         cleanup          cleanup resource status
         refresh          refresh CIB from the LRM status  # LRM: local resource manager; CIB: cluster information base
         reprobe          probe for resources not started by the CRM
         trace            start RA tracing  # RA: resource agent
         untrace          stop RA tracing
         help             show help (help topics for list of topics)
         end              go back one level  # back to crm(live)#
         quit             exit the program  # leave the interactive shell

  • configure subcommand

    • All resource definitions are made under this subcommand

crm(live)configure # help
Available commands:
         node             define a cluster node
         primitive        define a resource
         monitor          add monitor operation to a primitive  # monitoring options for a resource (e.g. timeout, action on start failure)
         group            define a group  # bundles several resources together
         clone            define a clone  # you can set the total number of clones and how many may run per node
         ms               define a master-slave resource  # by default only one node runs the master; the slaves stand by
         rsc_template     define a resource template
         location         a location preference  # which node a resource prefers; the node with the higher score wins
         colocation       colocate resources  # how strongly resources should run together
         order            order resources  # the order in which resources start
         rsc_ticket       resources ticket dependency
         property         set a cluster property
         rsc_defaults     set resource defaults  # e.g. stickiness
         fencing_topology node fencing order
         role             define role access rights
         user             define user access rights
         op_defaults      set resource operations defaults
         schema           set or display current CIB RNG schema
         show             display CIB objects
         edit             edit CIB objects  # opens in a vim-style editor
         filter           filter CIB objects
         delete           delete CIB objects
         default-timeouts set timeouts for operations to minimums from the meta-data
         rename           rename a CIB object
         modgroup         modify group
         refresh          refresh from CIB  # re-read the CIB
         erase            erase the CIB
         ptest            show cluster actions if changes were committed
         rsctest          test resources as currently configured
         cib              CIB shadow management
         cibstatus        CIB status management and editing
         template         edit and import a configuration from a template
         commit           commit the changes to the CIB  # write the changes into the CIB
         verify           verify the CIB with crm_verify  # CIB syntax check
         upgrade          upgrade the CIB to version 1.0
         save             save the CIB to a file  # written to the directory you were in before entering crm
         load             import the CIB from a file
         graph            generate a directed graph
         xml              raw xml
         help             show help (help topics for list of topics)
         end              go back one level  # back to crm(live)#
         quit             exit the program  # leave the crm interactive shell

  • node subcommand

    • Node management and status commands

crm(live)resource # cd ..
crm(live) # node
crm(live)node # help
Node management and status commands.
Available commands:
     status           show nodes status as XML
     show             show node  # node status in plain command-line format
     standby          put node into standby  # simulates the node going offline (follow it with the node's FQDN)
     online           set node online  # bring the node back online
     maintenance      put node into maintenance mode
     ready            put node into ready mode
     fence            fence node
     clearstate       Clear node state
     delete           delete node
     attribute        manage attributes
     utilization      manage utilization attributes
     status-attr      manage status attributes
     help             show help (help topics for list of topics)
     end              go back one level
     quit             exit the program

  • ra subcommand

    • Resource agent classes are all found here

crm(live)node # cd ..
crm(live) # ra
crm(live)ra # help
Available commands:
         classes          list classes and providers  # the resource agent classes
         list             list RA for a class (and provider)  # the agents available in a class
         meta             show meta data for a RA  # the parameters an agent accepts (e.g. meta ocf:heartbeat:IPaddr2)
         providers        show providers for a RA and a class
         help             show help (help topics for list of topics)
         end              go back one level
         quit             exit the program

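Note:

The words used by these commands are all simple, but I have still annotated the ones I use most. Having just learned them I remember them clearly, yet some day one of them may well become a blind spot, and that would be painful. (Never overestimate your memory; it will fool you when you are not paying attention!)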

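V. Worked example

Notes:

  • Prerequisites for configuring high availability

    • Time synchronization

    • Passwordless (SSH key) login

    • Hostname resolution

  • This section only demonstrates how the commands are used; it is not a production configuration

1. Environment

System:

   centos 6.5 x86_64

Nodes:

   essun.node2.com   192.168.1.111

   essun.node3.com   192.168.1.108

Software and resources needed on the nodes:

   One virtual IP: 192.168.1.100

   Install httpd on both nodes and add a default test page; once tested, disable the service from starting at boot.

   An NFS mount; the NFS server is 192.168.1.110

2. Defining resources

  • Disable stonith-enabled (if you are not sure which parameters exist, press Tab twice to complete the command; cd .. returns to the parent level)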

crm(live)configure # property stonith-enabled=false  # (assume a failed node has already been safely powered off; STONITH is not used to fence it)
crm(live)configure # verify  # (no output here means the configuration is valid)
crm(live)configure # commit  # (now the change can be committed)
crm(live)configure # show  # (displays the committed, active configuration)
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false"

  • Ignore the quorum (voting) rule

crm(live)configure # property no-quorum-policy=ignore
crm(live)configure # verify
crm(live)configure # commit

  • Define a virtual IP

crm(live)configure # primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.100
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false"

Annotation:

The statement above has four parts.

Part 1: primitive: the command that defines a resource
Part 2: webip: the name given to the resource
Part 3: ocf:heartbeat:IPaddr: the resource agent's class, its provider, and the agent itself
(under crm ra you can run list followed by one of the four RA classes to see which agents are available)
Part 4: params: introduces the agent's parameters

ip: the parameter name
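
The four parts can also be seen by splitting the command mechanically. This is only an illustrative sketch, not crmsh's parser: `Primitive` and `parse_primitive` are hypothetical names, and the function handles just the simple "primitive <id> <agent> params k=v ..." shape used in this article, skipping anything after op or meta.

```python
import shlex
from typing import Dict, NamedTuple


class Primitive(NamedTuple):
    resource_id: str         # part 2: the resource name
    agent: str               # part 3: class:provider:agent (or class:agent)
    params: Dict[str, str]   # part 4: the key=value pairs after "params"


def parse_primitive(command: str) -> Primitive:
    """Split a simple crm 'primitive' statement into its parts."""
    tokens = shlex.split(command)
    if len(tokens) < 3 or tokens[0] != "primitive":
        raise ValueError("expected: primitive <id> <agent> [params k=v ...]")
    params: Dict[str, str] = {}
    in_params = False
    for token in tokens[3:]:
        if token == "params":
            in_params = True
        elif token in ("op", "meta"):
            in_params = False  # operations and meta attributes are not collected here
        elif in_params and "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
    return Primitive(tokens[1], tokens[2], params)


if __name__ == "__main__":
    print(parse_primitive(
        "primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.100"))
```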

  • Define a filesystem mount

First go into ra to find the resource agent used for filesystems:

crm(live)configure ra # classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)configure ra # list ocf
CTDB             ClusterMon       Delay            Dummy            Filesystem       HealthCPU
HealthSMART      IPaddr           IPaddr2          IPsrcaddr        LVM              MailTo
Route            SendArp          Squid            Stateful         SysInfo          SystemHealth
VirtualDomain    Xinetd           apache           conntrackd       controld         dhcpd
ethmonitor       exportfs         mysql            named            nfsserver        pgsql
ping              pingd            postfix          remote           rsyncd            symlink
crm(live)configure ra # providers Filesystem
heartbeat

So the Filesystem resource agent is provided by ocf:heartbeat.

Look at the parameters this resource agent accepts:

crm(live)configure ra # meta ocf:heartbeat:Filesystem
Manages filesystem mounts (ocf:heartbeat:Filesystem)
Resource script for Filesystem. It manages a Filesystem on a
shared storage medium.
The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:
10: read first 16 blocks of the device (raw read)
This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.
20: test if a status file can be written and read
The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.
Parameters (* denotes required, [] the default):
device* (string): block device
     The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
     The mount point for the filesystem.
fstype* (string): filesystem type
     The type of filesystem to be mounted.
........... (output truncated) .......

Parameters marked with * are required. Now the resource can be defined:

crm(live)configure # primitive webnfs ocf:heartbeat:Filesystem params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" op monitor interval=60s timeout=60s op start timeout=60s op stop timeout=60s
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false"

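Annotation:

primitive # the command that defines a resource

webnfs # resource ID

ocf:heartbeat:Filesystem # resource agent (RA)

params device="192.168.1.110:/share" # the shared (exported) directory

directory="/var/www/html" # mount point

fstype="nfs"   # filesystem type

op monitor # monitor the webnfs resource

interval=60s # monitoring interval

timeout=60s # timeout

op start timeout=60s # start timeout

op stop timeout=60s # stop timeout

Define the web service resource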

crm(live)configure # primitive webserver lsb:httpd
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false"

Combine the resources into a group (so that they run together)

crm(live)configure # group webservice webip webnfs webserver
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
group webservice webip webnfs webserver
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false"

View the active resources another way:

crm(live)configure# cd ..
crm(live)# status
Last updated: Sat Apr 26 01:51:45 2014
Last change: Sat Apr 26 01:49:54 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
  Resource Group: webservice
      webip  (ocf::heartbeat:IPaddr):    Started essun.node2.com
      webnfs (ocf::heartbeat:Filesystem):    Started essun.node2.com
      webserver  (lsb:httpd):    Started essun.node2.com

The output above shows that all the resources are on node2, i.e. on 192.168.1.111. Access it with curl to see the result:

[root@bogon share] # ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
           inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
           inet6 addr: fe80::20c:29ff:fe63:4a25/64  Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
           TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share] # curl http://192.168.1.111
From the NFS filesystem

Now simulate a failure of node2 and see whether the resources move:

crm(live)node # standby essun.node2.com
crm(live) # status
Last updated: Sat Apr 26 02:05:24 2014
Last change: Sat Apr 26 02:04:17 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Node essun.node2.com: standby
Online: [ essun.node3.com ]
  Resource Group: webservice
      webip  (ocf::heartbeat:IPaddr):    Started essun.node3.com
      webnfs (ocf::heartbeat:Filesystem):    Started essun.node3.com
      webserver  (lsb:httpd):    Started essun.node3.com

Run curl again:

[root@bogon share] # curl http://192.168.1.111
curl: (7) couldn't connect to host
[root@bogon share] # curl http://192.168.1.100
From the NFS filesystem
[root@bogon share] # curl http://192.168.1.108
From the NFS filesystem

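Annotation:

The first curl shows that httpd is no longer running on node2.

The second curl shows that the page on the mount is still reachable through the VIP, so the service did not stop when node2 went offline.

The third curl shows that the service is also reachable through node3's IP, so we can conclude that it is running on node3.

At this point, if node2 comes back online the service will not switch back to it. To make it switch back once node2 is online again, use a location constraint to give node2 a higher weight.

Below, a second way of tying the resources together is used. First remove the group definition: you can run edit under crm configure to edit the CIB and delete the group entry.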

crm(live)node # online essun.node2.com
crm(live) # status
Last updated: Sat Apr 26 02:20:13 2014
Last change: Sat Apr 26 02:19:29 2014 via crm_attribute on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
  Resource Group: webservice
      webip  (ocf::heartbeat:IPaddr):    Started essun.node3.com
      webnfs (ocf::heartbeat:Filesystem):    Started essun.node3.com
      webserver  (lsb:httpd):    Started essun.node3.com

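Sure enough, the service did not come back. Here is how to bring it back.

Step 1: remove the group. The easiest way is the edit command, but the following commands work as well: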

crm(live)resource # stop webservice  # the group name
crm(live)configure # delete webservice  # delete the group
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com \
     attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1398450597"

The group definition is gone now, so the plan can go ahead.

Define a colocation constraint (how strongly the resources should run together)

crm(live)configure # colocation webserver-with-webnfs-webip inf: webip webnfs webserver
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com \
     attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1398450597"

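Annotation:

colocation: the colocation constraint command

webserver-with-webnfs-webip: # constraint name (ID)

inf: # (the score; inf means the resources must always run together, and a numeric value is also allowed)

webip webnfs webserver: # resource names

Define the resource start order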

crm(live)configure # order ip_before_webnfs_before_webserver mandatory: webip webnfs webserver
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com \
     attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1398450597"

Annotation:

order: the ordering constraint command

ip_before_webnfs_before_webserver # constraint ID

mandatory: # the kind (there are three kinds: Mandatory, Optional, Serialize)

webip webnfs webserver # resource names; the order in which they are written matters
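
Why the written order matters can be sketched by expanding the list into the relations it implies. This is an illustration rather than Pacemaker's scheduler: the three function names are hypothetical, and the stop sequence assumes the constraint is symmetrical (Pacemaker's default for ordering constraints), so resources stop in the reverse of the order in which they start.

```python
from typing import List, Tuple


def ordering_pairs(resources: List[str]) -> List[Tuple[str, str]]:
    """Expand 'A B C' into the pairwise 'first, then' relations it implies."""
    return list(zip(resources, resources[1:]))


def start_sequence(resources: List[str]) -> List[str]:
    """Resources start in the order they are written."""
    return list(resources)


def stop_sequence(resources: List[str]) -> List[str]:
    """With a symmetrical constraint, resources stop in reverse order."""
    return list(reversed(resources))


if __name__ == "__main__":
    chain = ["webip", "webnfs", "webserver"]
    print(ordering_pairs(chain))
    print(start_sequence(chain))
    print(stop_sequence(chain))
```

For this article's constraint, the IP comes up first, then the NFS mount, then httpd; on shutdown httpd stops before the mount it serves files from.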

Define a location constraint

crm(live)configure # location webip_and_webnfs_and_webserver webip 500: essun.node2.com
crm(live)configure # verify
crm(live)configure # commit
crm(live)configure # show
node essun.node2.com \
     attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1398450597"

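Notes:

location: the command that defines a location constraint
webip_and_webnfs_and_webserver: constraint name
webip 500: essun.node2.com: which resource gets what score on which node (here webip gets a score of 500 on essun.node2.com)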

Defining default resource attributes

crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
     attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
     params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
     params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
     op monitor interval="60s" timeout="60s" \
     op start timeout="60s" interval="0" \
     op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.3-368c726" \
     cluster-infrastructure="classic openais (with plugin)" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1398450597"
rsc_defaults $id="rsc-options" \
     resource-stickiness="100"

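Notes:

This sets the default stickiness for every resource in the cluster. Stickiness is a score that favours keeping a resource on the node where it is currently running, so it only comes into play when the cluster decides whether to move a resource to another node. For example, three resources are defined here (webip, webnfs, and webserver), each with a stickiness of 100, which adds up to 300. The location constraint defined earlier gives node2 a score of 500, which is higher than 300, so after node2 goes down and comes back online the resources switch back to node2.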
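
The arithmetic in this note can be written down as a short sketch. This is a simplified model of the reasoning above, not Pacemaker's real placement algorithm: it only compares the summed stickiness on the current node with the location score of the preferred node. The function name `pick_node` and the second set of values are hypothetical.

```shell
#!/bin/sh
# Simplified model of the note above, NOT Pacemaker's actual algorithm:
# the resources move only if the preferred node's location score exceeds
# the total stickiness holding them on the current node.
pick_node() {
    # $1 = stickiness per resource   $2 = number of resources
    # $3 = location score of the preferred node
    # $4 = current node              $5 = preferred node
    stay=$(( $1 * $2 ))
    if [ "$3" -gt "$stay" ]; then
        echo "$5"
    else
        echo "$4"
    fi
}

# Values from this article: stickiness 100, 3 resources, location score 500.
pick_node 100 3 500 essun.node3.com essun.node2.com

# Hypothetical values: stickiness 200 gives 600, which is not below 500,
# so in this model the resources would stay on node3.
pick_node 200 3 500 essun.node3.com essun.node2.com
```

On a live cluster, `crm_simulate -sL` prints the allocation scores that Pacemaker actually computes, which is the reliable way to check a prediction like this one.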

Finally, check the status: all resources are running on node2. Then simulate a failure of node2 by putting it into standby.

crm(live)# status
Last updated: Sat Apr 26 03:14:30 2014
Last change: Sat Apr 26 03:14:19 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
  webip  (ocf::heartbeat:IPaddr):    Started essun.node2.com
  webnfs (ocf::heartbeat:Filesystem):    Started essun.node2.com
  webserver  (lsb:httpd):    Started essun.node2.com
crm(live)# node
crm(live)node# standby essun.node2.com

The resources are now running on node3.

crm(live)# status
Last updated: Sat Apr 26 03:18:17 2014
Last change: Sat Apr 26 03:15:20 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Node essun.node2.com: standby
Online: [ essun.node3.com ]
  webip  (ocf::heartbeat:IPaddr):    Started essun.node3.com
  webnfs (ocf::heartbeat:Filesystem):    Started essun.node3.com
  webserver  (lsb:httpd):    Started essun.node3.com
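
When checking a failover repeatedly, it can help to reduce the status output to "resource node" pairs. The following is a minimal sketch that assumes the output format shown above (resource lines ending in `Started <node>`); the function name `resource_nodes` is made up for this example, and the sample text is copied from the status output.

```shell
#!/bin/sh
# Sketch: list "resource node" pairs from `crm status`-style text on stdin.
resource_nodes() {
    # Resource lines end with "Started <node>": print the first field
    # (resource name) and the last field (node name). The NF check skips
    # empty and one-word lines.
    awk 'NF >= 2 && $(NF-1) == "Started" { print $1, $NF }'
}

# Sample copied from the status output above.
sample='Node essun.node2.com: standby
Online: [ essun.node3.com ]
  webip  (ocf::heartbeat:IPaddr):    Started essun.node3.com
  webnfs (ocf::heartbeat:Filesystem):    Started essun.node3.com
  webserver  (lsb:httpd):    Started essun.node3.com'

printf '%s\n' "$sample" | resource_nodes
```

On a cluster node the same function could be fed live data with `crm status | resource_nodes`.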

Run curl twice (the page text 来自于NFS文件系统 means "served from the NFS file system")

[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64  Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
来自于NFS文件系统
[root@bogon share]#

Bring node2 back online and see whether the resources return to it

crm(live)node# online essun.node2.com
crm(live)node# cd ..
crm(live)# status
Last updated: Sat Apr 26 03:21:46 2014
Last change: Sat Apr 26 03:21:36 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ essun.node2.com essun.node3.com ]
  webip  (ocf::heartbeat:IPaddr):    Started essun.node2.com
  webnfs (ocf::heartbeat:Filesystem):    Started essun.node2.com
  webserver  (lsb:httpd):    Started essun.node2.com

Run curl three times

[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64  Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
curl: (7) couldn't connect to host
[root@bogon share]# curl http://192.168.1.111
来自于NFS文件系统
[root@bogon share]#

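Notes:

192.168.1.100 is the virtual cluster IP

192.168.1.108 is essun.node3.com

192.168.1.111 is essun.node2.com

The results confirm that node2 took the resources back: the virtual IP and node2 answer, while node3 no longer runs the web server.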
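
The manual curl checks above can be scripted. The following is a rough sketch, assuming curl is installed on the test host; `check_hosts` and the `FETCH` override are hypothetical names introduced here so that the loop can be dry-run without a network.

```shell
#!/bin/sh
# Sketch: probe each address and report whether the web page is reachable.
# FETCH is a hypothetical override hook; by default it calls curl with
# -s (quiet), -f (fail on HTTP errors) and a 3-second overall time limit.
FETCH=${FETCH:-"curl -sf --max-time 3"}

check_hosts() {
    for ip in "$@"; do
        if $FETCH "http://$ip" >/dev/null 2>&1; then
            echo "$ip up"
        else
            echo "$ip down"
        fi
    done
}

# Addresses from this article:
#   192.168.1.100 = virtual IP
#   192.168.1.108 = essun.node3.com
#   192.168.1.111 = essun.node2.com
# Uncomment to run against the real hosts:
# check_hosts 192.168.1.100 192.168.1.108 192.168.1.111
```

After node2 comes back online, the expectation from the transcript above is that the virtual IP and 192.168.1.111 report up while 192.168.1.108 reports down.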

======================= This concludes the introduction to common crmsh commands for corosync+pacemaker ===========

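PS:

   My English is not very good, so some of the annotations may be imprecise; please bear with me.

This article is reposted from jinlinger's 51CTO blog. Original link: http://blog.51cto.com/essun/1403265. To repost it, please contact the original author.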