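I. Heartbeat Concepts
Heartbeat is a component of the Linux-HA project and one of the most successful open-source HA projects to date. Linux-HA stands for High-Availability Linux. The project's goal is to provide, through the joint effort of community developers, a clustering solution that improves the reliability, availability, and serviceability (RAS) of Linux. Heartbeat provides the basic functions that any HA software needs, such as heartbeat detection and resource takeover, monitoring of system services in the cluster, and moving ownership of a shared IP address between cluster nodes.
Official Heartbeat site:
II. Preparation
1. Heartbeat network architecture
2. Operating system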
    CentOS 6.4 x86_64, minimal install
    Building heartbeat from source never succeeded, so it is installed with yum instead.
    heartbeat v3
3. Address plan
    Host   IP address     Netmask        Gateway      FQDN            NIC   Role
    node1  192.168.0.101  255.255.255.0  192.168.0.1  node1.test.com  eth1  Active
    node2  192.168.0.102  255.255.255.0  192.168.0.1  node2.test.com  eth1  Passive
    node3  192.168.0.103  255.255.255.0  192.168.0.1  node3.test.com  eth1  nfs
    vip    192.168.0.200  255.255.255.0
4. Hostname resolution
    [root@node1 ~]# uname -n
    node1.test.com
    [root@node1 ~]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.101 node1.test.com node1
    192.168.0.102 node2.test.com node2
    [root@node2 ~]# uname -n
    node2.test.com
    [root@node2 ~]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.101 node1.test.com node1
    192.168.0.102 node2.test.com node2
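Heartbeat identifies each node by the name that `uname -n` returns, so both node names should resolve to the planned addresses on both machines. The following is a minimal sketch of such a check. The helper `hosts_has_entry` is hypothetical (it is not part of heartbeat) and simply looks for an IP/name pair in a hosts-format file.

```shell
#!/bin/sh
# hosts_has_entry FILE IP NAME
# Hypothetical helper: succeeds if FILE (hosts format) maps IP to NAME.
hosts_has_entry() {
    local file="$1" ip="$2" name="$3"
    awk -v ip="$ip" -v name="$name" '
        $1 ~ /^#/ { next }                  # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1   # name is listed for this address
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example: verify both cluster nodes against the local hosts file.
# hosts_has_entry /etc/hosts 192.168.0.101 node1.test.com || echo "node1 missing"
# hosts_has_entry /etc/hosts 192.168.0.102 node2.test.com || echo "node2 missing"
```

Running the same check on node1 and node2 confirms that the two hosts files agree before heartbeat is configured.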
5. Mutual SSH trust between the two nodes
    [root@node1 ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
    Generating public/private rsa key pair.
    Created directory '/root/.ssh'.
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    ce:f3:d7:63:10:9b:d2:86:f8:8a:5a:ee:41:d8:d2:01 root@node1.test.com
    The key's randomart image is:
    +--[ RSA 2048]----+
    | E |
    | . |
    | . |
    | + . . |
    | o + S. o + |
    | o o. o * |
    | o +. o o |
    | o o o. . + |
    | .o+ .... . . |
    +-----------------+
    [root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.test.com
    The authenticity of host 'node2.test.com (192.168.0.102)' can't be established.
    RSA key fingerprint is 46:b9:7c:11:db:75:93:ad:f1:26:f0:a7:4d:00:40:20.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'node2.test.com,192.168.0.102' (RSA) to the list of known hosts.
    root@node2.test.com's password:
    Now try logging into the machine, with "ssh 'root@node2.test.com'", and check in:
      .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
    [root@node2 ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
    Generating public/private rsa key pair.
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    c4:e3:71:f8:82:09:f0:42:9c:e7:20:db:db:ce:dc:0b root@node2.test.com
    The key's randomart image is:
    +--[ RSA 2048]----+
    | .o. |
    |..+o. . . |
    | +.+o * . |
    |. .... = = |
    | o o S . |
    | . . . |
    | +E. |
    | +.. |
    | .. |
    +-----------------+
    [root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1.test.com
    The authenticity of host 'node1.test.com (192.168.0.101)' can't be established.
    RSA key fingerprint is 46:b9:7c:11:db:75:93:ad:f1:26:f0:a7:4d:00:40:20.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'node1.test.com,192.168.0.101' (RSA) to the list of known hosts.
    root@node1.test.com's password:
    Now try logging into the machine, with "ssh 'root@node1.test.com'", and check in:
      .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
6. Time synchronization
    # yum -y install ntpdate
    # ntpdate asia.pool.ntp.org
7. Disable the firewall
    # getenforce
    Disabled
    # /etc/init.d/iptables status
    iptables: Firewall is not running.
III. Installing the heartbeat Packages
1. Install the EPEL repository on node1 and node2
    # wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    # wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
    # rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm
2. Edit the EPEL repository configuration file
    # sed -i 's/#baseurl/baseurl/g' /etc/yum.repos.d/epel.repo
    # sed -i 's/mirrorlist/#mirrorlist/' /etc/yum.repos.d/epel.repo
3. Install the heartbeat packages
    # yum install heartbeat heartbeat-libs
4. View the packages heartbeat depends on
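IV. Configuring the Heartbeat Service
1. The heartbeat configuration files
    heartbeat has three configuration files:
    authkeys      # key file used for authentication between nodes; permissions must be 600
    ha.cf         # core configuration file of the heartbeat service
    haresources   # cluster resource manager configuration (haresources or crm)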
2. Copy the initial heartbeat configuration files
    [root@node1 ~]# cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,authkeys,haresources} /etc/ha.d/
3. Edit the authkeys file
    [root@node1 ~]# dd if=/dev/random bs=512 count=1 | openssl md5    # generate a random key
    0+1 records in
    0+1 records out
    72 bytes (72 B) copied, 4.8467e-05 s, 1.5 MB/s
    (stdin)= acf7401e6b20d4cec482ba1160eb8efe
    [root@node1 ~]# vim /etc/ha.d/authkeys
    # Note: append the following two lines at the end of the file
    auth 1
    1 md5 acf7401e6b20d4cec482ba1160eb8efe
    [root@node1 ~]# chmod 600 /etc/ha.d/authkeys
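The two lines appended to authkeys follow the pattern `auth <index>` and `<index> <method> <key>`. As a sketch only, the hypothetical helper `write_authkeys` below writes that content to a path of your choosing and creates it with mode 600 from the start. The function name, the use of `/dev/urandom`, and the use of `md5sum` to derive the key are assumptions made for illustration; they are not part of the original steps.

```shell
#!/bin/sh
# write_authkeys FILE [KEY]
# Hypothetical helper: writes an authkeys file that uses the md5 method
# (as in the steps above) and restricts it to mode 600.
write_authkeys() {
    local file="$1" key="$2"
    if [ -z "$key" ]; then
        # Assumption: derive a random 32-hex-digit key from /dev/urandom.
        key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    fi
    (
        umask 077    # a newly created file gets no group/other access
        printf 'auth 1\n1 md5 %s\n' "$key" > "$file"
    )
    chmod 600 "$file"    # also corrects the mode if the file already existed
}

# Example (writes to a scratch path, not to /etc/ha.d):
# write_authkeys /tmp/authkeys.example
```

Both nodes must end up with the same key, which is why the file is generated once and copied to node2 later rather than generated separately on each node.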
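4. Edit the main configuration file ha.cf
Note: only two settings mainly need to be changed (the heartbeat transport and the node list); everything else can stay at its default.
    [root@node1 ha.d]# grep -v '^#' ha.cf | sed '/^$/d'
    logfacility local0
    mcast eth1 225.100.100.100 694 1 0
    # ^ how heartbeat messages are sent: multicast
    auto_failback on
    node node1.test.com
    # ^ a node that belongs to the cluster
    node node2.test.com
    # ^ a node that belongs to the cluster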
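Before starting the service it can help to confirm that ha.cf names both nodes and defines a heartbeat channel. The sketch below is a hypothetical sanity check (`check_ha_cf` is not a heartbeat tool). It only counts the names on `node` lines and looks for an `mcast`, `bcast`, or `ucast` directive, so it does not validate the rest of the file.

```shell
#!/bin/sh
# check_ha_cf FILE
# Hypothetical sanity check for a two-node ha.cf: reports a problem and
# returns non-zero if fewer than two nodes or no network channel is defined.
check_ha_cf() {
    local file="$1" nodes channels rc=0
    # Count node names on all "node" lines (one line may list several names).
    nodes=$(awk '$1 == "node" { n += NF - 1 } END { print n + 0 }' "$file")
    # Count lines that define a network heartbeat channel.
    channels=$(awk '$1 == "mcast" || $1 == "bcast" || $1 == "ucast" { c++ } END { print c + 0 }' "$file")
    if [ "$nodes" -lt 2 ]; then
        echo "ha.cf: expected at least 2 nodes, found $nodes"
        rc=1
    fi
    if [ "$channels" -lt 1 ]; then
        echo "ha.cf: no mcast/bcast/ucast directive found"
        rc=1
    fi
    return $rc
}

# Example:
# check_ha_cf /etc/ha.d/ha.cf && echo "ha.cf names two nodes and a channel"
```

The names on the `node` lines must match what `uname -n` prints on each machine, which is the reason for the hostname setup earlier.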
5. Edit the haresources configuration file
    [root@node1 ha.d]# grep -v '^#' /etc/ha.d/haresources
    node1.test.com IPaddr::192.168.0.200/24/eth1 httpd
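In a haresources line the first field is the preferred node and the remaining fields are the resources. Heartbeat starts them from left to right and stops them from right to left, which matches the standby log later in this article, where httpd is stopped before IPaddr. The small sketch below prints both orders for a given line; the helper `resource_order` is hypothetical and only illustrates the ordering.

```shell
#!/bin/sh
# resource_order "HARESOURCES LINE"
# Hypothetical helper: prints the preferred node, the start order
# (left to right) and the stop order (right to left) of the resources.
resource_order() {
    local node start stop r
    # Word-split the line into positional parameters
    # (assumes the line contains no shell glob characters).
    set -- $1
    node="$1"
    shift
    start=""
    stop=""
    for r in "$@"; do
        start="${start:+$start }$r"    # append: start order
        stop="$r${stop:+ $stop}"       # prepend: stop order
    done
    printf 'node: %s\nstart: %s\nstop: %s\n' "$node" "$start" "$stop"
}

# Example:
# resource_order "node1.test.com IPaddr::192.168.0.200/24/eth1 httpd"
```

For the line used here, the VIP comes up before httpd starts and goes away only after httpd has stopped.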
6. Copy the configuration files to node2
    [root@node1 ~]# scp /etc/ha.d/{ha.cf,haresources,authkeys} root@node2.test.com:/etc/ha.d/
V. Providing the httpd Service on the Nodes
1. Install the httpd package
    [root@node1 ~]# yum -y install httpd
2. Provide a test page
    [root@node1 ~]# echo "<h1>node1.test.com</h1>" > /var/www/html/index.html
3. Start the httpd service
    [root@node1 ~]# service httpd start
4. Access the web page in a browser
Note: once the test is done, stop the service and disable it at boot; from here on httpd is managed by heartbeat (haresources).
5. Stop httpd and disable it at boot
    [root@node1 ~]# service httpd stop
    Stopping httpd:                                            [  OK  ]
    [root@node1 ~]# chkconfig httpd off
    [root@node1 ~]# chkconfig --list httpd
    httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
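Because heartbeat is the only thing that should start httpd, it is worth confirming on both nodes that no runlevel still enables it. The sketch below assumes English-locale output, for example from `LANG=C chkconfig --list httpd`. The helper `all_runlevels_off` is hypothetical and only parses that one line of text.

```shell
#!/bin/sh
# all_runlevels_off
# Hypothetical helper: reads one "chkconfig --list SERVICE" line on stdin
# (English locale) and succeeds only if every listed runlevel is "off".
all_runlevels_off() {
    awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-6]:on$/)  n_on++
                if ($i ~ /^[0-6]:off$/) n_off++
            }
        }
        END { exit ((n_off > 0 && n_on == 0) ? 0 : 1) }
    '
}

# Example:
# LANG=C chkconfig --list httpd | all_runlevels_off || echo "httpd still starts at boot"
```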
6. Repeat the same steps on node2
    [root@node2 ~]# yum -y install httpd
    [root@node2 ~]# echo "<h1>node2.test.com</h1>" > /var/www/html/index.html
    [root@node2 ~]# service httpd start
    [root@node2 ~]# service httpd stop
    [root@node2 ~]# chkconfig httpd off
7. Access the httpd test page on node2
VI. Starting the heartbeat Service
1. Start the heartbeat service
    [root@node1 ~]# /etc/init.d/heartbeat start
    Starting High-Availability services: INFO: Resource is stopped
    Done.
    [root@node1 ~]# ssh node2 "/etc/init.d/heartbeat start"
    Starting High-Availability services: 2014/12/25_21:09:12 INFO: Resource is stopped
    Done.
2. View the heartbeat log
    [root@node1 ~]# tail -f /var/log/messages
3. Check the VIP
    [root@node1 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:c7:14:97 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.101/24 brd 192.168.0.255 scope global eth1
        inet 192.168.0.200/24 scope global eth1    # the VIP is now bound to eth1
        inet6 fe80::20c:29ff:fec7:1497/64 scope link
           valid_lft forever preferred_lft forever
4. Check that httpd has been taken over by heartbeat
    [root@node1 ~]# netstat -tnlpu | grep httpd
    tcp 0 0 :::80 :::* LISTEN 2140/httpd
5. Browser access test
6. Switch node1 to standby with the hb_standby script
    [root@node1 ~]# sh /usr/share/heartbeat/hb_standby
    Going standby [all].
7. View the log on node1
    [root@node1 ~]# tail -f /var/log/messages
    Dec 25 21:36:11 node1 heartbeat: [1255]: info: node1.test.com wants to go standby [all]
    Dec 25 21:36:11 node1 heartbeat: [1255]: info: standby: node2.test.com can take our all resources
    Dec 25 21:36:11 node1 heartbeat: [1701]: info: give up all HA resources (standby).
    Dec 25 21:36:11 node1 ResourceManager(default)[1714]: info: Releasing resource group: node1.test.com IPaddr::192.168.0.200/24/eth1 httpd
    Dec 25 21:36:11 node1 ResourceManager(default)[1714]: info: Running /etc/init.d/httpd stop
    Dec 25 21:36:11 node1 ResourceManager(default)[1714]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.200/24/eth1 stop
    Dec 25 21:36:11 node1 IPaddr(IPaddr_192.168.0.200)[1789]: INFO: IP status = ok, IP_CIP=
    Dec 25 21:36:11 node1 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.200)[1763]: INFO: Success
    Dec 25 21:36:11 node1 heartbeat: [1701]: info: all HA resource release completed (standby).
    Dec 25 21:36:11 node1 heartbeat: [1255]: info: Local standby process completed [all].
    Dec 25 21:36:12 node1 heartbeat: [1255]: WARN: 1 lost packet(s) for [node2.test.com] [425:427]
    Dec 25 21:36:12 node1 heartbeat: [1255]: info: remote resource transition completed.
    Dec 25 21:36:12 node1 heartbeat: [1255]: info: No pkts missing from node2.test.com!
    Dec 25 21:36:12 node1 heartbeat: [1255]: info: Other node completed standby takeover of all resources.
8. Notes
    After node1 switches from Active to Passive, httpd stops on node1 and the VIP moves from node1 to node2.
9. Check node2
    [root@node2 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:ad:9f:36 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.102/24 brd 192.168.0.255 scope global eth1
        inet 192.168.0.200/24 brd 192.168.0.255 scope global secondary eth1
        inet6 fe80::20c:29ff:fead:9f36/64 scope link
           valid_lft forever preferred_lft forever
    [root@node2 ~]# netstat -tnlp | grep httpd
    tcp 0 0 :::80 :::* LISTEN 2709/httpd
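Reading `ip addr` by eye works for two nodes, but the same check is easy to script. The following sketch, assuming a hypothetical helper named `holds_vip`, reads `ip addr` output on standard input and reports whether a given address is bound. It does not talk to heartbeat at all.

```shell
#!/bin/sh
# holds_vip VIP
# Hypothetical helper: reads "ip addr" output on stdin and succeeds if
# VIP appears as an IPv4 address (the prefix length is ignored).
holds_vip() {
    local vip="$1"
    awk -v vip="$vip" '
        $1 == "inet" {
            split($2, a, "/")          # "192.168.0.200/24" -> "192.168.0.200"
            if (a[1] == vip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
# ip addr show eth1 | holds_vip 192.168.0.200 && echo "this node holds the VIP"
```

Run on both nodes, exactly one of them should report that it holds the VIP at any given time.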
10. Access the site again after the VIP has moved
At this point the simplest, most basic heartbeat high-availability setup is complete.
VII. Shared Storage for Heartbeat
1. Configure the NFS service on node3
    [root@node3 ~]# yum -y install nfs-utils rpcbind
    [root@node3 ~]# mkdir /web/htdocs -p
    [root@node3 ~]# cat /etc/exports
    /web/htdocs 192.168.0.0/24(ro)
    [root@node3 ~]# /etc/init.d/rpcbind start
    [root@node3 ~]# /etc/init.d/nfs start
    [root@node3 ~]# showmount -e '192.168.0.103'
    Export list for 192.168.0.103:
    /web/htdocs 192.168.0.0/24
    [root@node3 ~]# echo "<h1>node3 nfs server</h1>" > /web/htdocs/index.html
2. Test mounting from the nodes
node1
    [root@node1 ~]# mount -t nfs 192.168.0.103:/web/htdocs /mnt/
    [root@node1 ~]# ll /mnt/
    total 4
    -rw-r--r-- 1 nobody nobody 26 Dec 25 21:53 index.html
    [root@node1 ~]# cat /mnt/index.html
    <h1>node3 nfs server</h1>
    [root@node1 ~]# df
    Filesystem                    1K-blocks    Used Available Use% Mounted on
    /dev/mapper/VolGroup-lv_root   16134560 1395740  13919212  10% /
    tmpfs                            247208       0    247208   0% /dev/shm
    /dev/sda1                        495844   32418    437826   7% /boot
    192.168.0.103:/web/htdocs      16134560 1302528  14012416   9% /mnt
    [root@node1 ~]# umount /mnt/
node2
    [root@node2 ~]# mount -t nfs 192.168.0.103:/web/htdocs /mnt/
    [root@node2 ~]# ll /mnt/
    total 4
    -rw-r--r-- 1 nobody nobody 26 Dec 25 21:53 index.html
    [root@node2 ~]# cat /mnt/index.html
    <h1>node3 nfs server</h1>
    [root@node2 ~]# df
    Filesystem                    1K-blocks    Used Available Use% Mounted on
    /dev/mapper/VolGroup-lv_root   16134560 1416796  13898156  10% /
    tmpfs                            247208       0    247208   0% /dev/shm
    /dev/sda1                        495844   32418    437826   7% /boot
    192.168.0.103:/web/htdocs      16134560 1302528  14012416   9% /mnt
    [root@node2 ~]# umount /mnt/
3. Stop the heartbeat service on node1 and node2
    [root@node1 ~]# ssh node2 'service heartbeat stop'
    Stopping High-Availability services: Done.
    [root@node1 ~]# service heartbeat stop
    Stopping High-Availability services: Done.
4. Modify the haresources configuration file
    [root@node1 ~]# vim /etc/ha.d/haresources
    node1.test.com IPaddr::192.168.0.200/24/eth1 Filesystem::192.168.0.103:/web/htdocs::/var/www/html::nfs httpd
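The Filesystem entry passes its arguments separated by `::`, in the order device, mount point, filesystem type. It sits between IPaddr and httpd so that the NFS export is mounted on /var/www/html before httpd starts. The following sketch composes such an entry; the helper `fs_resource` is hypothetical and only builds the string, it does not mount anything.

```shell
#!/bin/sh
# fs_resource DEVICE MOUNTPOINT FSTYPE
# Hypothetical helper: prints a haresources Filesystem entry with its
# arguments joined by "::".
fs_resource() {
    printf 'Filesystem::%s::%s::%s\n' "$1" "$2" "$3"
}

# Example: rebuild the haresources line used in this article.
# echo "node1.test.com IPaddr::192.168.0.200/24/eth1 $(fs_resource 192.168.0.103:/web/htdocs /var/www/html nfs) httpd"
```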
5. Copy the modified haresources file to node2
    [root@node1 ~]# scp /etc/ha.d/haresources root@node2.test.com:/etc/ha.d/
6. Start the heartbeat service on the nodes
    [root@node1 ~]# service heartbeat start
    Starting High-Availability services: INFO: Resource is stopped
    Done.
    [root@node1 ~]# ssh node2 "service heartbeat start"
    Starting High-Availability services: 2014/12/25_22:01:40 INFO: Resource is stopped
    Done.
7. Browser access test
    [root@node1 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:c7:14:97 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.101/24 brd 192.168.0.255 scope global eth1
        inet 192.168.0.200/24 brd 192.168.0.255 scope global secondary eth1
        inet6 fe80::20c:29ff:fec7:1497/64 scope link
           valid_lft forever preferred_lft forever
    [root@node1 ~]# netstat -tnlp | grep httpd
    tcp 0 0 :::80 :::* LISTEN 3301/httpd
    [root@node1 ~]# df
    Filesystem                    1K-blocks    Used Available Use% Mounted on
    /dev/mapper/VolGroup-lv_root   16134560 1395756  13919196  10% /
    tmpfs                            247208       0    247208   0% /dev/shm
    /dev/sda1                        495844   32418    437826   7% /boot
    192.168.0.103:/web/htdocs      16134560 1302528  14012416   9% /var/www/html
    [root@node1 ~]# cat /var/www/html/index.html
    <h1>node3 nfs server</h1>
9. Stop heartbeat on node1 to trigger a switchover, check the VIP, and access the VIP address again
    [root@node1 ~]# service heartbeat stop
    Stopping High-Availability services: Done.
    [root@node1 ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:c7:14:97 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.101/24 brd 192.168.0.255 scope global eth1
        inet6 fe80::20c:29ff:fec7:1497/64 scope link
           valid_lft forever preferred_lft forever
    [root@node1 ~]# netstat -tnlp | grep httpd
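VIII. Testing Heartbeat High Availability
Either stop and restart the heartbeat service on the primary node cleanly, or use the script to switch the primary node to standby.
    Procedure in detail:
    Stop the Heartbeat service cleanly: /etc/init.d/heartbeat stop (or, equivalently, service heartbeat stop)
    Switch the primary node to standby with the script: sh /usr/share/heartbeat/hb_standby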
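On the primary node node1, run "service heartbeat stop" to shut down the heartbeat service cleanly. Then run "ip addr" on the primary to inspect its network interfaces. Normally you should see that the primary has released the cluster service IP (VIP), has unmounted the shared storage, and that httpd is stopped.

Next, log in to the standby node with "ssh node2" and check its state. On node2, use "ip addr" to see whether the VIP has been taken over, and check whether the shared storage is mounted and httpd is running. The result is that the standby node has taken over the VIP, mounted the shared storage, and started httpd.

During this process, ping the cluster service IP (VIP). It stays reachable the whole time, with no noticeable delay or stalls. In other words, when heartbeat on the primary node node1 is shut down cleanly, the switchover between primary and standby is seamless and the service provided by the HA cluster keeps running without interruption.

Then start heartbeat on the primary node again. Because auto_failback is on, the standby node releases the VIP, unmounts the shared storage, and stops httpd, while the primary node takes over the VIP and mounts the shared storage again. The standby releasing the resources and the primary acquiring them happen together, so this switch is also close to seamless.

Note, however, that when the primary node comes back online the ping test shows one interruption. It is momentary and has little impact.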
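The ping test described above can be made repeatable by counting failed probes during a switchover. The sketch below is one way to do that, under stated assumptions: `count_failures` is a hypothetical helper, the probe command is whatever you pass in, and the `PROBE_INTERVAL` variable (seconds between probes, default 1) is introduced here for illustration.

```shell
#!/bin/sh
# count_failures N CMD [ARG...]
# Hypothetical helper: runs CMD N times, pausing PROBE_INTERVAL seconds
# (default 1) after each run, and prints how many runs failed.
count_failures() {
    local n="$1" lost=0 i=0
    shift
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || lost=$((lost + 1))   # non-zero exit = lost probe
        i=$((i + 1))
        sleep "${PROBE_INTERVAL:-1}"
    done
    echo "$lost"
}

# Example: probe the VIP about once per second for a minute while
# stopping or starting heartbeat on node1 in another terminal.
# count_failures 60 ping -c 1 -W 1 192.168.0.200
```

A result of 0 during a clean stop and a small non-zero count when the primary comes back would be consistent with the behaviour described above.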
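This completes the heartbeat-based high-availability setup for the web service. A follow-up will cover Heartbeat high availability for the MySQL service.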