I'm the resident Linux guru at my job -- a mid-sized local company with a decent-sized IT department. We like to install servers in clusters to improve our fault tolerance. Being the Linux guy in a shop where Windows servers outnumber Unix servers about 8:1, I wanted to one-up Windows' active-passive (high availability, or HA) cluster setup by building a 2-node active-active (load balanced, or LB) cluster using the Linux Virtual Server (LVS) system. Our Linux distribution of choice is RedHat Enterprise Linux 4 (RHEL 4), and CentOS is the most compatible free clone thereof. Version 4 of these distros uses the Linux 2.6 kernel.
I was able to find a number of good tutorials on the web for configuring similar platforms, but nothing that quite matched what we wanted to do. Hence, I'm writing one now.
For these examples, let's assume that you have two physical web servers named lvs1 (192.168.0.1) and lvs2 (192.168.0.2) that you want to cluster together. They sit on a class C network, with a gateway router of 192.168.0.254. Those machines are known as the "real servers," since they are the ones that do the real work of serving up web pages. The outside world will reference those servers using a single hostname of vip1 (192.168.0.100). Either or both real servers will answer requests made to vip1. The determination of which real server will answer each request is made by the "ldirectord" package. In a larger setup, ldirectord would run on its own HA pair of servers, but in our 2-node setup, it jumps back and forth between the two real servers. The jumping back and forth (in case one director server completely dies) is handled by the "heartbeat" package.
The first step is to download all the necessary packages. All of them could be built from source, but I prefer to use RPM packages when available because they allow you to manage versions and dependencies much more easily. Since LVS doesn't officially ship with RHEL, the best place to get recent packages seems to be the CentOS repository at ftp://ftp.osuosl.org/pub/centos/4.4/extras/i386/RPMS/ or the Linux-HA web site at http://linux-ha.org/download/index.html. There is a bug in the IPaddr2 script in all 2.x versions prior to 2.0.8, so until 2.0.8 makes it into the repositories, you'll have to apply this patch (relative to v2.0.7) to /usr/lib/ocf/resource.d/heartbeat/IPaddr2.
The exact package list required will vary depending on what's already installed on your system. At a minimum, you will need the following packages. The indenting indicates the package dependencies; i.e., most packages exist to support heartbeat and heartbeat-ldirectord.
- heartbeat
- heartbeat-pils
- heartbeat-stonith
- heartbeat-ldirectord
- ipvsadm
- perl-MailTools
- perl-TimeDate
- perl-Net-IMAP-Simple
- perl-Net-IMAP-Simple-SSL
- perl-IO-Socket-SSL
- perl-Net-SSLeay
- perl-IO-Socket-SSL
- perl-Mail-POP3Client
- perl-Mail-IMAPClient
- perl-Authen-Radius
- perl-Data-HexDump
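To see which of these you still need, something like the following should do. It's only a sketch: the function name missing_pkgs is my own invention, and it assumes nothing beyond "rpm -q" returning non-zero for a package that isn't installed.

```shell
# Print the name of every package given as an argument that rpm does not
# report as installed. (missing_pkgs is a made-up helper name.)
missing_pkgs() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

For example, "missing_pkgs heartbeat heartbeat-pils heartbeat-stonith heartbeat-ldirectord ipvsadm" prints whichever of those five are absent, and prints nothing if they are all there.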
Once the necessary packages are installed, you can start the configuration process. There's a pretty good writeup for installing Ultra Monkey in a 2-node HA/LB setup on RHEL3 or Debian here. I had a couple of problems with that on RHEL4, though, which is why I'm writing my own tutorial.
First, you need to change a few kernel parameters by editing /etc/sysctl.conf. Ensure that the variables below are all set to the values shown. Beware that some of them may already be set to other values somewhere in the file, while others won't exist yet at all. These settings prevent the servers from advertising via ARP the VIP address that will later be assigned to each loopback interface. They also allow the machine acting as the director to forward packets to the other real server when necessary.
#========================================================================
# UltraMonkey requirements below
#
# Enable configuration of arp_ignore option
net.ipv4.conf.all.arp_ignore = 1
# When an arp request is received on eth0, only respond if that address is
# configured on eth0. In particular, do not respond if the address is
# configured on lo
net.ipv4.conf.eth0.arp_ignore = 1
# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_ignore = 1
# Enable configuration of arp_announce option
net.ipv4.conf.all.arp_announce = 2
# When making an ARP request sent through eth0, always use an address that
# is configured on eth0 as the source address of the ARP request. If this
# is not set, and packets are being sent out eth0 for an address that is on
# lo, and an arp request is required, then the address on lo will be used.
# As the source IP address of arp requests is entered into the ARP cache on
# the destination, it has the effect of announcing this address. This is
# not desirable in this case as addresses on lo on the real-servers should
# be announced only by the linux-director.
net.ipv4.conf.eth0.arp_announce = 2
# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_announce = 2
# Enables packet forwarding
net.ipv4.ip_forward = 1
#
# UltraMonkey requirements above
#========================================================================
To make these changes take effect, either reboot the system or run:
# /sbin/sysctl -p
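Since some of these keys may already appear elsewhere in the file with different values, it can be worth checking the file mechanically. Here's a rough sketch that does so; the function name check_sysctl_conf is hypothetical, the key list is hard-coded for a single eth0 interface, and it assumes that the last uncommented assignment of a key is the one that wins.

```shell
# Report any required LVS-DR kernel setting that is missing from, or set
# differently in, a sysctl.conf-style file. Prints one line per problem
# and returns non-zero if any were found.
# (check_sysctl_conf is a made-up name; the key list assumes only eth0.)
check_sysctl_conf() {
    conf=$1
    rc=0
    for want in \
        net.ipv4.conf.all.arp_ignore=1 \
        net.ipv4.conf.eth0.arp_ignore=1 \
        net.ipv4.conf.all.arp_announce=2 \
        net.ipv4.conf.eth0.arp_announce=2 \
        net.ipv4.ip_forward=1
    do
        key=${want%%=*}
        val=${want#*=}
        # Escape the dots so they match literally in the sed pattern.
        key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
        # Keep the last uncommented assignment of this key, since later
        # lines in the file override earlier ones.
        got=$(sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$conf" | tail -n 1)
        if [ "$got" != "$val" ]; then
            echo "$key: expected $val, found ${got:-nothing}"
            rc=1
        fi
    done
    return $rc
}
```

Running "check_sysctl_conf /etc/sysctl.conf" before "sysctl -p" should print nothing when the file is in good shape. It only reads the file; it doesn't look at the values the running kernel is actually using.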
Next, you need to configure the loopback interface to have an alias for the VIP address so that the real servers will know to answer connections on that IP even when they're not acting as the director. Create a file named "/etc/sysconfig/network-scripts/ifcfg-lo:0" that contains IP information for the VIP and its network:
DEVICE=lo:0
IPADDR=192.168.0.100
NETMASK=255.255.255.255
NETWORK=192.168.0.0
BROADCAST=192.168.0.255
ONBOOT=yes
NAME=loopback
To turn on this new alias, run:
# /sbin/ifup lo
or
# service network start
This alias won't show up when running "ifconfig", a fact that caused me to waste several hours tracking down a problem that didn't even exist. Instead, you can verify its existence by running:
# ip addr sh lo
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.100/32 brd 192.168.0.255 scope global lo:0
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
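If you'd rather not eyeball that output, a small filter can pick out where the VIP lives. This is just a sketch; vip_label is a name I made up, and it assumes the interface label is the last field on the "inet" line, as it is in the output above.

```shell
# Print the interface label (last field of the "inet" line) that carries
# the given address in `ip addr show` output read from stdin.
# (vip_label is a made-up helper name.)
vip_label() {
    awk -v vip="$1" '
        $1 == "inet" {
            split($2, a, "/")          # drop the /prefix length
            if (a[1] == vip) print $NF
        }'
}
```

Running "ip addr sh | vip_label 192.168.0.100" should print lo:0 once the alias is up, and nothing at all if the address isn't configured anywhere.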
Because we're using what's known as LVS-DR (direct routing), you need to make sure that the default gateway for the servers' primary network interface points to the proper gateway router rather than to the director. To do this, check for the GATEWAY entry in either "/etc/sysconfig/network" or "/etc/sysconfig/network-scripts/ifcfg-eth0" and ensure that it lists the proper IP:
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=lvs1
GATEWAY=192.168.0.254
or
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet
IPADDR=192.168.0.1
NETMASK=255.255.255.0
GATEWAY=192.168.0.254
You can verify this by running:
# ip route show 0/0
default via 192.168.0.254 dev eth0
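If you want to script that check on both boxes, here's a minimal sketch. The helper name default_gw is made up, and it assumes a "default via <address>" line like the one shown above.

```shell
# Print the gateway address from the first "default via ..." line of
# `ip route show` output read from stdin.
# (default_gw is a made-up helper name.)
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

You could then compare the result with the router's address, for example: [ "$(ip route show 0/0 | default_gw)" = 192.168.0.254 ] || echo "wrong default gateway".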
Now it's time to configure the heartbeat package to handle failover of the VIP and the ldirectord package. There are three files in "/etc/ha.d" that must be configured to make things work. Each of these files should be identical on the two real servers. The packages will install default config files full of comments, but here is a reasonable set of configuration parameters. Everywhere you see a hostname listed, it must match the output of "uname -n" on the appropriate server. The "authkeys" file must be readable only by root for security purposes.
# cat /etc/ha.d/ha.cf
logfacility local0
keepalive 1
deadtime 10
warntime 5
initdead 120
udpport 694
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node lvs1.mydomain.com
node lvs2.mydomain.com
ping 192.168.0.254
respawn hacluster /usr/lib/heartbeat/ipfail
crm off
# cat /etc/ha.d/authkeys
auth 2
2 sha1 ThisIsMyPassword
# cat /etc/ha.d/haresources
lvs1.mydomain.com \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.0.100/24/eth0/192.168.0.255
# cat /etc/ha.d/ldirectord.cf
checktimeout=15
checkinterval=5
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=192.168.0.100:80
        fallback=127.0.0.1:80
        real=192.168.0.1:80 gate
        real=192.168.0.2:80 gate
        service=http
        request="ldirectord.html"
        receive="It worked"
        scheduler=rr
        persistent=600
        protocol=tcp
        checktype=negotiate
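Two of the requirements mentioned earlier (hostnames matching "uname -n", and authkeys being readable only by root) are easy to get wrong, so here is a sketch of a quick check for each. Both function names are hypothetical, and the permission test simply looks at the mode string printed by "ls -ld".

```shell
# Succeed if the ha.cf file (first argument) has a "node" line that
# names the given host (second argument).
# (node_listed and mode_is_600 are made-up helper names.)
node_listed() {
    awk -v name="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$1"
}

# Succeed if the file's permission bits are exactly rw------- (mode 600).
mode_is_600() {
    [ "$(ls -ld "$1" | cut -c1-10)" = "-rw-------" ]
}
```

On each server, something like node_listed /etc/ha.d/ha.cf "$(uname -n)" && mode_is_600 /etc/ha.d/authkeys should succeed before you go any further. If the second half fails, "chmod 600 /etc/ha.d/authkeys" fixes it.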
The above files should be the same on both hosts. ldirectord.cf above is configured to check for a web server on port 80 which contains a file in the root directory named ldirectord.html containing only the string "It worked". Ldirectord checks the health of each real server by querying each web server for that file. If it gets back a file containing the receive string, it considers the server willing and able to receive public requests. There are built-in check mechanisms for several other popular services, too.
Now you need to make sure that heartbeat is started at boot time and that ldirectord is NOT started at boot by running this on both servers:
/sbin/chkconfig heartbeat on
/sbin/chkconfig ldirectord off
/sbin/service ldirectord stop
/sbin/service heartbeat start
You also need to ensure that your user services (httpd, mysql, etc.) are running before you turn on heartbeat. Give it a minute to start up and stabilize, then check that things are running by typing:
lvs1# ip addr sh
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 192.168.0.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:8a:01:10 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/18 brd 192.168.0.255 scope global eth0
    inet 192.168.0.100/18 brd 192.168.0.255 scope global secondary eth0
    inet6 fe80::250:56ff:fe8a:110/64 scope link
       valid_lft forever preferred_lft forever
lvs2# ip addr sh
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.100/32 brd 192.168.0.255 scope global lo:0
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:8a:1f:39 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/18 brd 192.168.0.255 scope global eth0
    inet6 fe80::250:56ff:fe8a:1f39/64 scope link
       valid_lft forever preferred_lft forever
The first node you started up (the active director, lvs1 in this example) should have the VIP on eth0, while the second node you started should have it on lo. You can now run ipvsadm to check the status of the nodes and any incoming connections. Only the machine currently acting as director will list any useful info:
lvs2# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
lvs1# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.100:80 rr persistent 600
  -> 192.168.0.2:80               Route   1      0          0
  -> 192.168.0.1:80               Local   1      0          0
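That difference between the two listings gives you a cheap way to ask "is this box the director right now?" from a script. A sketch follows; the name is_director is made up, and it assumes a virtual service always shows up as a line starting with TCP or UDP, as in the lvs1 output above.

```shell
# Succeed if `ipvsadm -L -n` output on stdin lists at least one virtual
# service, i.e. a line that starts with TCP or UDP.
# (is_director is a made-up helper name.)
is_director() {
    grep -E -q '^(TCP|UDP)[[:space:]]'
}
```

Run as root, "ipvsadm -L -n | is_director && echo director || echo standby" should print "director" on exactly one of the two nodes.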
You can see above in the "Weight" column that incoming requests will be split equally between the two real servers. If you stop the HTTP daemon on one of the servers, within a few seconds the weight for that server will drop to zero, and no new requests will be directed to it. To allow existing connections to finish politely while sending all new connections to the other box (if you're about to do some planned maintenance, for example), set the weight of the dying server to zero with the first command below. To make new connections from persistent hosts move over as well, you must set "quiescent=no" in ldirectord.cf. With "quiescent=yes", persistent hosts will continue trying to hit the dying server even after it dies, on the assumption that it will eventually come back.
# /sbin/ipvsadm -e -t 192.168.0.100:80 -r 192.168.0.2:80 -w 0
# /sbin/ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.100:80 rr persistent 600
  -> 192.168.0.2:80               Route   0      0          0
  -> 192.168.0.1:80               Local   1      0          0
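Before you actually take the box down, you probably want to wait until its existing connections have finished. Here's a sketch of how that could be checked from the same ipvsadm listing. The function name drained is hypothetical, and it assumes the column order shown above (address, forward method, weight, active connections).

```shell
# Succeed once the given real server (addr:port) shows weight 0 and no
# active connections in `ipvsadm -L -n` output read from stdin.
# (drained is a made-up helper name.)
drained() {
    awk -v real="$1" '
        $1 == "->" && $2 == real && $4 == 0 && $5 == 0 { ok = 1 }
        END { exit !ok }'
}
```

After setting the weight to zero, a loop such as "until ipvsadm -L -n | drained 192.168.0.2:80; do sleep 5; done" would block until lvs2 is idle, at which point it should be safe to stop httpd there.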
If you don't want to remember that first ipvsadm command, you can (de)activate individual real services using this init script. Run "service cluster stop lvs2" to set the weight for lvs2 to zero. Determining the other functionality is left as an exercise for the reader.