Building a redundant iSCSI and NFS cluster with Debian - Part 3
This is part 3 of a series on building a redundant iSCSI and NFS SAN with Debian.
Part 1 - Overview, network layout and DRBD installation
Part 2 - DRBD and LVM
Part 3 - Heartbeat and automated failover
Part 4 - iSCSI and IP failover
Part 5 - Multipathing and client configuration
Part 6 - Anything left over!
Introduction
In the last two guides, we set up a DRBD resource and LVM volume group which we could manually migrate between the two cluster nodes. In this guide, we'll set up the Heartbeat cluster software to handle automatic migration of services between the two nodes in our cluster (“failover”).
The version of Heartbeat included in Debian Etch is 1.x. It is a very simple system, and is limited to two-node clusters, making it ideal for something simple such as failing services over between two nodes. The current 2.x branch is a lot more complicated, and has a new XML configuration format, although it can still be used with the original 1.x format files. Although it adds many useful features, it's overkill for our needs at the moment - plus, sticking to 1.x avoids the need to install software not included in the current stable distribution.
Preparation
Before we set up Heartbeat, we'll need to ensure the communication channels the cluster will be using are configured. If you refer back to the original network diagram, you'll see that we're using two different interconnects: a serial cable, and a network connection across eth1. To recap: the reason for this is that the interconnects are vital to the functioning of the cluster. If one node cannot “see” the other, it will assume control of the resources. If this were due to a faulty interconnect (or a network misconfiguration), you would end up with a “split-brain” scenario in which both nodes try to gain control over the resources. At best, this would lead to service outages and confusion; at worst, you could be facing total data loss.
Hence, the two channels for cluster communication - the null-modem serial cable is a great “fallback” channel, which should always be available even if you do something like apply an erroneous firewall rule blocking the communication over eth1. If you have been following the instructions up until now, you should already be able to send data between the hosts over the serial connection, and ping each node from the other over their eth1 interfaces (we've already been using this interface for the DRBD synching). Assuming this all works, you're good to proceed.
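If you want to re-check both links before continuing, a quick test along these lines will do (this is just a sketch - the serial device and the eth1 address shown are assumptions, so use whatever you configured in part 1). Listen on the serial port on otter, then from weasel send a test string and ping otter's dedicated interface:

[root@otter] # cat < /dev/ttyS0
[root@weasel] # echo hello > /dev/ttyS0
[root@weasel] # ping -c 3 192.168.1.1

The word “hello” should appear on otter's terminal, and the pings should come back without loss.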
Installation
Simply install with apt-get on both nodes :
# apt-get install heartbeat
This will print a warning at the end (“Heartbeat not configured”), which you can ignore. You now need to set up authentication between the two nodes - this is very simple, and just uses a shared secret key.
Create /etc/ha.d/authkeys on both systems with the following content:
auth 1
1 sha1 secret
In this sample file, the auth 1 directive says to use key number 1 for signing outgoing packets. The 1 sha1… line describes how to sign the packets. Replace the word “secret” with the passphrase of your choice.
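If you'd rather not make up a passphrase yourself, one option (not part of the original setup, just a suggestion) is to generate a random one and paste it in place of “secret” on both nodes:

# dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}'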
As this is stored in plaintext, make sure that it is owned by root and has a restrictive set of permissions on it :
# chown root:root /etc/ha.d/authkeys
# chmod 600 /etc/ha.d/authkeys
Make sure that copies of this file are identical on both nodes, and don't contain any stray blank lines.
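A simple way to confirm the two copies really are identical is to compare their checksums on each node - the two hashes should match exactly:

[root@otter] # md5sum /etc/ha.d/authkeys
[root@weasel] # md5sum /etc/ha.d/authkeys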
Now, we need to set up the global cluster configuration file. Create the /etc/ha.d/ha.cf file on both nodes as follows :
# Interval in seconds between heartbeat packets
keepalive 1
# How long to wait in seconds before deciding node is dead
deadtime 10
# How long to wait in seconds before warning node is dead
warntime 5
# How long to wait in seconds before deciding node is dead
# when heartbeat is first started
initdead 60
# If using serial port for heartbeat
baud 9600
serial /dev/ttyS0
# If using network for heartbeat
udpport 694
# eth1 is our dedicated cluster link (see diagram in part 1)
bcast eth1
# Don't want to auto failback, let admin check and do it manually if needed
auto_failback off
# Nodes in our cluster
node otter
node weasel
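As a side note (not something we need for this setup), Heartbeat can also send its network heartbeats as unicast rather than broadcast, which some people prefer on a shared segment. The peer address below is just an example - it would be whatever you assigned to the other node's eth1 in part 1:

# Alternative to "bcast eth1": send heartbeats directly to the peer
ucast eth1 192.168.1.2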
Resources
We now need to tell Heartbeat about what resources we want it to manage. This is configured in the /etc/ha.d/haresources file. The format for this is again very simple - it just takes the form :
<hostname> resource[::arg1:arg2:arg3:........:argN]
Resources can either be one of the supplied scripts in /etc/ha.d/resource.d :
# ls /etc/ha.d/resource.d
AudibleAlarm db2 Delay drbddisk Filesystem ICP IPaddr IPaddr2 IPsrcaddr
IPv6addr LinuxSCSI LVM LVSSyncDaemonSwap MailTo OCF portblock SendArp ServeRAID
WAS WinPopup Xinetd
Or, they can be one of the init scripts in /etc/init.d, and Heartbeat will search those locations in that order.
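Just to illustrate the argument syntax before we get to our real configuration, a hypothetical entry (the address, device and mount point here are made up purely for the example) might look like this:

otter IPaddr::192.168.0.50/24/eth0 Filesystem::/dev/drbd0::/data::ext3

This would make otter the preferred node for a floating IP address and a filesystem mount, with each argument separated from the resource script name (and from the other arguments) by double colons.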
To start with, we'll want to move the DRBD resource we configured in part 2 between the two nodes. This can be accomplished via the “drbddisk” script, provided by the drbd0.7-utils package. The /etc/ha.d/haresources configuration file should therefore look like the following :
weasel drbddisk::r0
This says that the node “weasel” should be the preferred node for this service. The resource script is “drbddisk”, which can be found under /etc/ha.d/resource.d, and we're passing it the argument “r0”, which is our DRBD resource configured in part 2.
To test this out, make the DRBD resource secondary by running the following on both nodes :
# drbdadm secondary r0
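If you check /proc/drbd at this point, both nodes should show the resource as Secondary - something along these lines (the version header and transfer counters are omitted here, and the exact fields vary between DRBD releases):

# cat /proc/drbd
 0: cs:Connected st:Secondary/Secondary ld:Consistent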
And then start the cluster on both nodes :
# /etc/init.d/heartbeat start
Starting High-Availability services:
Done.
Once they've started up, check the cluster status using the cl_status tool. First, let's check which nodes Heartbeat thinks are in the cluster :
# cl_status listnodes
weasel
otter
Now, check both nodes are up :
# cl_status nodestatus weasel
active
# cl_status nodestatus otter
active
We can also use the cl_status tool to see which cluster links are available (which should be eth1 and /dev/ttyS0) :
# cl_status listhblinks otter
eth1
/dev/ttyS0
# cl_status hblinkstatus otter eth1
up
# cl_status hblinkstatus otter /dev/ttyS0
up
And we can also use it to check which resources each node has :
[root@otter] # cl_status rscstatus
none
[root@weasel] # cl_status rscstatus
all
You should be able to check the output of /proc/drbd on both systems and see that r0 is now Primary on weasel. To fail over to otter, simply restart the Heartbeat services on weasel :
# /etc/init.d/heartbeat restart
Stopping High-Availability services:
Done.
Waiting to allow resource takeover to complete:
Done.
Starting High-Availability services:
Done.
Now, check /proc/drbd and you should see that it is now Primary on otter. You can confirm this with cl_status :
[root@otter] # cl_status rscstatus
all
[root@weasel] # cl_status rscstatus
none
If you want to try a more dramatic approach, try yanking the power out of otter. You should see output similar to the following appear in /var/log/ha-log on weasel :
heartbeat: 2009/02/03_15:06:29 info: Resources being acquired from otter.
heartbeat: 2009/02/03_15:06:29 info: acquire all HA resources (standby).
heartbeat: 2009/02/03_15:06:29 info: Acquiring resource group: weasel drbddisk::r0
heartbeat: 2009/02/03_15:06:29 info: Local Resource acquisition completed.
...
...
heartbeat: 2009/02/03_15:06:29 info: all HA resource acquisition completed (standby).
heartbeat: 2009/02/03_15:06:29 info: Standby resource acquisition done [all].
heartbeat: 2009/02/03_15:06:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/02/03_15:06:29 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
...
...
heartbeat: 2009/02/03_15:06:39 WARN: node otter: is dead
heartbeat: 2009/02/03_15:06:39 info: Dead node otter gave up resources.
Play around with this a few times, and make sure you're familiar with your resource moving between systems. Once you're happy with this, we'll add our LVM volume group into the configuration. Edit the /etc/ha.d/haresources file, and modify it so that it looks like the following :
weasel drbddisk::r0 \
    LVM::storage
The backslash (\) is just a line continuation, the same as in a shell script - it tells Heartbeat that everything on the continued line belongs to one resource group. You could put it all on one line (the equivalent single-line form is shown below), but I find it easier to read when it's split up like this.
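For reference, the equivalent single-line form would be:

weasel drbddisk::r0 LVM::storage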
Restart Heartbeat on each node in turn, and you should then be able to see the DRBD resource and the LVM volume group move between systems. The next step will cover setting up an iSCSI target, and adding that into the cluster configuration along with a group of managed IP addresses.
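If you want to confirm that the LVM resource script really is activating the volume group, one quick check (a sketch - “lv0” is just a hypothetical logical volume you might have created in the “storage” group in part 2) is to run lvscan on whichever node currently holds the resources:

[root@otter] # lvscan
  ACTIVE            '/dev/storage/lv0' [10.00 GB] inherit

On the standby node the volume group won't normally be visible at all, since its underlying DRBD device can't be read while it is Secondary.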