Installation and Configuration of a Heartbeat V2 Cluster
Introduction
Heartbeat is one of the most widely used cluster solutions because it's very flexible, powerful and stable. We'll see how to proceed with installation, configuration and service monitoring for Heartbeat 2. Compared to version 1, Heartbeat 2 can manage more than 2 nodes.
Unlike Heartbeat version 1, this solution cannot be set up in 30 minutes. You'll need to spend a few hours on it...
Installation
Via official repositories
Let's go, as usual, it's relatively simple:
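The exact package names depend on your release; assuming the Debian package for version 2 is called heartbeat-2, something like:

    apt-get update
    apt-get install heartbeat-2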
Via external packages
For those who don't want to use Debian packages because they're not complete enough, go to https://download.opensuse.org/repositories/server:/ha-clustering/ and download the packages for your distribution (I took the Debian 64-bit ones, for example). Create a folder where you'll put all the packages and navigate to it, then:
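Assuming all the downloaded .deb files are in the current folder, a sketch of the installation:

    # install every package in the folder, then pull in any missing dependencies
    dpkg -i *.deb
    apt-get install -f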
This will install all packages and dependencies.
If you want to use the graphical interface, you'll also need to install these packages:
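The package names below are assumptions (they vary between releases); on Debian the GUI and its Python/GTK dependencies were shipped separately, for example:

    apt-get install heartbeat-2-gui python-gtk2 python-glade2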
Perform this on both nodes. Here they are called:
- deb-node1
- deb-node2
hosts
To simplify the configuration and HA transactions, properly configure your /etc/hosts file on all nodes:
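For example (the addresses are placeholders to adapt to your network):

    192.168.0.91    deb-node1
    192.168.0.92    deb-node2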
A DNS server can also do the job!
NTP
Make sure all servers have the same time. It is therefore advisable to synchronize them on the same NTP server (e.g., ntpdate 0.pool.ntp.org)
sysctl.conf
We're going to modify the management of "core" files. Heartbeat recommends modifying the default configuration by adding the following line in /etc/sysctl.conf:
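The original line is not reproduced here; a commonly used core-file setting is, for example:

    # tag core dumps with the PID of the crashing process (assumption)
    kernel.core_uses_pid = 1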
Then apply this configuration:
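For example:

    sysctl -p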
Configuration
First, let's get the example configuration files:
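A sketch, assuming the samples ship under /usr/share/doc/heartbeat/ as on Debian (locations may differ with your package):

    cp /usr/share/doc/heartbeat/ha.cf.gz /etc/ha.d/
    cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
    gunzip /etc/ha.d/ha.cf.gz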
Then we'll start editing the configuration files. Go to /etc/ha.d/
ha.cf
An example of the ha.cf configuration file can be found in /usr/share/doc/heartbeat/ha.cf.gz.
Here is the content of the /etc/ha.d/ha.cf file once modified:
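A minimal sketch of such a file; the interface, timers and the autojoin directive are assumptions to adapt to your setup:

    # log through syslog
    logfacility     local0
    # time between heartbeats (seconds)
    keepalive       2
    # a node is declared dead after this delay (seconds)
    deadtime        10
    # warn before declaring a node dead
    warntime        6
    # extra delay at first start
    initdead        60
    # UDP port used for the heartbeats
    udpport         694
    # interface used to send the heartbeats
    bcast           eth0
    # enable the version 2 cluster resource manager (CRM)
    crm             yes
    # allow nodes not listed below to join (see "Adding an additional node")
    autojoin        any
    # cluster nodes
    node            deb-node1
    node            deb-node2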
authkeys
This file is used to authenticate the exchanges between the different nodes. Here I choose sha1 as the hashing algorithm, followed by my passphrase:
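For example (the passphrase is a placeholder):

    auth 1
    1 sha1 MySecretPassphrase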
Don't forget to set the correct permissions on this file:
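For example:

    chmod 600 /etc/ha.d/authkeys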
Replications and tests
Now that our files are ready, we'll upload them to the other nodes:
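For example, from deb-node1 (a sketch using scp):

    scp /etc/ha.d/ha.cf /etc/ha.d/authkeys deb-node2:/etc/ha.d/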
Now we just need to restart the nodes:
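On each node, for example:

    /etc/init.d/heartbeat restart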
Now we'll check if everything works:
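crm_mon is the usual way to do this, for example:

    crm_mon -i 5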
The number 5 is the refresh interval, in seconds, at which the monitor checks the cluster status. If it keeps trying to connect without any conclusive result, check your logs:
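Where the messages land depends on your ha.cf logging settings, for example:

    tail -f /var/log/ha-log
    # or /var/log/syslog, depending on your logging configuration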
Here we encounter a problem that appears if heartbeat has started and kept an old configuration in memory. So stop all nodes and delete the contents of the /var/lib/heartbeat/crm folder:
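For example, on each node:

    /etc/init.d/heartbeat stop
    rm -f /var/lib/heartbeat/crm/*
    /etc/init.d/heartbeat start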
And now we'll look at the state of our cluster again:
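Again with crm_mon, for example:

    crm_mon -i 5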
Configuration of cluster services
The file that will interest us is /var/lib/heartbeat/crm/cib.xml (alias CIB).
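For orientation, a simplified (non-functional) skeleton of the CIB structure:

    <cib>
      <configuration>
        <crm_config/>   <!-- cluster-wide options -->
        <nodes/>        <!-- node definitions -->
        <resources/>    <!-- resources and groups -->
        <constraints/>  <!-- placement and ordering rules -->
      </configuration>
      <status/>         <!-- runtime state, managed by heartbeat -->
    </cib>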
crm_config
- default_resource_stickiness: Do you prefer to keep a service on its current node or move it to a node that has more available resources? This option replaces "auto_failback":
  - 0: Resources are placed optimally across the cluster. This is roughly equivalent to "auto_failback on", except that resources can move to nodes other than the one on which they were previously active.
  - > 0: The resource will prefer to return to its original node, but it can move if that node is not available. A higher value strengthens the resource's preference to stay where it currently is.
  - < 0: The resource will prefer to move away from its current node. A more negative value makes this move more likely.
  - INFINITY: The resource will always stay where it is unless forced off (node shut down, standby...). This is roughly equivalent to "auto_failback off", except that resources can move to nodes other than the one on which they were previously active.
  - -INFINITY: Resources will automatically move to another node.
- transition_idle_timeout (60s by default): If no action has been detected during this time, the transition is considered failed. If an operation initiated during the transition has a higher timeout, that higher value is used instead.
- symmetric_cluster (true by default): Resources can be started on any node. Otherwise, you will need to create specific "constraints" for each resource.
- no_quorum_policy (stop by default):
  - ignore: Pretend we have quorum.
  - freeze: Do not start any resources not currently in our partition. Resources in our partition can be moved to another node within the partition (fencing disabled).
  - stop: Stop all running resources in our partition (fencing disabled).
- default_resource_failure_stickiness: How strongly a resource prefers to migrate off its current node after a failure.
- stonith_enabled (false by default): If enabled, failed nodes will be fenced.
- stop_orphan_resources (true by default): What to do when a running resource is found with no definition:
  - true: Stop the resource
  - false: Ignore the resource
  This mostly affects the CRM's behavior when a resource is deleted by an admin without stopping it first.
- stop_orphan_actions (true by default): What to do when a recurring action is found with no definition:
  - true: Stop the action
  - false: Ignore the action
  This mostly affects the CRM's behavior when the interval of a recurring action is modified.
- remove_after_stop: Removes a resource from the "status" section of the CIB once it has been stopped.
- is_managed_default (true by default): Unless the resource definition says otherwise:
  - true: The resource will be started, stopped, monitored and moved as needed.
  - false: The resource will not be started if stopped, not stopped if started, and no recurring actions will be scheduled.
- short_resource_names (false by default, true recommended): This option is for compatibility with versions prior to 2.0.2.
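A minimal sketch of how these options are set in the CIB (ids and values are examples only):

    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="opt-stickiness" name="default_resource_stickiness" value="INFINITY"/>
          <nvpair id="opt-symmetric"  name="symmetric_cluster" value="true"/>
          <nvpair id="opt-quorum"     name="no_quorum_policy" value="stop"/>
          <nvpair id="opt-stonith"    name="stonith_enabled" value="false"/>
        </attributes>
      </cluster_property_set>
    </crm_config>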
nodes
Here we will define our nodes.
- id: Automatically generated when heartbeat starts; it changes depending on several criteria.
- uname: The node's name.
- type: normal, member or ping.
If you don't know what to put because the ids haven't been generated yet, put this:
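The original snippet is not reproduced here; a reasonable assumption is to leave the section empty and let heartbeat populate it at startup:

    <nodes/>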
resources
- L26: Here we have created a group. I strongly recommend this to avoid making mistakes. The group is called "group_1".
- L27: Then we insert "IPaddr_192_168_0_90" as a name for the definition of an IP address, its type being "IPaddr".
- L29: We insert a monitoring operation that checks every 5 s that everything is fine, with a timeout of 5 s.
- L34: At the attributes level, we set the virtual IP address we want to use.
- L39: We define an instance that will allow Apache2 (not yet configured) and the IP address to work.
- L43: Now we create the primitive for Apache2.
- L45: We will monitor Apache2 every 120 s with a timeout of 60 s.
- L49: Apache2 must start on node2 by default.
- L50: We indicate the group this resource belongs to.
- L51: The default state of Apache2 must be "started".
The simplest way to avoid headaches is to use the Heartbeat GUI. It will do the configuration for you :-)
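Since the cib.xml listing referenced by the line numbers above is not reproduced here, the following is only a rough sketch of such a <resources> section (ids, timings and values are assumptions):

    <resources>
      <group id="group_1">
        <primitive id="IPaddr_192_168_0_90" class="ocf" type="IPaddr" provider="heartbeat">
          <operations>
            <!-- check the virtual IP every 5s, with a 5s timeout -->
            <op id="IPaddr_192_168_0_90_mon" name="monitor" interval="5s" timeout="5s"/>
          </operations>
          <instance_attributes id="IPaddr_192_168_0_90_inst_attr">
            <attributes>
              <!-- the virtual IP address served by the cluster -->
              <nvpair id="IPaddr_192_168_0_90_attr_0" name="ip" value="192.168.0.90"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="apache2_2" class="lsb" type="apache2">
          <operations>
            <!-- check Apache2 every 120s, with a 60s timeout -->
            <op id="apache2_2_mon" name="monitor" interval="120s" timeout="60s"/>
          </operations>
          <instance_attributes id="apache2_2_inst_attr">
            <attributes>
              <!-- default state: started -->
              <nvpair id="apache2_2_target_role" name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </group>
    </resources>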
constraints
The constraints are used to indicate what should start before what. It's up to you to decide according to your needs.
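For example, an ordering constraint that starts the IP address before Apache2, and a location preference for deb-node2 (a sketch; ids and scores are assumptions):

    <constraints>
      <!-- apache2_2 starts after the virtual IP -->
      <rsc_order id="order_ip_before_apache" from="apache2_2" type="after" to="IPaddr_192_168_0_90"/>
      <!-- prefer to run group_1 on deb-node2 -->
      <rsc_location id="location_group_1" rsc="group_1">
        <rule id="pref_deb_node2" score="100">
          <expression id="pref_deb_node2_expr" attribute="#uname" operation="eq" value="deb-node2"/>
        </rule>
      </rsc_location>
    </constraints>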
Management of Cluster Services
To start, I strongly encourage you to consult this page.
To list services, here is the command:
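One way is crm_resource, for example:

    crm_resource -L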
Preparing a cluster service
Before continuing, make sure the configuration files of the two servers are identical, and that the services are stopped (here apache2):
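For example, on both servers:

    /etc/init.d/apache2 stop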
Then make sure that the services managed by Heartbeat are no longer automatically started when Linux boots:
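On Debian, for example:

    update-rc.d -f apache2 remove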
You can now type (on both servers):
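For example:

    /etc/init.d/heartbeat start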
After a few moments, the services should normally have started on the first machine, while the other is waiting.
Switching a service to another node
To switch, for example, our Apache2 service (apache2_2) to the second node (deb-node2):
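With crm_resource, for example:

    crm_resource -M -r apache2_2 -H deb-node2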
If you want to switch a service from a node to itself, you'll get the following error:
Starting a service
Starting a cluster service is quite simple. Here we want to start Apache2 (apache2_2)
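One way is to set its target_role with crm_resource, for example:

    crm_resource -r apache2_2 -p target_role -v started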
If you want to start Apache2 on the second node, you must first reallocate it before starting it (see the documentation). For example:
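A sketch of the two steps (migrate, then start):

    crm_resource -M -r apache2_2 -H deb-node2
    crm_resource -r apache2_2 -p target_role -v started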
This will move apache2_2, which was previously running (and is now stopped), to the second node, but it won't start it. You'll need to run the second command to start it.
Adding an additional node
So here's the deal, there are several solutions. I'll give you the one that seems best to me. In the /etc/ha.d/ha.cf file, check this (otherwise you'll have to reload your heartbeat configuration, which will cause a small outage):
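A sketch of the relevant directive (assuming your Heartbeat version supports autojoin):

    autojoin any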
This will authorize nodes that are not entered in the ha.cf file to connect.
For this, a minimum of security is required, which is why you still need to copy the ha.cf and authkeys files to deb-node3 (my new node). Note: First edit the ha.cf file to add the new node. This will allow you to have this node hardcoded at the next startup:
Now we send all this to the new node and don't forget to set the correct permissions:
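For example, from an existing node (a sketch using scp):

    scp /etc/ha.d/ha.cf /etc/ha.d/authkeys deb-node3:/etc/ha.d/
    ssh deb-node3 chmod 600 /etc/ha.d/authkeys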
And now the 3rd node joins the cluster:
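On deb-node3, for example:

    /etc/init.d/heartbeat start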
And that's it, it's integrated without interruptions :-)
If you create a new machine by "rsync" from a machine in the cluster, you must delete the following files on the new machine:
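The original list is not reproduced here; an assumption is that it covers the per-node identity files that heartbeat generates, for example:

    # assumption: remove the cloned node identity so a new UUID is generated
    rm -f /var/lib/heartbeat/hb_uuid /var/lib/heartbeat/hostcache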
FAQ
ClusterLabs FAQ (excellent site)