The private interfaces (node1, node2 and node3) will be used for the heartbeat and are therefore dedicated to the cluster. The public interfaces are used for everything else (direct connections, VIPs, etc.). So when you create your cluster, build it on the private interfaces.
Bonding
IMPORTANT: The bonding of the private interfaces (cluster heartbeat) can only be configured in mode 1 (active-backup)! No other mode is supported.
If you are on RHEL 6, create the following line in /etc/modprobe.d/bonding.conf:
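A minimal sketch, assuming the bond interface is named bond0; on RHEL 6 the mode itself is set through BONDING_OPTS in the interface file rather than through module options:

alias bond0 bonding

And in /etc/sysconfig/network-scripts/ifcfg-bond0 (excerpt; mode 1 = active-backup, the miimon value is illustrative):

BONDING_OPTS="mode=1 miimon=100"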
Generally, you will use a disk array so that data can be accessed from any machine. For this, you will need to use multipathing. Follow this documentation to set it up.
Firewall
If you use a firewall for the interconnection part of your nodes, here is the list of ports:
Service name    Port/Protocol
cman            5404/udp, 5405/udp
ricci           11111/tcp
gnbd            14567/tcp
modclusterd     16851/tcp
dlm             21064/tcp
ccsd            50006/tcp, 50007/udp, 50008/tcp, 50009/tcp
Note: A software firewall on this interconnect is strongly discouraged, and a dedicated switch is strongly recommended!
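If you nevertheless keep iptables on these interfaces, rules along these lines would open the ports listed above (eth1 as the private interface is an assumption):

iptables -A INPUT -i eth1 -p udp -m multiport --dports 5404,5405 -j ACCEPT            # cman
iptables -A INPUT -i eth1 -p tcp --dport 11111 -j ACCEPT                              # ricci
iptables -A INPUT -i eth1 -p tcp --dport 14567 -j ACCEPT                              # gnbd
iptables -A INPUT -i eth1 -p tcp --dport 16851 -j ACCEPT                              # modclusterd
iptables -A INPUT -i eth1 -p tcp --dport 21064 -j ACCEPT                              # dlm
iptables -A INPUT -i eth1 -p tcp -m multiport --dports 50006,50008,50009 -j ACCEPT    # ccsd (tcp)
iptables -A INPUT -i eth1 -p udp --dport 50007 -j ACCEPT                              # ccsd (udp)
service iptables save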
Installation
Cluster
On all your nodes, install cman; all the required dependencies will be pulled in at once:
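For example:

yum install -y cman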
You may need to look at the "preferred_names" parameter in /etc/lvm/lvm.conf if you're using multipathing. Be careful with WWNs: if you use them, you must configure this correctly:
...
# If several entries in the scanned directories correspond to the
# same block device and the tools need to display a name for device,
# all the pathnames are matched against each item in the following
# list of regular expressions in turn and the first match is used.
# preferred_names = [ ]

# Try to avoid using undescriptive /dev/dm-N names, if present.
preferred_names = [ "^/dev/mpath/", "^/dev/mapper/mpath", "^/dev/[hs]d" ]
...
Let's quickly create a volume on one node and check that it appears everywhere. In this example multipath is not used, so we are going to create a partition and give it the LVM partition type (the "LVM tag"). Then, on all nodes, refresh the partition table of the disk where the new partition was created:
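A sketch of the whole sequence, assuming the shared disk is /dev/sdb and that the volume group and logical volume names are illustrative:

fdisk /dev/sdb            # on one node: "n" to create the partition, "t" then "8e" for the LVM type, "w" to write
partprobe /dev/sdb        # on all nodes: re-read the partition table
pvcreate /dev/sdb1        # back on one node: create the PV, VG and LV
vgcreate vgdata /dev/sdb1
lvcreate -n lvdata -L 10G vgdata
pvscan && lvs             # on the other nodes: check that the volume shows up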
Our volume is now available on all nodes. Caution: this does not mean you will be able to write to it simultaneously. To do this kind of operation, you will need to use GFS2.
GFS2
If you need to have a filesystem shared across all nodes, you will need to install and use GFS2.
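A sketch, assuming a 3-node cluster named "mycluster" in cluster.conf and the illustrative logical volume created above (package names can vary by release):

yum install -y gfs2-utils
mkfs.gfs2 -p lock_dlm -t mycluster:gfsdata -j 3 /dev/vgdata/lvdata   # lock_dlm locking, one journal per node
mkdir -p /mnt/gfs
mount -t gfs2 /dev/vgdata/lvdata /mnt/gfs                            # mount it on every node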
Configuration
Quorum
The quorum is an important element in the design of a High Availability cluster. It is mandatory on a two-node cluster and optional beyond. It must be an element common to the nodes (a partition of a disk array, for example) and it is used to place locks that guarantee the overall healthy state of the cluster.
To avoid split brain situations, quorum requires at least half of the expected votes plus one:
(expected_votes / 2) + 1
By default, each node has a weight (vote) of 1. The "Expected votes" value is the number of votes the cluster needs in order to function normally. You will adjust these values when applications depend on applications that are not hosted on the same machines; to avoid problems, you can give the nodes of those machines a higher weight, as illustrated below.
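As an illustration of node weights (the node names match this document's examples, the values are hypothetical), the weight is set with the votes attribute in /etc/cluster/cluster.conf:

<clusternodes>
    <clusternode name="node1.deimos.fr" nodeid="1" votes="2"/>
    <clusternode name="node2.deimos.fr" nodeid="2" votes="1"/>
    <clusternode name="node3.deimos.fr" nodeid="3" votes="1"/>
</clusternodes>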
Expected votes
To see the number of votes required for the cluster to function properly:
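For example (cman_tool status reports the vote counts and the quorum value):

cman_tool status | grep -iE 'Expected votes|Total votes|Quorum'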
The quorum disk (qdisk) is only necessary on a 2-node cluster; beyond that it is not needed and is not recommended! Qdisk is used by cman and ccsd and must be a partition or LUN, never a logical volume.
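If you do need one, a sketch of creating and verifying it (the device and label are assumptions):

mkqdisk -c /dev/sdc1 -l myqdisk   # write the qdisk label on the dedicated partition/LUN
mkqdisk -L                        # list the quorum disks found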
Configuration
For a good configuration, I invite you to look at these links:
There will therefore be a race to fencing, to see which node will reboot the other first. All of this is to avoid split brain.
Fencing
Fencing is mandatory, because a problematic node can corrupt data on mounted partitions. It is therefore preferable that the other nodes fence (yes, that's the verb ;)) the node that may cause problems. For this, a fenced daemon is managed by cman, and the fence agents are stored as /sbin/fence_*.
For SCSI, there is a "scsi_reserve" service that generates a unique key and creates a registration for each machine. Any node can then remove another machine's registration, preventing a problematic machine from continuing to write to the SCSI devices.
We can check whether this is possible or not (here it is not, because I use iSCSI, which does not support it):
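One way to check, assuming the sg3_utils package is installed and /dev/sdb is the shared LUN, is to read the SCSI-3 persistent reservation keys; a device that does not support reservations returns an error:

sg_persist --in --read-keys --device=/dev/sdb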
If your fencing is not done via RACs (Remote Access Cards), and you are using Xen or KVM for example, you will need to resort to software fencing by asking the hypervisor directly to kill a node. Let's see how to proceed.
If your cluster nodes are virtual machines, also install cman on the host machine:
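For example, on a yum-based host:

yum install -y cman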
Be careful if you are using GFS: to completely shut down your cluster, you need to lower the expected votes value to avoid problems while you shut down your nodes one by one. Otherwise you will not be able to unmount your GFS at all!
Solution 1 (recommended):
For example, on a 3-node cluster where the expected votes value is 3, lower it:
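A sketch, run on one of the remaining nodes, to let a single node keep the quorum:

cman_tool expected -e 1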
Then you can unmount everything and shut down your nodes.
Solution 2 (avoid):
You can run the following command, which takes each node out of the cluster so that they can be shut down cleanly, but there is a risk of split brain:
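The command in question should look like this, run on each node in turn:

cman_tool leave remove   # the node leaves the cluster and the expected votes are adjusted downwards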
We need to add a mandatory fencing method so that a problematic node can be shot down, reintegrate the cluster as quickly as possible, and not corrupt data on the disks.
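As a sketch of what this can look like in /etc/cluster/cluster.conf when fencing virtual machines with fence_xvm (the device name, method name and guest domain names are assumptions):

<clusternode name="node1.deimos.fr" nodeid="1" votes="1">
    <fence>
        <method name="1">
            <device name="xvmfence" domain="node1"/>
        </method>
    </fence>
</clusternode>
...
<fencedevices>
    <fencedevice agent="fence_xvm" name="xvmfence"/>
</fencedevices>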
For clustering with NFS, it's a bit special, you need:
A file system
NFS export: monnfs
NFS client:
    Name: monnfs
    Target: the IPs allowed to connect
    Options: ro/rw
As dependencies, here's what it should look like:
FS
    NFS Export
        NFS Client
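Translated into /etc/cluster/cluster.conf, a sketch of such a service (device, mount point, filesystem type, target network and options are illustrative), with the resources nested to reflect that dependency order:

<rm>
    <service name="monnfs" autostart="1" recovery="relocate">
        <fs name="fsnfs" device="/dev/vgdata/lvnfs" mountpoint="/mnt/nfs" fstype="ext3" force_unmount="1">
            <nfsexport name="monnfs">
                <nfsclient name="monnfs" target="192.168.0.0/24" options="rw"/>
            </nfsexport>
        </fs>
    </service>
</rm>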
rgmanager
The cluster scripts (resource types) are in OCF (Open Cluster Framework) format and are located in /usr/share/cluster/.
When you create a service (Resource Group), you can create multiple resources and associate them or create dependencies on each other to define a startup order. You can retrieve the predefined information related to resource types in /usr/share/cluster/service.sh
By default it tries to restart the service on the same node (restart policy), but there is also:
relocate: the service will be moved to another node
disable: no action is taken in case of a problem
Check Status
A status check is performed on each resource, by default every 30 seconds. You should absolutely not go below 5 seconds. You can modify all these default values in /usr/share/cluster/*.
You can change the check interval, and set a timeout if you want one.
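As a sketch (the resource and its values are illustrative), recent rgmanager versions also accept an <action> child element on a resource in cluster.conf to override the status check interval, which avoids editing the agents in /usr/share/cluster/:

<fs name="fsnfs" device="/dev/vgdata/lvnfs" mountpoint="/mnt/nfs" fstype="ext3">
    <action name="status" depth="*" interval="1m"/>
</fs>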
Custom scripts
You can develop your own scripts; they must be able to respond to start, stop, restart and status, and their return codes must be handled correctly.
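A minimal skeleton of such a script (the application, pid file and paths are hypothetical):

#!/bin/bash
# Minimal rgmanager-friendly control script: start/stop/restart/status
# with meaningful return codes (0 = success, non-zero = failure).

DAEMON=/usr/local/bin/myapp      # hypothetical application
PIDFILE=/var/run/myapp.pid

case "$1" in
  start)
    $DAEMON --pidfile "$PIDFILE"
    exit $?
    ;;
  stop)
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
    exit 0
    ;;
  restart)
    "$0" stop
    "$0" start
    exit $?
    ;;
  status)
    # must return 0 only if the application is really running
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
    exit $?
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac

Such a script is then declared in the service as a script resource, e.g. <script name="myapp" file="/usr/local/bin/myapp-cluster"/> (names are illustrative).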
> snmpwalk -v1 -c public localhost REDHAT-CLUSTER-MIB:RedHatCluster
REDHAT-CLUSTER-MIB::rhcMIBVersion.0 = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcClusterName.0 = STRING: "cluster1"
REDHAT-CLUSTER-MIB::rhcClusterStatusCode.0 = INTEGER: 4
REDHAT-CLUSTER-MIB::rhcClusterStatusDesc.0 = STRING: "Some services not running"
REDHAT-CLUSTER-MIB::rhcClusterVotesNeededForQuorum.0 = INTEGER: 2
REDHAT-CLUSTER-MIB::rhcClusterVotes.0 = INTEGER: 3
REDHAT-CLUSTER-MIB::rhcClusterQuorate.0 = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcClusterNodesNum.0 = INTEGER: 3
REDHAT-CLUSTER-MIB::rhcClusterNodesNames.0 = STRING: "node1.deimos.fr, node2.deimos.fr, node3.deimos.fr"
REDHAT-CLUSTER-MIB::rhcClusterAvailNodesNum.0 = INTEGER: 3
REDHAT-CLUSTER-MIB::rhcClusterAvailNodesNames.0 = STRING: "node1.deimos.fr, node2.deimos.fr, node3.deimos.fr"
REDHAT-CLUSTER-MIB::rhcClusterUnavailNodesNum.0 = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcClusterUnavailNodesNames.0 = ""
REDHAT-CLUSTER-MIB::rhcClusterServicesNum.0 = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcClusterServicesNames.0 = STRING: "webby"
REDHAT-CLUSTER-MIB::rhcClusterRunningServicesNum.0 = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcClusterRunningServicesNames.0 = ""
REDHAT-CLUSTER-MIB::rhcClusterStoppedServicesNum.0 = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcClusterStoppedServicesNames.0 = STRING: "webby"
REDHAT-CLUSTER-MIB::rhcClusterFailedServicesNum.0 = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcClusterFailedServicesNames.0 = ""
REDHAT-CLUSTER-MIB::rhcNodeName."node1.deimos.fr" = STRING: "node1.deimos.fr"
REDHAT-CLUSTER-MIB::rhcNodeName."node2.deimos.fr" = STRING: "node2.deimos.fr"
REDHAT-CLUSTER-MIB::rhcNodeName."node3.deimos.fr" = STRING: "node3.deimos.fr"
REDHAT-CLUSTER-MIB::rhcNodeStatusCode."node1.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeStatusCode."node2.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeStatusCode."node3.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeStatusDesc."node1.deimos.fr" = STRING: "Participating in cluster"
REDHAT-CLUSTER-MIB::rhcNodeStatusDesc."node2.deimos.fr" = STRING: "Participating in cluster"
REDHAT-CLUSTER-MIB::rhcNodeStatusDesc."node3.deimos.fr" = STRING: "Participating in cluster"
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNum."node1.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNum."node2.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNum."node3.deimos.fr" = INTEGER: 0
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNames."node1.deimos.fr" = ""
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNames."node2.deimos.fr" = ""
REDHAT-CLUSTER-MIB::rhcNodeRunningServicesNames."node3.deimos.fr" = ""
REDHAT-CLUSTER-MIB::rhcServiceName."webby" = STRING: "webby"
REDHAT-CLUSTER-MIB::rhcServiceStatusCode."webby" = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcServiceStatusDesc."webby" = STRING: "stopped"
REDHAT-CLUSTER-MIB::rhcServiceStartMode."webby" = STRING: "manual"
REDHAT-CLUSTER-MIB::rhcServiceRunningOnNode."webby" = ""
End of MIB
> service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Can't determine address family of nodename
Unable to get the configuration
Can't determine address family of nodename
cman_tool: corosync daemon didn't start
Check cluster logs for details