Introduction and Architecture of Setting Up a SUN Cluster
Introduction
The SUN Cluster suite is also called SunPlex. This article cannot explain everything, so you should already be familiar with clustering basics.
Types of Clusters
There are 2 types of clusters:
- Cluster HA: High Availability Cluster
- Cluster HPC: High Performance Computing
Definitions
- Scalable: additional instances of the same application are added on other nodes (e.g. Apache)
- Unaware applications: non-clustered applications
- SPOF: Single Point of Failure (e.g. a node with only one network card instead of two)
- RGM: Resource Group Manager
- Data Service: an agent that lets the RGM start, stop and monitor an application
- IPMP: IP network Multipathing; it handles network interface failures (> SunPlex v3.0), similar to bonding on Linux (see the sketch after this list)
- DPM: Disk Path Monitoring (hard drive monitoring; it relies on I/O access, so if no traffic goes to the drive, no error can be detected)
- Application Traffic Striping: virtual private IP address (must be a class B network, < SunPlex v3.1)
- SCI: Scalable Coherent Interface, a shared memory technology for clusters; with it, each node can see the memory of the others
- Containers: Zones + RGM
- Amnesia: occurs when a booting node cannot join the cluster because another node has not authorized it, so it waits for authorization
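As a rough illustration of IPMP (link-based, without test addresses), the two public interfaces of a node are simply put into the same IPMP group. The interface names ce0/ce1 and the host name node1 are only placeholders. In /etc/hostname.ce0:
node1 netmask + broadcast + group ipmp0 up
And in /etc/hostname.ce1:
group ipmp0 standby up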
Solaris Zone
A Solaris zone is a virtual environment that encapsulates a service (e.g. Apache). The advantage is that you can cap the processor percentage or the amount of memory that the service can use, and many other features are available. A zone can be clustered as of SunPlex > v3.1; with older versions, you can only cluster the global zone.
There is a global zone which provides the virtual OS to the other zones, but this is not full virtualization; it only manages resources such as CPU and memory.
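As a hedged sketch (the zone name webzone and the path /zones/webzone are just examples), a basic zone is created with zonecfg and zoneadm:
zonecfg -z webzone "create; set zonepath=/zones/webzone"
zoneadm -z webzone install
zoneadm -z webzone boot
CPU and memory limits are then added as extra resource controls in zonecfg, depending on the Solaris release.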
Cluster application creator
3 ways to build a service for a cluster:
- By hand, with an init.d-style start/stop script (see the sketch after this list)
- SCDS Builder: a template generator; the generated code is then edited by hand
- GDS: a GUI for creating a service; user-friendly but not very powerful
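A minimal sketch of the hand-written approach; the apachectl path is only an example:
#!/sbin/sh
# minimal start/stop wrapper, placed in /etc/init.d and linked from the rc directories
case "$1" in
start)
    /usr/apache/bin/apachectl start
    ;;
stop)
    /usr/apache/bin/apachectl stop
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
exit 0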
How to know my release
To find out your release, check the corresponding file. For the Solaris release:
cat /etc/release
And for SunPlex release:
cat /etc/cluster/release
Infrastructure
Network
For a cluster infrastructure, you need redundancy and load balancing on the interconnect network. For that, each node needs 2 network cards. The first card of each node is connected to the other node with a crossover cable, and the second card of each node is connected to a switch.
This solution only applies if you have 2 nodes. If you have more, you must not use crossover cables; you must connect all the interconnect interfaces to redundant switches.
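Once the cluster software is installed, the interconnect paths can be checked with the Sun Cluster 3.x status command:
scstat -W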
CMM
The CMM (Cluster Membership Monitor) rebuilds the cluster membership and node configuration. This happens when the node heartbeats diverge (for example when a node stops answering).
The configuration repository is stored in the CCR (Cluster Configuration Repository). It contains:
- Cluster and node names
- Cluster transport configuration
- The names of registered VERITAS disk groups or Solaris Volume Manager software disksets
- A list of nodes that can master each disk group
- Information about NAS devices
- Data service operational parameter values (timeouts)
- Paths to data service callback methods
- Disk ID (DID) device configuration
- Current cluster status
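Most of this configuration can be displayed from any running node with:
scconf -p
The CCR files themselves are normally stored under /etc/cluster/ccr and should never be edited by hand.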
Global File System
Global File System is also called GFS. It is used to share filesystems with all the nodes. With GFS, you get a new device namespace called DID (Disk ID).
You can find it in:
/dev/did/dsk
/dev/did/rdsk
(directories)
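To list the DID devices and see which physical device they correspond to on each node:
scdidadm -L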
For example, if one of my nodes doesn't have a CD-ROM drive and I want to share one so that it is seen as a device on the other nodes, I can do it with GFS. Or if I want to share the same partition of my NAS with all the nodes, that is possible too.
This command will mount one of my volumes on all nodes:
mount -o global,logging /dev/vx/dsk/nfs-dg/vol-01 /global/nfs
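To make such a global mount persistent across reboots, a matching line is added to /etc/vfstab on every node (same volume names as above, given only as an example):
/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 /global/nfs ufs 2 yes global,logging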
Adding a hardware device:
- You have to configure it to be recognized by the computer
- Then with devfsadm, you have to configure it for the OS
- Then, with scgdevs, you can configure it for the cluster.
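Put together, the sequence on the node where the device was added looks like this (scdidadm -l then shows the new DID instances on the local node):
devfsadm
scgdevs
scdidadm -l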
As soon as one node has no SAN access, you must use GFS to share the files. That way, the nodes outside the SAN can reach the SAN storage through the private network, via the SAN-connected nodes. In case of an error, buffered I/O is redirected to another node after the timeout.
EFI
EFI stands for Extensible Firmware Interface; the EFI disk label enables partitions larger than 2 TB to be recognized. The old label type was called SMI.
Differences between SMI and EFI:
- SMI has a VTOC of 1 sector on the hard drive; EFI has a VTOC of 34 sectors.
- With the conventional (SMI) label, slice 2 represents the disk's total space. Only on EFI, slice 8 is reserved, and slices 0 to 7 are available for making partitions.
When you launch the format command with the -e option:
format -e
you can choose between an SMI and an EFI label.
LocalFS or Failover FileSystem
If you don't want to use GFS, you can use LocalFS. It is meant for an active/passive (failover) cluster architecture.
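As a sketch, a failover file system is declared in /etc/vfstab without the global option and with "mount at boot" set to no, because the cluster mounts it only on the node that currently owns the service (the device and mount point below are placeholders):
/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 /local/nfs ufs 2 no logging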
Remote Access
To access several remote nodes simultaneously, you can install one of these tools:
- cconsole: cluster console
- crlogin: for rlogin
- ctelnet: for telnet
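For example, the cluster members are declared in /etc/clusters (the names below are placeholders):
mycluster node1 node2
Then all the nodes can be reached at once with:
crlogin mycluster &
(cconsole additionally needs /etc/serialports entries to reach the consoles.)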
Fencing
Fencing is the SCSI reservation mechanism used in case of failure. Each node of your cluster has a quorum vote of 1. A node's quorum count increases when it gets priority because another node has crashed.
For example, with a SAN, the healthy node deletes the other node's SCSI reservation to obtain the higher count. As soon as the cluster is repaired, all the nodes go back to the value of "1".
The last node switched off must be the first one switched on.
For example:
- N2 stops
- N1: quorum +1
- – modifications –
- N1 stops
- N2 boots
- N2 boot fails, because N1 has deleted the SCSI reservation for N2
A simple formula gives the maximum number of quorum devices:
max devices = n - 1
where n is the number of nodes.
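The quorum configuration and the current vote counts can be checked on a running cluster with:
scstat -q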