Introduction and Architecture for Setting Up a SUN Cluster
The Sun Cluster suite is also called SunPlex. I cannot explain everything here, and you should already know the basics of clustering.
1.1 Types of Clusters
There are 2 types of clusters:
- HA cluster: High Availability cluster
- HPC cluster: High Performance Computing cluster
1.2 Vocabulary
- Scalable: adds instances of the same application (e.g. Apache)
- Unaware applications: applications that are not cluster-aware
- SPOF: Single Point of Failure (e.g. a node with only one network card instead of two)
- RGM: Resource Group Manager
- Data Service: an agent generated by RGM
- IPMP: IP Network Multipathing. It monitors network failures (> SunPlex v3.0); the equivalent of bonding on Linux
- DPM: Disk Path Monitoring (hard drive monitoring; it relies on I/O access, so if no traffic reaches the drive, no error can be found)
- Application Traffic Striping: virtual private IP address (must be a class B network < SunPlex v3.1)
- SCI: Scalable Coherent Interface, a shared-memory technology for clusters. With it, each node can see the memory of the others.
- Containers: Zones + RGM
- Amnesia: occurs when a booting node cannot join the cluster because another node has not authorized it. The node then waits for that authorization.
1.3 Solaris Zone
A Solaris zone is a virtual zone that encapsulates a service (e.g. Apache). The advantage is that we can cap the maximum CPU percentage or the maximum amount of memory the service can use. Many other features are available. Zones can be clustered with SunPlex releases later than v3.1; with older releases, you can only cluster the global zone.
There is a global zone that provides something resembling a virtual OS, but it is not virtualization. It only manages resources such as CPU and memory.
1.4 Cluster application creator
There are 3 ways to build a service for a cluster:
- By hand in init.d
- SCDS Builder: a template generator whose output you then edit by hand
- GDS: a GUI for creating a service. User-friendly but not very efficient
1.5 How to know my release
To find your release, first decide which one you need. For the Solaris release:
And for the SunPlex release:
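The commands for these two checks do not appear in the text above. As a hedged sketch, and assuming a standard Solaris and Sun Cluster 3.x installation, `/etc/release` holds the Solaris release and `scinstall -pv` prints the cluster release and package versions. The helper name `show_release_file` is only illustrative.

```shell
#!/bin/sh
# Print a release file if it is readable, otherwise say so.
# (show_release_file is a hypothetical helper name, not a Sun tool.)
show_release_file() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "not found: $1"
    fi
}

# Solaris release: /etc/release is the usual location on Solaris.
show_release_file /etc/release

# Sun Cluster release: scinstall -pv prints release and package versions.
# The guard keeps the script harmless on hosts without the cluster software.
if command -v scinstall >/dev/null 2>&1; then
    scinstall -pv
else
    echo "scinstall not found in PATH"
fi
```

On a cluster node, `scinstall` usually lives in `/usr/cluster/bin`, so that directory may need to be in `PATH`.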
For a cluster infrastructure, you need network load balancing, which requires 2 network cards per node. The first card of each node should be connected directly to the other node with a crossover cable. The second card of each node should be connected to a switch.
This solution applies only if you have 2 nodes. If you have more, you must not use a crossover cable. Connect all your interfaces to redundant switches instead.
The CMM (Cluster Membership Monitor) can rebuild the node configuration of the cluster. This can occur when the nodes' heartbeats differ.
The configuration repository is stored in the CCR (Cluster Configuration Repository). It contains:
- Cluster and node names
- Cluster transport configuration
- The names of registered VERITAS disk groups or Solaris Volume Manager software disksets
- A list of nodes that can master each disk group
- Information about NAS devices
- Data service operational parameter values (timeouts)
- Paths to data service callback methods
- Disk ID (DID) device configuration
- Current cluster status
2.3 Global File System
The Global File System is also called GFS. It shares filesystems with all the nodes. With GFS, you get a new device name directory called DID (Disk ID).
You can find it in:
- /dev/did/rdsk (directories)
E.g. if one of my nodes has no CD-ROM drive, I can use GFS to share another node's drive so that it appears as a device on every node. Likewise, I can share the same NAS partition with all nodes.
This command mounts one of my volumes on all nodes:
mount -o global,logging /dev/vx/dsk/nfs-dg/vol-01 /global/nfs
Adding a hardware device:
- First, configure it so that the computer recognizes it
- Then, with "devfsadm", configure it for the OS
- Then, with "scgdevs", configure it for the cluster.
As soon as one node has no SAN access, we must use GFS to share files. That way, the nodes outside the SAN can reach it over the private network, through the SAN-connected nodes. If an error occurs, buffered I/O is redirected to another node after the timeout.
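The last two steps above can be sketched as a short script. This is a minimal sketch, assuming a Sun Cluster 3.x node where `devfsadm`, `scgdevs` and `scdidadm` are installed; the `run_if_present` wrapper is a hypothetical helper added only so that the script does nothing harmful on other hosts.

```shell
#!/bin/sh
# Run a command only if it exists on this host; otherwise report and go on.
# (run_if_present is an illustrative helper, not part of Sun Cluster.)
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped (not installed): $1"
    fi
}

# Step 2: let the OS rebuild its /dev and /devices entries.
run_if_present devfsadm

# Step 3: update the cluster's global devices namespace.
run_if_present scgdevs

# Optional check: list the DID device mappings known to the cluster.
run_if_present scdidadm -L
```

Run it as root on the node where the device was attached; the final listing is only a convenient way to check that the new device received a DID name.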
EFI stands for Extensible Firmware Interface. The EFI disk label allows partitions of several terabytes to be recognized. The older label type is called SMI.
Differences between SMI and EFI:
- The SMI label (VTOC) takes 1 sector on the hard drive. The EFI label takes 34 sectors.
- In the conventional partition layout, S2 represents the whole disk, and on EFI only, S8 is reserved. S0 to S7 are available for partitions.
When you run the format command:
You can choose between SMI and EFI.
3 LocalFS or Failover FileSystem
If you don't want to use GFS, you can use LocalFS. It is meant for active/passive cluster architectures.
4 Remote Access
To access several remote nodes simultaneously, you can install one of these packages:
- cconsole: cluster console
- crlogin: for rlogin
- crtelnet: for telnet
Fencing is the use of SCSI reservations when a failure occurs. Each node of your cluster has a quorum vote of 1. A node's vote count increases when it takes priority because another node has crashed.
For example, with a SAN, the healthy node deletes the other node's SCSI reservation so that it holds the larger vote count. Once the cluster is repaired, every node goes back to the value "1".
The last node switched off must be the first one switched on. The following sequence shows what happens otherwise:
- N2 stops
- N1 + Quorum +1
- -- modifications --
- N1 stops
- N2 boots
- N2 fails to boot because N1 has deleted the SCSI reservation for N2
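The boot-order rule illustrated by this sequence can be written down as a tiny helper. This is only an illustrative sketch: `first_to_boot` is a hypothetical function, and it assumes you recorded the node names in the order they were shut down.

```shell
#!/bin/sh
# Given node names in shutdown order (first stopped first),
# print the node that must be booted first: the last one stopped.
# (first_to_boot is a hypothetical helper, not a cluster command.)
first_to_boot() {
    last=""
    for node in "$@"; do
        last=$node
    done
    echo "$last"
}

# In the sequence above, N2 stopped first and N1 stopped last,
# so N1 is the node to boot first.
first_to_boot N2 N1
```

Booting N1 first lets it rejoin with its up-to-date configuration before N2 comes back.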
A formula gives the maximum number of quorum devices:
n - 1 = max devices
where n represents the number of nodes.
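As a quick worked example of this formula, the sketch below computes n - 1 for a given node count. It also computes the number of votes needed for a strict majority, which is an assumption added here about how quorum is usually decided (more than half of all votes), not something stated above. Both function names are illustrative.

```shell
#!/bin/sh
# Maximum number of devices for n nodes, using the n - 1 rule above.
max_devices() {
    echo $(( $1 - 1 ))
}

# Votes needed for a strict majority of a total vote count.
# Assumption: quorum means more than half of all configured votes.
majority() {
    echo $(( $1 / 2 + 1 ))
}

nodes=2
echo "nodes=$nodes max_devices=$(max_devices "$nodes")"

# Example: 2 node votes plus 1 device vote gives 3 votes in total.
echo "majority of 3 votes: $(majority 3)"
```

With 2 nodes, this prints a maximum of 1 device, and a majority of 2 out of 3 votes, which is why a surviving node that holds the device vote can keep the cluster running alone.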