Introduction and Architecture of Setting Up a SUN Cluster
Introduction
The SUN Cluster suite is also called SunPlex. This article cannot explain everything, so you should already be familiar with clustering basics.
Types of Clusters
There are 2 types of clusters:
- Cluster HA: High Availability Cluster
- Cluster HPC: High Performance Computing
Definitions
- Scalable: additional instances of the same application are added on other nodes (e.g. Apache)
- Unaware applications: non-clustered applications
- SPOF: Single Point of Failure (e.g. a node with only one network card instead of two)
- RGM: Resource Group Manager
- Data Service: an agent that lets the RGM start, stop and monitor an application
- IPMP: IP network Multipathing; it handles network interface failures (> SunPlex v3.0), similar to bonding on Linux (see the sketch after this list)
- DPM: Disk Path Monitoring (hard drive monitoring; it relies on I/O access, so if no traffic goes to the drive, no error can be detected)
- Application Traffic Striping: virtual private IP address (must be a class B network, < SunPlex v3.1)
- SCI: Scalable Coherent Interface, a shared memory technology for clusters; with it, each node can see the memory of the others
- Containers: Zones + RGM
- Amnesia: occurs when a booting node cannot join the cluster because another node has not authorized it, so it waits for authorization
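As a rough illustration of IPMP (link-based, without test addresses), the two public interfaces of a node are simply put into the same IPMP group. The interface names ce0/ce1 and the host name node1 are only placeholders. In /etc/hostname.ce0:
node1 netmask + broadcast + group ipmp0 up
And in /etc/hostname.ce1:
group ipmp0 standby up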
Solaris Zone
A Solaris zone is a virtual environment that encapsulates a service (e.g. Apache). The advantage is that you can cap the processor percentage or the amount of memory that the service can use, and many other features are available. A zone can be clustered as of SunPlex > v3.1; with older versions, you can only cluster the global zone.
There is a global zone which provides the virtual OS to the other zones, but this is not full virtualization; it only manages resources such as CPU and memory.
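As a hedged sketch (the zone name webzone and the path /zones/webzone are just examples), a basic zone is created with zonecfg and zoneadm:
zonecfg -z webzone "create; set zonepath=/zones/webzone"
zoneadm -z webzone install
zoneadm -z webzone boot
CPU and memory limits are then added as extra resource controls in zonecfg, depending on the Solaris release.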
Cluster application creator
3 ways to build a service for a cluster:
- By hand, with an init.d-style start/stop script (see the sketch after this list)
- SCDS Builder: a template generator; the generated code is then edited by hand
- GDS: a GUI for creating a service; user-friendly but not very powerful
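A minimal sketch of the hand-written approach; the apachectl path is only an example:
#!/sbin/sh
# minimal start/stop wrapper, placed in /etc/init.d and linked from the rc directories
case "$1" in
start)
    /usr/apache/bin/apachectl start
    ;;
stop)
    /usr/apache/bin/apachectl stop
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
exit 0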
How to know my release
To find out your release, check the corresponding file. For the Solaris release:
cat /etc/release
And for SunPlex release:
cat /etc/cluster/release
Infrastructure
Network
For a cluster infrastructure, you need redundancy and load balancing on the interconnect network. For that, each node needs 2 network cards. The first card of each node is connected to the other node with a crossover cable, and the second card of each node is connected to a switch.
This solution only applies if you have 2 nodes. If you have more, you must not use crossover cables; you must connect all the interconnect interfaces to redundant switches.
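Once the cluster software is installed, the interconnect paths can be checked with the Sun Cluster 3.x status command:
scstat -W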
CMM
The CMM (Cluster Membership Monitor) rebuilds the cluster membership and node configuration. This happens when the node heartbeats diverge (for example when a node stops answering).
The configuration repository is stored in the CCR (Cluster Configuration Repository). It contains:
- Cluster and node names
- Cluster transport configuration
- The names of registered VERITAS disk groups or Solaris Volume Manager software disksets
- A list of nodes that can master each disk group
- Information about NAS devices
- Data service operational parameter values (timeouts)
- Paths to data service callback methods
- Disk ID (DID) device configuration
- Current cluster status
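Most of this configuration can be displayed from any running node with:
scconf -p
The CCR files themselves are normally stored under /etc/cluster/ccr and should never be edited by hand.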
Global File System
Global File System is also called GFS. It is used to share filesystems with all the nodes. With GFS, you get a new device namespace called DID (Disk ID).
You can find it in:
/dev/did/dsk
/dev/did/rdsk
(directories)
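To list the DID devices and see which physical device they correspond to on each node:
scdidadm -L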
For example, if one of my nodes doesn't have a CD-ROM drive and I want to share one so that it is seen as a device on the other nodes, I can do it with GFS. Or if I want to share the same partition of my NAS with all the nodes, that is possible too.
This command will mount one of my volumes on all nodes:
mount -o global,logging /dev/vx/dsk/nfs-dg/vol-01 /global/nfs
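To make such a global mount persistent across reboots, a matching line is added to /etc/vfstab on every node (same volume names as above, given only as an example):
/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 /global/nfs ufs 2 yes global,logging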
Adding a hardware device:
- You have to configure it to be recognized by the computer
- Then with devfsadm, you have to configure it for the OS
- Then, with scgdevs, you can configure it for the cluster.
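Put together, the sequence on the node where the device was added looks like this (scdidadm -l then shows the new DID instances on the local node):
devfsadm
scgdevs
scdidadm -l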
As soon as one node has no SAN access, you must use GFS to share the files. That way, the nodes outside the SAN can reach the SAN storage through the private network, via the SAN-connected nodes. In case of an error, buffered I/O is redirected to another node after the timeout.
EFI
EFI stands for Extensible Firmware Interface; the EFI disk label enables partitions larger than 2 TB to be recognized. The old label type was called SMI.
Differences between SMI and EFI:
- SMI has a VTOC of 1 sector on the hard drive; EFI has a VTOC of 34 sectors.
- With the conventional (SMI) label, slice 2 represents the disk's total space. Only on EFI, slice 8 is reserved, and slices 0 to 7 are available for making partitions.
When you launch the format command with the -e option:
format -e
you can choose between an SMI and an EFI label.
LocalFS or Failover FileSystem
If you don't want to use GFS, you can use LocalFS. It is meant for an active/passive (failover) cluster architecture.
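As a sketch, a failover file system is declared in /etc/vfstab without the global option and with "mount at boot" set to no, because the cluster mounts it only on the node that currently owns the service (the device and mount point below are placeholders):
/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 /local/nfs ufs 2 no logging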
Remote Access
To access several remote nodes simultaneously, you can install one of these tools:
- cconsole: cluster console
- crlogin: for rlogin
- ctelnet: for telnet
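For example, the cluster members are declared in /etc/clusters (the names below are placeholders):
mycluster node1 node2
Then all the nodes can be reached at once with:
crlogin mycluster &
(cconsole additionally needs /etc/serialports entries to reach the consoles.)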
Fencing
Fencing is the SCSI reservation mechanism used in case of failure. Each node of your cluster has a quorum vote of 1. A node's quorum count increases when it gets priority because another node has crashed.
For example, with a SAN, the healthy node deletes the other node's SCSI reservation to obtain the higher count. As soon as the cluster is repaired, all the nodes go back to the value of "1".
The last node switched off must be the first one switched on.
For example:
- N2 stops
- N1: quorum +1
- – modifications –
- N1 stops
- N2 boots
- N2 boot fails, because N1 has deleted the SCSI reservation for N2
A simple formula gives the maximum number of quorum devices:
max devices = n - 1
where n is the number of nodes.
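The quorum configuration and the current vote counts can be checked on a running cluster with:
scstat -q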