Solaris Cluster (sometimes Sun Cluster or SunCluster) is a high-availability cluster software product for the Solaris Operating System, created by Sun Microsystems.
It is used to improve the availability of software services such as databases, file sharing on a network, electronic commerce websites, or other applications. Sun Cluster operates by having redundant computers or nodes where one or more computers continue to provide service if another fails. Nodes may be located in the same data center or on different continents.
This documentation was written against the following versions:
Solaris 10 update 7
Sun Cluster 3.2u2
Requirements
All of the following items are required before installing Sun Cluster; complete every step below before starting the installation.
Hardware
To build a working cluster, the following hardware is required:
2 nodes
sun-node1
sun-node2
4 network cards
2 for the public interface (grouped with IPMP)
2 for the private interconnect (cluster heartbeat and node information exchange)
1 disk array with 1 spare disk
Partitioning
While installing Solaris, create a slice mounted on /globaldevices with at least 512 MB. This slice must be formatted as UFS (ZFS cannot currently be used for the global-devices file system).
If you didn't create this slice during the Solaris installation, you can add it afterwards (see the sketch below):
Use the format command to create a new slice
Use the newfs command to create a UFS file system on it
Mount this file system on /globaldevices (scinstall later converts it into the global-devices file system mounted on /global/.devices/node@<nodeID>)
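A minimal sketch of that sequence follows; the slice name c0t0d0s6 is only an example and must be adapted to your disk layout.
# Create or resize a free slice interactively (partition > modify > label).
format
# Build a UFS file system on the new slice (c0t0d0s6 is an assumption).
newfs /dev/rdsk/c0t0d0s6
# Mount it on /globaldevices and make the mount persistent across reboots.
mkdir -p /globaldevices
echo "/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /globaldevices ufs 2 yes -" >> /etc/vfstab
mount /globaldevices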
Do not forget to apply the same /etc/hosts file to all cluster nodes, and whenever you change it, propagate the change to every node.
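For example, both nodes would carry the same entries (the addresses below are purely illustrative):
192.168.1.11   sun-node1
192.168.1.12   sun-node2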
Patches
If you have a graphical interface, use Sun Update Manager to install all available updates. If you don't, install all available patches from the command line; missing patches can cause installation problems.
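From the command line, one option (assuming the system is registered with Sun Update Connection) is smpatch:
# List the patches the update service considers missing.
smpatch analyze
# Download and apply them.
smpatch update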
IPMP Configuration
You need to configure at least 2 interfaces for your public network. Follow this documentation: IPMP Configuration
You do not have to do this for the private network: the cluster configures it automatically during installation.
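As a quick reminder, a minimal link-based IPMP sketch for the two public interfaces could look like this; the interface names and the group name ipmp-pub are assumptions, and probe-based IPMP would additionally need test addresses:
# Persistent configuration read at boot time.
echo "sun-node1 group ipmp-pub up" > /etc/hostname.e1000g0
echo "group ipmp-pub up" > /etc/hostname.e1000g1
# Apply immediately without rebooting.
ifconfig e1000g0 group ipmp-pub
ifconfig e1000g1 plumb group ipmp-pub up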
Activate all network cards
With your 4 network cards, activate all of them so that they are correctly detected during installation. First, run ifconfig -a to check whether all the cards are plumbed. If not, enable them:
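For example, assuming the e1000g interface names shown later in the installer transcript:
# Plumb every interface that does not appear in the ifconfig -a output.
ifconfig e1000g1 plumb
ifconfig e1000g2 plumb
ifconfig e1000g3 plumb
# Check that all four cards are now visible.
ifconfig -a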
If you have installed a recent Solaris release, you may run into node-integration problems caused by RPC being restricted to local connections, a Sun security hardening feature. Since the cluster nodes need to communicate over RPC, this restriction must be lifted (the same can optionally be done for the web console). Perform this operation on each node.
Ensure that the local_only property of rpcbind is set to false:
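On each node, this can be done through SMF (the web console change is optional):
# Allow remote clients to reach rpcbind, as required for inter-node communication.
svccfg -s network/rpc/bind setprop config/local_only=false
svcadm refresh network/rpc/bind
# Verify the property.
svcprop network/rpc/bind:default | grep local_only
# Optionally allow remote connections to the Sun Java Web Console as well.
svccfg -s system/webconsole setprop options/tcp_listen=true
svcadm restart system/webconsole
Once these prerequisites are in place, start the interactive configuration with /usr/cluster/bin/scinstall on the first node; the screens below show a typical run for a two-node cluster.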
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option:
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option:
*** Create a New Cluster ***
This option creates and configures a new cluster.
You must use the Java Enterprise System (JES) installer to install the
Sun Cluster framework software on each machine in the new cluster
before you select this option.
If the "remote configuration" option is unselected from the JES
installer when you install the Sun Cluster framework on any of the new
nodes, then you must configure either the remote shell (see rsh(1)) or
the secure shell (see ssh(1)) before you select this option. If rsh or
ssh is used, you must enable root access to all of the new member
nodes from this node.
Press Control-d at any time to return to the Main Menu.
Do you want to continue (yes/no)
>>> Typical or Custom Mode <<<
This tool supports two modes of operation, Typical mode and Custom.
For most clusters, you can use Typical mode. However, you might need
to select the Custom mode option if not all of the Typical defaults
can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
Please select from one of the following options:
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]:
>>> Cluster Name <<<
Each cluster has a name assigned to it. The name can be made up of any
characters other than whitespace. Each cluster name should be unique
within the namespace of your enterprise.
What is the name of the cluster you want to establish?
>>> Cluster Nodes <<<
This Sun Cluster release supports a total of up to 16 nodes.
Please list the names of the other nodes planned for the initial
cluster configuration. List one node name per line. When finished,
type Control-D:
Node name: sun-node1
Node name: sun-node2
Node name (Control-D to finish): ^D
Attempting to contact "sun-node2" ... done
Searching for a remote configuration method ... done
The secure shell (see ssh(1)) will be used for remote execution.
Press Enter to continue:
>>> Authenticating Requests to Add Nodes <<<
Once the first node establishes itself as a single node cluster, other
nodes attempting to add themselves to the cluster configuration must
be found on the list of nodes you just provided. You can modify this
list by using claccess(1CL) or other tools once the cluster has been
established.
By default, nodes are not securely authenticated as they attempt to
add themselves to the cluster configuration. This is generally
considered adequate, since nodes which are not physically connected to
the private cluster interconnect will never be able to actually join
the cluster. However, DES authentication is available. If DES
authentication is selected, you must configure all necessary
encryption keys before any node will be allowed to join the cluster
(see keyserv(1M), publickey(4)).
Do you need to use DES authentication (yes/no) [no]?
>>> Network Address for the Cluster Transport <<<
The cluster transport uses a default network address of 172.16.0.0. If
this IP address is already in use elsewhere within your enterprise,
specify another address from the range of recommended private
addresses (see RFC 1918 for details).
The default netmask is 255.255.248.0. You can select another netmask,
as long as it minimally masks all bits that are given in the network
address.
The default private netmask and network address result in an IP
address range that supports a cluster with a maximum of 64 nodes and
10 private networks.
Is it okay to accept the default network address (yes/no) [yes]?
>>> Minimum Number of Private Networks <<<
Each cluster is typically configured with at least two private
networks. Configuring a cluster with just one private interconnect
provides less availability and will require the cluster to spend more
time in automatic recovery if that private interconnect fails.
Should this cluster use at least two private networks (yes/no) [yes]?
>>> Point-to-Point Cables <<<
The two nodes of a two-node cluster may use a directly-connected
interconnect. That is, no cluster switches are configured. However,
when there are greater than two nodes, this interactive form of
scinstall assumes that there will be exactly one switch for each
private network.
Does this two-node cluster use switches (yes/no) [yes]?
>>> Cluster Transport Adapters and Cables <<<
You must configure the cluster transport adapters for each node in the
cluster. These are the adapters which attach to the private cluster
interconnect.
Select the first cluster transport adapter for "sun-node1":
1) e1000g1
2) e1000g2
3) e1000g3
4) Other
Option:
Adapter "e1000g3" is an Ethernet adapter.
Searching for any unexpected network traffic on "e1000g3" ... done
Verification completed. No traffic was detected over a 10 second
sample period.
The "dlpi" transport type will be set for this cluster.
Name of adapter on "sun-node2" to which "e1000g3" is connected? e1000g3
Select the second cluster transport adapter for "sun-node1":
1) e1000g1
2) e1000g2
3) e1000g3
4) Other
Option:
Adapter "e1000g2" is an Ethernet adapter.
Searching for any unexpected network traffic on "e1000g2" ... done
Verification completed. No traffic was detected over a 10 second
sample period.
Name of adapter on "sun-node2" to which "e1000g2" is connected?
>>> Quorum Configuration <<<
Every two-node cluster requires at least one quorum device. By
default, scinstall will select and configure a shared SCSI quorum disk
device for you.
This screen allows you to disable the automatic selection and
configuration of a quorum device.
The only time that you must disable this feature is when ANY of the
shared storage in your cluster is not qualified for use as a Sun
Cluster quorum device. If your storage was purchased with your
cluster, it is qualified. Otherwise, check with your storage vendor to
determine whether your storage device is supported as Sun Cluster
quorum device.
If you disable automatic quorum device selection now, or if you intend
to use a quorum device that is not a shared SCSI disk, you must
instead use clsetup(1M) to manually configure quorum once both nodes
have joined the cluster for the first time.
Do you want to disable automatic quorum device selection (yes/no) [no]?
>>> Global Devices File System <<<
Each node in the cluster must have a local file system mounted on
/global/.devices/node@<nodeID> before it can successfully participate
as a cluster member. Since the "nodeID" is not assigned until
scinstall is run, scinstall will set this up for you.
You must supply the name of either an already-mounted file system or
raw disk partition which scinstall can use to create the global
devices file system. This file system or partition should be at least
512 MB in size.
If an already-mounted file system is used, the file system must be
empty. If a raw disk partition is used, a new file system will be
created for you.
The default is to use /globaldevices.
Is it okay to use this default (yes/no) [yes]?
>>> Automatic Reboot <<<
Once scinstall has successfully initialized the Sun Cluster software
for this machine, the machine must be rebooted. After the reboot, this
machine will be established as the first node in the new cluster.
Do you want scinstall to reboot for you (yes/no) [yes]?
During the cluster creation process, sccheck is run on each of the new
cluster nodes. If sccheck detects problems, you can either interrupt
the process or check the log files after the cluster has been
established.
Interrupt cluster creation for sccheck errors (yes/no) [no]?
The Sun Cluster software is installed on "sun-node2".
Started sccheck on "sun-node1".
Started sccheck on "sun-node2".
sccheck completed with no errors or warnings for "sun-node1".
sccheck completed with no errors or warnings for "sun-node2".
Configuring "sun-node2" ... done
Rebooting "sun-node2" ...
Waiting for "sun-node2" to become a cluster member ...
Manual configuration
Here is an example of setting up the first node and authorizing another node to join:
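The exact command line should be checked against scinstall(1M) for your release; the sketch below establishes the first node non-interactively (the cluster name and adapter names are assumptions) and then authorizes the second node with claccess:
# On sun-node1: establish the first node of a new cluster (verify the flags in scinstall(1M)).
/usr/cluster/bin/scinstall -i -C suncluster -F \
    -T node=sun-node1,node=sun-node2,authtype=sys \
    -A trtype=dlpi,name=e1000g2 -A trtype=dlpi,name=e1000g3
# Authorize sun-node2 to add itself to the cluster, then check the access list.
/usr/cluster/bin/claccess allow -h sun-node2
/usr/cluster/bin/claccess list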
If your installation includes a quorum device, you will need to configure it either through the web management interface or with the following commands. First, list all the LUNs in DID format:
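A sketch, assuming the shared disk chosen as the quorum device appears as DID device d4:
# List all DID devices and the physical paths they map to, to identify a shared LUN.
cldevice list -v
# Add the chosen shared disk (d4 is an assumption) as the quorum device, then check the votes.
clquorum add d4
clquorum status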
If at the end of the installation you encounter this kind of problem (a message such as "The cluster is in installation mode", possibly localized), it means the installmode property is still enabled and must be cleared before you can configure your resource groups (RG) or resources (RS); see the installation-mode section below.
How to change the private interconnect IP addresses of a cluster?
The cluster install wanted to use 172.16.0.0 as the private interconnect network, but once installed, one private interconnect ended up on 172.16.0.x and the other on 172.16.1.x, causing one private interconnect to fault. I found an article indicating that you can edit the cluster configuration by first booting each machine in non-cluster mode (boot -x; I actually did a reboot, sent a Stop-A during the reboot, and then ran boot -x).
Edit the file /etc/cluster/ccr/infrastructure and then incorporate your changes using:
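After editing, the table checksum has to be regenerated; the command usually cited for this (verify it against your release) is ccradm, run in non-cluster mode on each node:
# Regenerate the checksum of the edited CCR table.
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o
# Reboot back into the cluster.
init 6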
After I modified the file to put both private interconnects on the 172.16.0.x subnet, the second private interconnect came online. Once it was up, I was able to run scsetup, select an additional quorum drive, and take the cluster out of install mode.
Some commands cannot be executed on a cluster in Install mode
This generally happens on a 2-node cluster when the quorum has not yet been configured. As described in the man page:
Specify the installation-mode setting for the cluster. You can
specify either enabled or disabled for the installmode property.
While the installmode property is enabled, nodes do not attempt to
reset their quorum configurations at boot time. Also, while in this
mode, many administrative functions are blocked. When you first
install a cluster, the installmode property is enabled.
After all nodes have joined the cluster for the first time, and
shared quorum devices have been added to the configuration, you must
explicitly disable the installmode property. When you disable the
installmode property, the quorum vote counts are set to default
values. If quorum is automatically configured during cluster
creation, the installmode property is disabled as well after quorum
has been configured.
However, if you don't want to add a quorum device, or you want to take the cluster out of installation mode right away, simply run this command:
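A sketch of clearing installation mode; note that clquorum reset also resets the quorum vote counts to their defaults:
# Reset the quorum configuration and clear the installmode property.
clquorum reset
# Equivalent property-based form, per cluster(1CL):
#   cluster set -p installmode=disabled
# Verify.
cluster show -t global | grep installmode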
Another issue you may see: here DID device d3 corresponds to the disk array's management device, which the cluster can see but cannot write to (the array exposes it read-only), so the cluster reports errors against it. These are not real errors, and the cluster can still be used safely.
Method 1
To recover your DID as cleanly as possible, run this command on all the cluster nodes:
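The original command is not reproduced here; assuming the goal is to clean up and rebuild the DID namespace on Sun Cluster 3.2, a plausible sequence is:
# Remove DID references to devices that no longer exist.
cldevice clear
# Rediscover devices and update the DID mappings.
cldevice refresh
# Repopulate the global-devices namespace.
cldevice populate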
ATTENTION: if you create a new path_to_inst at boot time with 'boot -ra', you must boot from the physical boot device; it may not be possible to write a new path_to_inst on a boot mirror (SVM or VxVM).
Edit configuration files
Edit /etc/vfstab to remove the DID and global entries
Edit /etc/nsswitch.conf to remove the cluster references
Reboot the node with the -a option, which is necessary to write a new path_to_inst file (see the example below)
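For example:
# From a running system, pass the boot flags after the -- separator:
reboot -- -ra
# or, from the OpenBoot PROM prompt:
#   ok boot -ra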
I had a problem switching an RG over on Solaris 10u7 with Sun Cluster 3.2u2 (installed patches: 126107-33, 137104-02, 142293-01, 141445-09): the ZFS pool would not mount on the other node. In /var/adm/messages I saw this message when the RG tried to mount it:
Dec 16 15:34:30 LD-TLH-SRV-PRD-3 zfs: [ID 427000 kern.warning] WARNING: pool 'ulprod-ld_mysql' could not be loaded as it was last accessed by another system (host: LD-TLH-SRV-PRD-2 hostid: 0x27812152). See: http://www.sun.com/msg/ZFS-8000-EY
In fact, it's a bug that can be bypassed by putting the RG offline:
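The original commands are not shown; a manual sketch (the resource group name mysql-rg is an assumption, the pool name comes from the log above) is to take the RG offline, force-import the pool to clear the stale hostid, export it again, and bring the RG back online:
# Take the resource group offline.
clresourcegroup offline mysql-rg
# Force the import to override the stale hostid check, then export cleanly.
zpool import -f ulprod-ld_mysql
zpool export ulprod-ld_mysql
# Bring the resource group online on the desired node.
clresourcegroup online -n LD-TLH-SRV-PRD-3 mysql-rg
A different failure can show up when HAStoragePlus cannot import a pool because its cachefile directory is missing, with messages like these: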
Aug 17 22:23:28 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 148650 daemon.notice] Started searching for devices in '/dev/dsk' to find the importable pools.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 547433 daemon.notice] Completed searching the devices in '/dev/dsk' to find the importable pools.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 471757 daemon.error] cannot import pool 'qnap': '/var/cluster/run/HAStoragePlus/zfs' is not a valid directory
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 117328 daemon.error] The pool 'qnap' failed to import and populate cachefile.
Aug 17 22:23:35 minipardus SC[,SUNW.HAStoragePlus:8,clstorage,zfspool,hastorageplus_prenet_start]: [ID 292307 daemon.error] Failed to import:qnap
If you hit these messages, the bug is apparently fixed in Sun Cluster 3.2u3.
If you prefer not to install that update, create the '/var/cluster/run/HAStoragePlus/zfs' directory yourself:
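On each node:
# Recreate the directory HAStoragePlus expects for its ZFS cachefile.
mkdir -p /var/cluster/run/HAStoragePlus/zfs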
Cluster is unavailable when a node crashes on a 2-node cluster
Two types of problems can arise from cluster partitions: split brain and amnesia. Split brain occurs when the cluster interconnect between Solaris hosts is lost and the cluster becomes partitioned into subclusters, and each subcluster believes that it is the only partition. A subcluster that is not aware of the other subclusters could cause a conflict in shared resources, such as duplicate network addresses and data corruption.
Amnesia occurs if all the nodes leave the cluster in staggered groups. An example is a two-node cluster with nodes A and B. If node A goes down, the configuration data in the CCR is updated on node B only, and not node A. If node B goes down at a later time, and if node A is rebooted, node A will be running with old contents of the CCR. This state is called amnesia and might lead to running a cluster with stale configuration information.
You can avoid split brain and amnesia by giving each node one vote and mandating a majority of votes for an operational cluster. A partition with the majority of votes has a quorum and is enabled to operate. This majority vote mechanism works well if more than two nodes are in the cluster. In a two-node cluster, a majority is two. If such a cluster becomes partitioned, an external vote enables a partition to gain quorum. This external vote is provided by a quorum device. A quorum device can be any disk that is shared between the two nodes.
Recovering from amnesia
Scenario: a two-node cluster (nodes A and B) with one quorum device; node A has gone bad, and amnesia protection is preventing node B from booting up.
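The recovery steps are not reproduced here. One commonly described approach, to be validated against your support documentation before use, is to let node B form the cluster alone by temporarily removing node A's quorum vote: boot node B in non-cluster mode, edit the CCR infrastructure table, regenerate its checksum, and reboot into the cluster.
# On node B, booted with 'boot -x' (non-cluster mode):
# set node A's vote (the cluster.nodes.<id>.properties.quorum_vote entry; verify the exact
# key on your system) to 0 in the infrastructure table, then regenerate its checksum.
vi /etc/cluster/ccr/infrastructure
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o
# Reboot node B into the cluster; restore the vote once node A is repaired.
init 6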