Installing and Configuring Sun Cluster 3.1 09/04 Software for High-Availability Applications
Amy Rich, June 2005

Abstract: This article explains how to set up a Sun Cluster 3.1 09/04 environment for highly available services.
Introduction to the Sun Cluster Environment

Today's production environments often require 24x7 availability for critical services such as mail, LDAP, and web and database servers. Often the time it takes to perform one clean reboot is more downtime than is desired for the entire year. The solution to this problem is to design an environment where services can migrate between individual servers so that a reboot of a single machine never causes service downtime.

Sun's solution to architecting these highly available services is combining the software product, Sun Cluster, with appropriately sized servers, shared storage, and network topology. The goal of the cluster is to maintain near 100 percent uptime while ensuring data integrity. More than one highly available application can be run on a cluster at a time, either sharing resources with other applications or running on separate nodes with their own resources.

The Sun Cluster environment is not only a way of tightly coupling multiple machines and resources, but also a mechanism by which these hardware and software components can be managed. Various types of cluster configurations are possible, and choosing the right configuration for your site requires knowing a bit about Sun Cluster itself as well as your applications.

A Sun Cluster 3.1 09/04 environment consists of two to sixteen SPARC processor-based servers running version 8 or 9 of the Solaris Operating System. It can also support x86 servers running Solaris 9 09/04 or later (x86 configurations are currently limited to two nodes). It also requires the Sun Cluster software and some number of applications with software agents and fault monitors. Each node needs one or more public network interfaces as well as two or more private network interfaces (either directly cabled or connected via a private switch) for communication between cluster nodes.
Most cluster installations will also include shared storage in the form of directly attached SCSI or fiber arrays or SAN switches.

Sun Cluster Concepts

Understanding how the Sun Cluster environment works depends on comprehending a few basic concepts, such as application modes, quorum, and SCSI reservation. Learning how your applications interact with the underlying operating system and how the cluster manages hardware will help you determine what configuration might work best for your site.

Clustered applications can run in failover or scalable mode depending on their implementation. In failover mode, the application runs on only one node at a time. If the controlling node fails, then the application and any other associated resources are passed to another node which was previously in standby. A scalable application runs on more than one node at the same time without any failover. Not every application can operate in scalable mode, since the application must implement its own write-locking mechanism so that data does not become corrupted on any shared storage devices.

Parallel databases can be considered a specific type of scalable application. They run on each node without failover and handle different queries and often parallel queries on the same database. Currently, Sun Cluster 3.1 09/04 only supports Oracle 8i OPS and Oracle RAC (9i and 10g) in such a configuration.

During normal operation, each cluster node regularly sends out heartbeat information across the private network to let every other node know about its health. In the event that one or more nodes fail to successfully transmit heartbeat information to other nodes in the cluster, the cluster removes those machines and continues without them. Any applications or resources associated with the removed nodes are failed over if appropriate. The cluster safeguards against data corruption by using the principles of quorum and fencing.
Cluster configurations come in four types: clustered pairs, Pair+N, N+1, and multi-ported N*N scalable. In a clustered pair topology, two or more pairs of nodes are each physically connected to some external storage shared by the pair. This configuration consists of an even number of nodes (N), two to sixteen, and a minimum of N/2 external shared storage devices. A clustered pair with only two nodes is the only topology supported by the Solaris OS for x86 platforms. The following figure shows two pairs with two storage devices each:
A Pair+N topology has one set of nodes directly attached to shared storage and N number of additional nodes which are not connected to any storage. In this case, any access of the shared storage on nodes which are not directly attached happens through one of the directly attached nodes via the private cluster network. The following figure shows a Pair+N configuration, where N is two. When nodes 3 or 4 access the storage, they must do so through nodes 1 or 2.
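The access path in the Pair+N example above can be pictured with a short Python sketch. The function name, the node labels, and the rule for choosing which attached node to route through are all assumptions made for illustration; the cluster framework makes that choice itself.

```python
def storage_path(node, attached_nodes):
    """Return the hops a node takes to reach shared storage in a Pair+N layout.

    attached_nodes is the set of nodes cabled directly to the storage.
    Picking the first attached node alphabetically is an arbitrary
    illustrative choice, not the cluster's real selection logic.
    """
    if node in attached_nodes:
        return [node, "storage"]
    # Not directly attached: go over the private cluster network
    # to a node that is attached, and reach the storage from there.
    via = sorted(attached_nodes)[0]
    return [node, via, "storage"]


attached = {"node1", "node2"}          # the pair
for name in ("node1", "node2", "node3", "node4"):
    print(name, "reaches storage via", storage_path(name, attached))
```

Nodes 1 and 2 reach the storage in one hop, while nodes 3 and 4 always show an attached node in the middle of the path.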
In an N+1, or star topology, primary and secondary nodes do not need to be configured identically. Some number of primary nodes are all active, and one secondary node, which acts as a failover for the N other primaries, can be either active or passive. The secondary node is the only node which is connected to all shared storage devices. The following figure shows three primary nodes (N), each running their own applications with one secondary node as the failover:
In an N*N topology, all nodes are connected to all shared storage devices in the cluster. The following figure shows four nodes, each connected to two shared storage devices:
When describing the possible cluster topologies, we briefly mentioned
shared external storage, but we didn't discuss how storage in general is seen
within the cluster. The Sun Cluster

When the cluster forms, it automatically assigns unique (to the cluster)
IDs to each device within it under the

To form a cluster and offer services, the nodes in a cluster must first reach
quorum. The quorum equation states that a cluster must have the total number
of configured votes, divided by two (remainders are discarded), plus one
(

The key to understanding quorum is learning how votes are assigned and
counted. Each node in a configured cluster has one (

By doing some simple math, it's easy to see that a two-node cluster must have a quorum device to continue operating if one node fails. Once installed, a two-node cluster under Sun Cluster must have a quorum device for this very reason.

Quorum required to operate: Q = TCV/2 + 1 = (2)/2 + 1 = 2
Votes if one node fails: 1

When you introduce a quorum device, the equation changes. This Sun Cluster configuration, shown in the following figure, is one of the most common.
Quorum required to operate: Q = (2 + 1)/2 + 1 = 2
Votes if one node fails: (1 + 1) = 2

Below are some quorum examples in more complex cluster configurations.
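The vote arithmetic used in these examples can be checked with a few lines of Python. This is only a sketch of the formula Q = TCV/2 + 1 with the remainder discarded; the function names and the scenario list are illustrative and are not part of the Sun Cluster software.

```python
def quorum_required(total_configured_votes: int) -> int:
    """Votes needed to operate: Q = TCV/2 + 1, discarding the remainder."""
    return total_configured_votes // 2 + 1


def has_quorum(votes_present: int, total_configured_votes: int) -> bool:
    """True if the surviving votes are enough to form or keep the cluster."""
    return votes_present >= quorum_required(total_configured_votes)


# (description, total configured votes, votes still present)
scenarios = [
    ("two nodes, no quorum device, one node fails", 2, 1),
    ("two nodes plus a one-vote quorum device, one node fails", 3, 2),
    ("three nodes plus a two-vote quorum device, two nodes fail", 5, 3),
    ("three nodes plus a two-vote quorum device, one node and the device fail", 5, 2),
]

for description, tcv, present in scenarios:
    status = "quorum held" if has_quorum(present, tcv) else "quorum lost"
    print(f"{description}: need {quorum_required(tcv)}, have {present} ({status})")
```

Only the first and last scenarios lose quorum: a two-node cluster without a quorum device cannot survive a node failure, and a three-node cluster cannot survive losing both a node and its two-vote quorum device.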
Note in the last example that the quorum device is connected to three nodes (N) and therefore has two (N-1) votes. The same quorum formula still applies, though.

Quorum required to operate: Q = (3 + 2)/2 + 1 = 3
Votes if one node fails: (2 + 2) = 4
Votes if two nodes fail: (1 + 2) = 3
Votes if just the QD fails: (1 + 1 + 1) = 3
Votes if any node plus the QD fails: (1 + 1) = 2

As a note of warning, when allocating quorum devices, always use the minimum number possible to achieve quorum, or the health of the cluster will depend on the health of the shared disks configured as quorum devices. In the case where only one of the configured quorum devices is necessary for cluster operation, the cluster will fail unnecessarily if one of the unneeded quorum devices fails. Also, never have the number of quorum device votes exceed the number of node votes, or you run the risk of enabling two separate clusters to form independently (which is known as "split brain"). In this case, both clusters will compete for traffic on the public network, and data between the two will be out of sync.

Understanding SCSI Reservations

Another mechanism that protects data integrity within the cluster, in conjunction with the quorum principle, is SCSI reservations. When the cluster forms, one node takes responsibility for any quorum devices by using SCSI reservations. With SCSI 3-capable storage where there are more than two paths to the storage, this reservation is accomplished by each available cluster node registering a key by writing it to the disk. The controlling node is tagged as the owner, and the other nodes are tagged as capable of becoming the owner. If a node fails, the remaining nodes remove the failed node's key from the disk, and it is no longer eligible to own the quorum device. Once the failed node recovers and rejoins the cluster, its key is re-registered.
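The registration behavior described above can be pictured with a small in-memory model in Python. This is a toy sketch under stated assumptions, not how the cluster stores keys on disk: the class, its method names, and the node names are all hypothetical.

```python
class QuorumDeviceModel:
    """Toy model of registration keys and ownership on a quorum device."""

    def __init__(self):
        self.keys = set()    # nodes whose keys are currently registered
        self.owner = None    # node tagged as the owner

    def register(self, node):
        """A node joining (or rejoining) the cluster registers its key."""
        self.keys.add(node)

    def take_ownership(self, node):
        """Only a node with a registered key is eligible to own the device."""
        if node not in self.keys:
            raise ValueError(f"{node} has no registered key")
        self.owner = node

    def remove_failed(self, failed, by):
        """A surviving registered node removes a failed node's key."""
        if by not in self.keys:
            raise ValueError(f"{by} has no registered key")
        self.keys.discard(failed)
        if self.owner == failed:
            self.owner = None   # who takes over is not modelled here


qd = QuorumDeviceModel()
for name in ("node1", "node2", "node3"):
    qd.register(name)
qd.take_ownership("node1")

qd.remove_failed("node3", by="node1")   # node3 fails and loses eligibility
eligible_after_failure = sorted(qd.keys)

qd.register("node3")                    # node3 recovers and rejoins
eligible_after_rejoin = sorted(qd.keys)

print("owner:", qd.owner)
print("eligible after failure:", eligible_after_failure)
print("eligible after rejoin:", eligible_after_rejoin)
```

The point of the model is that eligibility to own the device follows the set of registered keys, so a failed node stays ineligible until it rejoins and registers again.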
In the event that the controlling node leaves the cluster, the remaining eligible nodes compete to gain control of the quorum devices. If the cluster is not cleanly shut down and nodes go down individually, the last node down must be the first one up, because it is the only node eligible to control the quorum devices.

If the controlling node was the only eligible node in the cluster when the cluster rebooted and the controlling node cannot come back up, the administrator must boot one machine outside the cluster and make manual changes to the cluster configuration database so that it may achieve quorum without control of the quorum device. Once the modified machine is able to form the cluster, machines in the cluster will re-register their keys with the quorum device.

The reservation keys can be read from the quorum device by using the following commands:
SCSI 2: pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
SCSI 3: scsi -c inkeys -d /dev/did/rdsk/d4s2

The quorum device owner can be determined using the following commands:

SCSI 2: pgre -c pgre_resv -d /dev/did/rdsk/d4s2
SCSI 3: scsi -c inresv -d /dev/did/rdsk/d4s2

Pre-installation Planning and Tasks

With an understanding of how Sun Cluster works behind the scenes, you should have some idea about the kind of cluster configuration you'd like to implement. Before installing and configuring the cluster, though, there are some things to consider.

Configuring Solaris Volume Manager Software

To control shared disks, Sun Cluster supports either VERITAS Volume Manager (VxVM) or Sun's volume manager, called Solstice DiskSuite (SDS) under the Solaris 8 OS and Solaris Volume Manager (SVM) under the Solaris 9 OS. SDS/SVM is cluster aware, and Solaris 9 09/04 SVM supports parallel writes for Oracle RAC, so you can go with these products unless you need other features that VxVM offers. This article discusses using the Solaris 9 OS and SVM, since several other documents already cover configurations with VxVM.

General practice for most production Solaris 9 OS installations is to mirror the boot disk using SVM. Instructions on how to use SVM can be found in the Solaris Volume Manager Administration Guide. When planning an SVM installation for a cluster, there are a few more things to consider beyond keeping a slice for the metadbs. For detailed information, read the Planning Volume Management section of the Sun Cluster Software Installation Guide for Solaris OS.

The first consideration beyond normal SVM configuration is the
Secondly, all cluster nodes require identical
Each shared storage volume must exist as part of a metaset. Starting with Solaris 9 09/04, multi-owner disk sets are also supported, meaning that multiple nodes can own and write to shared disks. This is the simultaneous write functionality required by Oracle OPS/RAC. For more information, read the Solaris Volume Manager for Sun Cluster section of the Solaris Volume Manager Administration Guide.

Before installing and configuring the cluster environment, you may need to install patches or upgrade firmware on your hardware. Determine your requirements by using the PatchPro tool from http://patchpro.sun.com/. Once at this site, click the Sun Cluster link and describe your cluster environment. There are four buttons below the description area which will generate patch lists for Solaris (pre-installation), Sun Cluster, Post-Install, and Data Services (if you selected additional data services in your cluster description). Any patches listed in the Solaris pre-install group must be installed to the listed minimum revision before installing the Sun Cluster software.

Once you have a running cluster, Sun recommends installing patches on the cluster in a very specific way. For a complete understanding of how patches and firmware upgrades should be applied to cluster systems, read Chapter 8: Patching Sun Cluster Software and Firmware in the Sun Cluster System Administration Guide for Solaris OS.

Configuring Network Interfaces

Each cluster node must have at least one public network adapter and two
private cluster network adapters for redundant heartbeat transmission. It's
also suggested that each node have more than one public network adapter for
maximum redundancy in the event of a network adapter hardware failure.
Because each network interface must have its own Ethernet address, make sure
to set

The public network adapters are configured under IP Multipathing (IPMP) for automatic failover and load spreading. In the case where an IPMP group contains only one adapter (for example, only one interface on the public network), the adapter requires only one IP address. In the case where the IPMP group has multiple network adapters (for example, the private network or multi-homed public networks), each adapter requires one primary IP address plus a test IP address.

These test IP addresses cannot be used by normal applications because they
are not highly available, so they are marked with the

For example, say that you have a node named

node1 group nafo0 up addif node1-hme0-test group nafo0 netmask + broadcast + -failover deprecated up

Now create the test interface for

node1-qfe1-test group nafo0 netmask + broadcast + -failover deprecated up

For additional information on configuring IPMP, read the IP Network Multipathing Administration Guide. Also review the "IP Network Multipathing Groups" section of Sun Cluster Configurable Components under the Sun Cluster 3.1 09/04 Software Collection for Solaris OS.

If a cluster has only two nodes, the heartbeat interfaces can be cross connected between the two machines. If the cluster has more than two nodes, two switches (or, less optimally, two VLANs on one switch) must be used to connect the nodes. When you install the cluster framework, the installer asks about cluster transport junctions if there are more than two nodes in the cluster or if you choose the Custom configuration. Even if there are only two nodes, it's wise to configure cluster transport junction names in case you move to using switches later.

Installing and Configuring Sun Cluster Software

Now that we've covered the basic concepts behind Sun Cluster and you have a machine that's patched and connected to both the public and private cluster networks, you're ready to install the cluster software itself. First, download the cluster software and agent software from the Sun Cluster page, or copy them from the Cluster software CD-ROM, and unzip the archive files.

Creating an Administrative Console

If you're going to use a machine outside the cluster as an administrative
console, add the

To configure the cluster console tools, create an

Installing Sun Cluster Framework Software

On each node in the cluster, install the Sun Web Console packages. This is
accomplished by changing directory to

Now run

Once the first node finishes, run this on any additional nodes to have them join the cluster:

Solaris_sparc/Product/sun_cluster/Solaris_9/Tools/scinstall

From the Install Menu, choose the menu item

You can also opt to install all nodes at once if you've enabled root rsh or
ssh access from the first node where you are running

To enable you to run the newly installed binaries and read the associated
man pages without using full paths, modify your

When the first cluster node is installed, the cluster enables

The

The cluster also needs to keep time closely synced. The cluster
installation adds an ntp configuration file called

You're now ready to install the Data Services packages, those programs
which monitor various applications and handle stopping, starting, and
migrating them on the cluster. Make sure that you have downloaded the
How you configure application services will depend on the applications you choose, but they all involve configuring some sort of resource group. Resource groups usually contain a logical host resource (virtual IP address for the resource), a data storage resource, and one or more application resources. For example, a failover web server resource group would contain the virtual IP address assigned to the web site, the global file system used by the web server, and an application resource that starts, stops, and monitors the web server. For information on configuring each type of pre-defined Data Service, please read the individual Sun Cluster Data Service guides.

Once your resource groups are set up and online, your cluster is ready for use.

Important Man Pages

The Sun Cluster manual pages are available as part of the Sun Cluster Reference Manual for Solaris OS. Of particular interest are the following administrative man pages: