Topics: GPFS, Oracle, PowerHA / HACMP
Oracle RAC introduction
The traditional method for making an Oracle database capable of 7*24 operation is by means of creating an HACMP cluster in an Active-Standby configuration. In case of a failure of the Active system, HACMP lets the standby system take over the resources, start Oracle and thus resumes operation. This takeover is done with a downtime period of aprox. 5 to 15 minutes, however the impact on the business applications is more severe. It can lead to interruptions up to one hour in duration.
Another way to achieve high availability of databases, is to use a special version of the Oracle database software called Real Application Cluster, also called RAC. In a RAC cluster multiple systems (instances) are active (sharing the workload) and provide a near always-on database operation. The Oracle RAC software relies on IBM's HACMP software to achieve high availability for hardware and the operating system platform AIX. For storage it utilizes a concurrent filesystem called GPFS (General Parallel File System), a product of IBM. Oracle RAC 9 uses GPFS and HACMP. With RAC 10 you no longer need HACMP and
GPFS.
HACMP is used for network down notifications. Put all network adapters of 1 node on a single switch and put every node on a different switch. HACMP only manages the public and private network service adapters. There are no standby, boot or management adapters in a RAC HACMP cluster. It just uses a single
hostname; Oracle RAC and GPFS do not support hostname take-over or IPAT (IP Address take-over). There are no disks, volume groups or resource groups defined in an HACMP RAC cluster. In fact, HACMP is only necessary for event handling for Oracle RAC.
Name your HACMP RAC clusters in such away, that you can easily recognize the cluster as a RAC cluster, by using a naming convention that starts with RAC_.
On every GPFS node of an Oracle RAC cluster a GPFS daemon (mmfs) is active. These daemons need to communicate with each other. This is done via the public network, not via the private network.
Cache Fusion
Via SQL*Net an Oracle block is read in memory. If a second node in an HACMP RAC cluster requests the same block, it will first check if it already has it stored locally in its own cache. If not, it will use a private dedicated network to ask if another node has the block in cache. If not, the block will be read from disk. This is called Cache Fusion or Oracle RAC interconnect.
This is why on RAC HACMP clusters, each node uses an extra private network adapter to communicate with the other nodes, for Cache Fusion purposes only. All other communication, including the communication between the GPFS daemons on every node and the communication from Oracle clients, is done via the public network adapter. The throughput on the private network adapter can be twice as high as on the public network adapter.
Oracle RAC will use its own private network for Cache Fusion. If this network is not available, or if one node is unable to access the private network, then the private network is no longer used, but the public network will be used instead. If the private network returns to normal operation, then a fallback to the private network will occur. Oracle RAC uses cllsif of HACMP for this purpose.
If you found this useful, here's more on the same topic(s) in our blog:
- Adding a new volume group to an active PowerHA resource group
- NTP slewing in clusters
- Where's the config of GPFS stored?
- NFS mounts on HACMP failing
- GPFS & FSCK
Interested in learning more?