Topics: AIX, PowerHA / HACMP, System Admin


IBM has implemented a new feature for JFS2 filesystems to prevent simultaneous mounting within PowerHA clusters.

While PowerHA can give multiple systems concurrent access to volume groups, mounting a JFS2 filesystem on multiple nodes simultaneously will cause filesystem corruption. These simultaneous mount events can also cause a system crash, when the system detects a conflict between data or metadata in the filesystem and the in-memory state of the filesystem. The only exception to this is mounting the filesystem read-only, where files or directories can't be changed.

In AIX 7100-01 and 6100-07 a new feature called "Mount Guard" has been added to prevent simultaneous or concurrent mounts. If a filesystem appears to be mounted on another server, and the feature is enabled, AIX will prevent mounting on any other server. Mount Guard is not enabled by default, but is configurable by the system administrator. The option is not allowed to be set on base OS filesystems such as /, /usr, /var etc.

To turn on Mount Guard on a filesystem you can permanently enable it via /usr/sbin/chfs:

# chfs -a mountguard=yes /mountpoint
/mountpoint is now guarded against concurrent mounts.
The same option is used with crfs when creating a filesystem.

To turn off mount guard:
# chfs -a mountguard=no /mountpoint
/mountpoint is no longer guarded against concurrent mounts.
To determine the mount guard state of a filesystem:
# lsfs -q /mountpoint
Name      Nodename Mount Pt    VFS  Size    Options Auto Accounting
/dev/fslv --       /mountpoint jfs2 4194304 rw      no   no
  (lv size: 4194304, fs size: 4194304, block size: 4096, sparse files: yes,
  inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: 
  no, VIX: yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: yes)
The /usr/sbin/mount command will not show the mount guard state.
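If you want to check the guard state from a script, a small helper can pull the MountGuard field out of the lsfs -q output. This is only a sketch: the function name mountguard_state is made up for this example, and it assumes the lsfs -q output format shown above.

```shell
# Print the MountGuard value ("yes" or "no") found in `lsfs -q` output
# read from stdin. Prints "unknown" when no MountGuard field is present.
# mountguard_state is a hypothetical helper name, not an AIX command.
mountguard_state() {
    state=$(sed -n 's/.*MountGuard: *\([a-z]*\).*/\1/p' | head -n 1)
    echo "${state:-unknown}"
}

# Usage on an AIX node (the mount point is a placeholder):
#   lsfs -q /mountpoint | mountguard_state
```

Reading from stdin keeps the helper independent of the mount point, so the same function can be used in a loop over several filesystems.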

When a filesystem is protected against concurrent mounting, and a second mount attempt is made you will see this error:
# mount /mountpoint
mount: /dev/fslv on /mountpoint:
Cannot mount guarded filesystem.
The filesystem is potentially mounted on another node
After a system crash the filesystem may still have mount flags enabled and refuse to be mounted. In this case the guard state can be temporarily overridden by the "noguard" option to the mount command:
# mount -o noguard /mountpoint
mount: /dev/fslv on /mountpoint:
Mount guard override for filesystem.
The filesystem is potentially mounted on another node.

Topics: PowerHA / HACMP


The standard tool for cluster monitoring is clstat, which comes with PowerHA SystemMirror/HACMP. Clstat is rather slow with its updates, and sometimes the required clinfo daemon needs restarting before it becomes operational, so it is not perfect. There's an alternative script, qha, which is also easy to use. It is written by PowerHA/HACMP guru Alex Abderrazag. This script shows you the correct PowerHA/HACMP status, along with adapter and volume group information. It works fine on HACMP 5.2 through 7.2. You can download it here: qha. This is version 9.06. For the latest version, check

This tiny but effective tool accepts the following flags:

  •   -n (show network interface info)
  •   -N (show interface info and active HBOD)
  •   -v (show shared online volume group info)
  •   -l (log to /tmp/qha.out)
  •   -e (show running events if cluster is unstable)
  •   -m (show status of monitor app servers if present)
  •   -1 (exit after first iteration)
  •   -c (CAA SAN / Disk Comms)
For example, run:
# qha -nev
It's useful to put "qha" in /usr/es/sbin/cluster/utilities, as that path is usually already defined in $PATH, and thus you can run qha from anywhere.

A description of the possible cluster states:
  • ST_INIT: cluster configured and down
  • ST_JOINING: node joining the cluster
  • ST_VOTING: Inter-node decision state for an event
  • ST_RP_RUNNING: cluster running recovery program
  • ST_BARRIER: clstrmgr waiting at the barrier statement
  • ST_CBARRIER: clstrmgr is exiting recovery program
  • ST_UNSTABLE: cluster unstable
  • NOT_CONFIGURED: HA installed but not configured
  • RP_FAILED: event script failed
  • ST_STABLE: cluster services are running with managed resources (stable cluster) or cluster services have been "forced" down with resource groups potentially in the UNMANAGED state (HACMP 5.4 only)
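The cluster manager also reports its state in the output of lssrc -ls clstrmgrES, on a line starting with "Current state:". As a rough sketch, the two helpers below (cluster_state and is_stable are names invented for this example) extract that state from the command output, so a script can wait for ST_STABLE before continuing:

```shell
# Extract the cluster manager state (for example ST_STABLE) from
# `lssrc -ls clstrmgrES` output read from stdin.
# Both function names are hypothetical helpers for this sketch.
cluster_state() {
    sed -n 's/^Current state: *\([A-Z_]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) only when the reported state is ST_STABLE.
is_stable() {
    [ "$(cluster_state)" = "ST_STABLE" ]
}

# Usage on a cluster node:
#   lssrc -ls clstrmgrES | cluster_state
#   lssrc -ls clstrmgrES | is_stable && echo "cluster is stable"
```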

Topics: Monitoring, PowerHA / HACMP

Cluster status webpage

How do you monitor multiple HACMP/PowerHA clusters? You're probably familiar with the clstat or the xclstat commands. These are nice, but not sufficient when you have more than 8 HACMP/PowerHA clusters to monitor, as they can't be configured to monitor more than 8 clusters. It's also difficult to get an overview of ALL clusters at a SINGLE glance with clstat. IBM included a clstat.cgi in HACMP 5 to show the cluster status on a webpage. This still doesn't provide an overview at a single glance, as clstat.cgi shows a long listing of all clusters, and, just like clstat, it is limited to monitoring only 8 clusters.

The HACMP/PowerHA cluster status can be retrieved via SNMP (this is actually what clstat does too). Using the IP addresses of a cluster and the snmpinfo command, you can remotely retrieve cluster status information, and use that information to build a webpage. We've written a script for this purpose. By using colors for the status of the clusters and the nodes (green = ok, yellow = something is happening, red = error), you can get a quick overview of the status of all the HACMP/PowerHA clusters.

Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active.
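The color coding can be illustrated with a few lines of shell. This is not the code of the script itself, only a minimal sketch of the idea: the function name status_color and the status words it accepts ("stable", "error", "down") are assumptions made for this example, and anything else is treated as "something is happening".

```shell
# Map a status word to the color used on the overview page.
# status_color and the status words are hypothetical; the real script
# may use different names and values.
status_color() {
    case "$1" in
        stable)      echo "green"  ;;  # ok
        error|down)  echo "red"    ;;  # error
        *)           echo "yellow" ;;  # something is happening
    esac
}

# Example: print one HTML table cell per cluster status.
#   for s in stable unstable error; do
#       echo "<td bgcolor=\"$(status_color "$s")\">$s</td>"
#   done
```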

You can download the script here. This is version 1.6. Untar the file that you download. There is a README in the package, that will tell you how you can configure the script. This script has been tested with HACMP version 4, 5, 6, and up to PowerHA version

Topics: PowerHA / HACMP

PowerHA / HACMP support matrix

Support matrix / life cycle for IBM PowerHA (with a typical 3 year lifecycle):

Release        AIX 5.1  AIX 5.2  AIX 5.3  AIX 6.1    AIX 7.1             AIX 7.2   Release Date         End Of Support
HACMP 5.1      YES      YES      YES      NO         NO                  NO        July 11, 2003        Sep 1, 2006
HACMP 5.2      YES      YES      YES      NO         NO                  NO        July 16, 2004        Sep 30, 2007
HACMP 5.3      NO       ML4+     ML2+     YES        NO                  NO        Aug 12, 2005         Sep 30, 2009
HACMP 5.4.0    NO       TL8+     TL4+     NO         NO                  NO        July 28, 2006        Sep 30, 2011
HACMP 5.4.1    NO       TL8+     TL4+     YES        YES                 NO        Sep 11, 2007         Sep 30, 2011
PowerHA 5.5    NO       NO       TL7+     TL2 SP1+   YES                 NO        Nov 14, 2008         Apr 30, 2012
PowerHA 6.1    NO       NO       TL9+     TL2 SP1+   YES                 NO        Oct 20, 2009         Apr 30, 2015
PowerHA 7.1    NO       NO       NO       TL6+       YES                 NO        Sep 10, 2010         N/A
PowerHA 7.1.1  NO       NO       NO       TL7 SP2+   TL1 SP2+            NO        Sep 10, 2010         N/A
PowerHA 7.1.2  NO       NO       NO       TL8 SP1+   TL2 SP1+            NO        Oct 3, 2012          N/A
PowerHA 7.1.3  NO       NO       NO       TL9 SP1+   TL3 SP1+            NO        Oct 7, 2013          N/A
PowerHA 7.2.0  NO       NO       NO       TL9 SP5+   TL3 SP5+, TL4 SP1+  TL0 SP1+  Dec 4, 2015          N/A
PowerHA 7.2.1  NO       NO       NO       tbd        tbd                 tbd       announced Dec, 2016  N/A

Note: None of these versions is supported for AIX 4.3.3.
Source: HACMP Version Compatibility Matrix

Topics: AIX, PowerHA / HACMP, System Admin

clstat: Failed retrieving cluster information

If clstat is not working, you may get the following error, when running clstat:

# clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve the risc6000clsmuxpd MIB sub tree, which is used by clstat or cldump functionality.

There are two ways to enable this MIB sub tree (risc6000clsmuxpd). They are:

1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file:

VACM_VIEW defaultView internet - included -

But doing so is not recommended, as it unlocks the entire MIB tree.

2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file.

VACM_VIEW defaultView - included -

Note: After enabling the MIB entry above snmp daemon must be restarted with the following commands as shown below:

# stopsrc -s snmpd
# startsrc -s snmpd

After snmp is restarted leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. Make sure that a COMMUNITY entry is present in /etc/snmpdv3.conf:
COMMUNITY public public noAuthNoPriv -
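A quick way to see whether these entries are active (that is, present and not commented out) is to grep for them. The following is a sketch only; check_snmp_conf is a name invented for this example, and it only looks at the start of each line rather than validating the complete entry:

```shell
# Check an snmpdv3.conf-style file for an uncommented "COMMUNITY public"
# line and at least one uncommented "VACM_VIEW defaultView" line.
# check_snmp_conf is a hypothetical helper; it returns 0 if both exist.
check_snmp_conf() {
    conf=${1:-/etc/snmpdv3.conf}
    rc=0
    if ! grep -q '^[[:space:]]*COMMUNITY[[:space:]][[:space:]]*public[[:space:]]' "$conf"; then
        echo "missing: COMMUNITY public"
        rc=1
    fi
    if ! grep -q '^[[:space:]]*VACM_VIEW[[:space:]][[:space:]]*defaultView[[:space:]]' "$conf"; then
        echo "missing: VACM_VIEW defaultView"
        rc=1
    fi
    return $rc
}

# Usage:
#   check_snmp_conf /etc/snmpdv3.conf && echo "entries found"
```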
The next thing may sound silly, but edit the /etc/snmpdv3.conf file, and take out the comments. Change this:
smux gated_password  # gated
smux clsmuxpd_password # HACMP/ES for AIX ...
To:
smux gated_password
smux clsmuxpd_password
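If you'd rather not edit the file by hand, the trailing comments can be stripped with sed. A minimal sketch, assuming the smux entries start at the beginning of the line; the function name strip_smux_comments is made up for this example, and you should work on a copy of the file first:

```shell
# Remove trailing "# ..." comments from lines that start with "smux".
# All other lines are passed through unchanged.
# strip_smux_comments is a hypothetical helper reading stdin.
strip_smux_comments() {
    sed '/^smux/s/[[:space:]]*#.*$//'
}

# Usage (writes a cleaned copy instead of editing in place):
#   strip_smux_comments < /etc/snmpdv3.conf > /tmp/snmpdv3.conf.new
```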
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd  -a "-c public"
# chssys -s snmpmibd  -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty
After that, clstat, cldump and snmpinfo should work.

Topics: AIX, PowerHA / HACMP, System Admin

Error in HACMP in LVM

If you run into the following error:

cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group.
This may be caused by the volume group being varied on, on the other node. If it should not be varied on, on the other node, run:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, then stop HACMP on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster.

Topics: AIX, PowerHA / HACMP, System Admin

NTP slewing in clusters

In order to keep the system time synchronized with other nodes in an HACMP cluster or across the enterprise, Network Time Protocol (NTP) should be implemented. In its default configuration, NTP will periodically update the system time to match a reference clock by resetting the system time on the node. If the time on the reference clock is behind the time of the system clock, the system clock will be set backwards causing the same time period to be passed twice. This can cause internal timers in HACMP and Oracle databases to wait longer periods of time under some circumstances. When these circumstances arise, HACMP may stop the node or the Oracle instance may shut itself down.

Oracle will log an ORA-29740 error when it shuts down the instance due to inconsistent timers. The hatsd daemon utilized by HACMP will log a TS_THREAD_STUCK_ER error in the system error log just before HACMP stops a node due to an expired timer.

To avoid this issue, system managers should configure the NTP daemon to adjust the time on the node gradually until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large increment. The behavior is configured with the -x flag for the xntpd daemon.

To check the current running configuration of xntpd for the -x flag:

# ps -aef | grep xntpd | grep -v grep
    root  409632  188534   0 11:46:45      -  0:00 /usr/sbin/xntpd
To update the current running configuration of xntpd to include the -x flag:
# chssys -s xntpd -a "-x"
0513-077 Subsystem has been changed.
# stopsrc -s xntpd
0513-044 The /usr/sbin/xntpd Subsystem was requested to stop.
# startsrc -s xntpd
0513-059 The xntpd Subsystem has been started. Subsystem PID is 40932.
# ps -aef | grep xntpd | grep -v grep
    root  409632  188534   0 11:46:45      -  0:00 /usr/sbin/xntpd -x
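To run the same check on every cluster node from a script, the ps output can be tested for the -x argument. This is a rough sketch; the function name xntpd_slewing is made up for this example, and it assumes the daemon shows up as /usr/sbin/xntpd in the process list, as in the output above:

```shell
# Read `ps -aef` output from stdin and succeed (exit status 0) only if
# a /usr/sbin/xntpd process is running with the -x flag.
# xntpd_slewing is a hypothetical helper name.
xntpd_slewing() {
    awk '/\/usr\/sbin\/xntpd/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "-x") found = 1
         }
         END { exit !found }'
}

# Usage:
#   ps -aef | xntpd_slewing || echo "xntpd is not slewing the clock"
```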

Topics: AIX, PowerHA / HACMP

AIX 5.3 end-of-service

The EOM date (end of marketing) has been announced for AIX 5.3: 04/11; meaning that AIX 5.3 will no longer be marketed by IBM from April 2011, and that it is now time for customers to start thinking about upgrading to AIX 6.1. The EOS (end of service) date for AIX 5.3 is 04/12, meaning AIX 5.3 will be serviced by IBM until April 2012. After that, IBM will only service AIX 5.3 for an additional fee. The EOL (end of life) date is 04/16, meaning April 2016. The final technology level for AIX 5.3 is technology level 12. Some service packs for TL12 will still be released though.

IBM has also announced EOM and EOS dates for HACMP 5.4 and PowerHA 5.5, so if you're using any of these versions, you also need to upgrade to PowerHA 6.1:

  • Sep 30, 2010: EOM HACMP 5.4, PowerHA 5.5
  • Sep 30, 2011: EOS HACMP 5.4
  • Sep 30, 2012: EOS PowerHA 5.5

Topics: AIX, EMC, Installation, PowerHA / HACMP, SAN, System Admin

Quick setup guide for HACMP

Use this procedure to quickly configure an HACMP cluster, consisting of 2 nodes and disk heartbeating.


Make sure you have the following in place:

  • Have the IP addresses and host names of both nodes, and for a service IP label. Add these into the /etc/hosts files on both nodes of the new HACMP cluster.
  • Make sure you have the HACMP software installed on both nodes. Just install all the filesets of the HACMP CD-ROM, and you should be good.
  • Make sure you have this entry in /etc/inittab (as one of the last entries):
    clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
  • In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow the EMC ODM cleanup procedure.
  • Create the cluster and its nodes:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure an HACMP Cluster and Nodes
    Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP address correctly entered in the /etc/hosts file of both nodes.
  • Create an IP service label:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure Resources to Make Highly Available
    Configure Service IP Labels/Addresses
    Add a Service IP Label/Address
    Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one).
  • Set up a resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Add a Resource Group
    Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node.

    Note: The order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority.
  • Add the Service IP/Label to the resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Change/Show Resources for a Resource Group (standard)
    Select the resource group you've created earlier, and add the Service IP/Label.
  • Run a verification/synchronization:
    # smitty hacmp
    Extended Configuration
    Extended Verification and Synchronization
    Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt. Repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select to "Automatically correct errors".
  • Start the HACMP cluster:
    # smitty hacmp
    System Management (C-SPOC)
    Manage HACMP Services
    Start Cluster Services
    Select both nodes to start. Make sure to also start the Cluster Information Daemon.
  • Check the status of the cluster:
    # clstat -o
    # cldump
    Wait until the cluster is stable and both nodes are up.
Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part is for setting up a disk heartbeat network, that will allow the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here, you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences, e.g. SAN disks on EMC storage are called "hdiskpower" devices, and they're called "vpath" devices on IBM SAN storage.

First, look at the available SAN disk devices on your nodes, and select a small disk, that won't be used to store any data on, but only for the purpose of doing the disk heartbeat. It is a good habit, to request your SAN storage admin to zone a small LUN as a disk heartbeating device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device, for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4   000a807f6b9cc8e5    None
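Because the same LUN may show up under a different device name on the other node, it is the PVID that identifies the disk. A small sketch for pulling the PVID of one disk out of the lspv output (pvid_of is a hypothetical helper name), so you can compare the value on both nodes:

```shell
# Read `lspv` output from stdin and print the PVID (second column)
# of the disk named in the first argument.
# pvid_of is a hypothetical helper name.
pvid_of() {
    awk -v disk="$1" '$1 == disk { print $2 }'
}

# Usage on each node:
#   lspv | pvid_of hdiskpower4
```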
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5:
  • Create a concurrent volume group:
    # smitty hacmp
    System Management (C-SPOC)
    HACMP Concurrent Logical Volume Management
    Concurrent Volume Groups
    Create a Concurrent Volume Group
    Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg".
  • Set up the disk heartbeat network:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Networks
    Add a Network to the HACMP Cluster
    Select "diskhb" and accept the default Network Name.
  • Run a discovery:
    # smitty hacmp
    Extended Configuration
    Discover HACMP-related Information from Configured Nodes
  • Add the disk device:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Communication Interfaces/Devices
    Add Communication Interfaces/Devices
    Add Discovered Communication Interface and Devices
    Communication Devices
    Select the disk device on both nodes by selecting the same disk on each node by pressing F7.
  • Run a Verification & Synchronization again, as described above. Then run clstat and/or cldump again, to check if the disk heartbeat network comes online.

Topics: AIX, PowerHA / HACMP, System Admin

NFS mounts on HACMP failing

When you want to mount an NFS file system on a node of an HACMP cluster, there are a couple of items you need to check before it will work:

  • Make sure the hostname and IP address of the HACMP node are resolvable and provide the correct output, by running:
    # nslookup [hostname]
    # nslookup [ip-address]
  • The next thing to check, on the NFS server, is whether the node names of your HACMP cluster nodes are correctly added to the /etc/exports file. If they are, run:
    # exportfs -va
  • The last, and trickiest, item to check is whether a service IP label is defined as an IP alias on the same adapter as your node's hostname, e.g.:
    # netstat -nr
    Routing tables
    Destination   Gateway       Flags  Refs  Use    If  Exp  Groups
    Route Tree for Protocol Family 2 (Internet):
    default                     UG     4     180100 en1 -    -
                                UHSb   0     0      en1 -    -
                                UGHS   3     791253 lo0 -    -
    The example above shows you that the default gateway is defined on the en1 interface. The next command shows you where your Service IP label lives:
    # netstat -i
    Name  Mtu   Network   Address         Ipkts   Ierrs Opkts
    en1   1500  link#2    0.2.55.d3.75.77 2587851 0      940024
    en1   1500  10.251.14 node01          2587851 0      940024
    en1   1500  10.251.20 serviceip       2587851 0      940024
    lo0   16896 link#1                    1912870 0     1914185
    lo0   16896 127       loopback        1912870 0     1914185
    lo0   16896 ::1                       1912870 0     1914185
    As you can see, the Service IP label (in the example above called "serviceip") is defined on en1. In that case, for NFS to work, you also want to add the "serviceip" to the /etc/exports file on the NFS server and re-run "exportfs -va". And you should also make sure that hostname "serviceip" resolves to an IP address correctly (and of course the IP address resolves to the correct hostname) on both the NFS server and the client.
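To list which IP labels live on a given interface without reading the table by eye, the netstat -i output can be filtered with awk. A minimal sketch, assuming the column layout shown above; labels_on_if is a name invented for this example:

```shell
# Read `netstat -i` output from stdin and print the Address column for
# every non-link row of the interface named in the first argument.
# labels_on_if is a hypothetical helper name.
labels_on_if() {
    awk -v ifname="$1" '$1 == ifname && $3 !~ /^link#/ { print $4 }'
}

# Usage:
#   netstat -i | labels_on_if en1
```

Every label this prints for the adapter that carries the node's hostname should be present in /etc/exports on the NFS server.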
