This article describes how to add a new volume group to an existing resource group of an active PowerHA cluster.
The first step is to add the storage to both of the nodes of the PowerHA cluster. In the case of SAN storage, please ensure that your storage administrator adds the storage to both nodes of the cluster. Then discover the newly added storage by running the cfgmgr command on one of the nodes:
# cfgmgr
Set a PVID on all the new disks that have been discovered. For example, for disk hdisk77, run:
# chdev -l hdisk77 -a pv=yes
Repeat this command for each of the new disks.
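If there are many new disks, a small shell loop saves some typing. A minimal sketch, assuming the new disks are hdisk77 through hdisk80 (adjust the disk names to your situation):
# for d in hdisk77 hdisk78 hdisk79 hdisk80; do chdev -l $d -a pv=yes; done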
Next, log in to the other node and run the cfgmgr command there as well, so that the disks are discovered on that node too. When you then run "lspv" on the second node, you'll notice that the PVID is already set for all the discovered disks (it was set on the first node).
On both nodes, make sure that the disk attributes are set correctly. These may differ depending on the type of storage used (so please check your storage vendor's recommendations on this topic), but a good starting point is (for example, for disk hdisk4):
# chdev -l hdisk4 -a max_transfer=0x100000 -a queue_depth=32 -a reserve_policy=no_reserve -a algorithm=round_robin
For clustered nodes, it's important that the reserve_policy is set to no_reserve.
A note about the max_transfer attribute: this value should be set to the same value as the max_xfer_size attribute of the fibre channel adapter. By default, max_transfer is set to 0x40000, which is usually lower than the max_xfer_size attribute on the fibre channel adapter, resulting in a smaller memory buffer being used. To check the max_xfer_size attribute on the fibre channel adapter, run (for example, for adapter fcs0):
# lsattr -El fcs0 -a max_xfer_size
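If the adapter's max_xfer_size turns out to be lower than the max_transfer value you want to use on the disks, it can be raised. A sketch, assuming adapter fcs0 and a target value of 0x100000 (please check your HBA and storage documentation first); the -P flag only stages the change in the ODM, so it takes effect after the next reboot of the node:
# chdev -l fcs0 -a max_xfer_size=0x100000 -P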
Also, please make sure to set the disk attributes for all new disks on both nodes. These attributes are stored locally in the ODM of each AIX system and are not automatically transferred to the other nodes of the cluster, so they have to be set for all new disks on all cluster nodes.
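To double-check the attributes on each node, you can list them per disk; for example, for hdisk4:
# lsattr -El hdisk4 -a max_transfer -a queue_depth -a reserve_policy -a algorithm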
The next step is to create the new volume group(s). The first thing we'll need to know is a common major number that is available on both nodes of the cluster. For that purpose, run the following command on all cluster nodes, which lists the available major numbers:
# lvlstmajor
Choose a major number available on all cluster nodes. For the purpose of this article, let's assume major number 57 is available.
For PowerHA, a new volume group should be configured as concurrent-capable (the -C option of the mkvg command). Also, quorum should be disabled (by using -Qn), and auto-varyon should be disabled as well (by using the -n option), as PowerHA will vary on the volume group for us. Finally, the major number should be set (in the example below: -V 57).
As such, run the following mkvg command to create the volume group (Note: please adjust the volume group name, the major number and disk names according to your situation and preference):
# mkvg -C -V 57 -S -n -Qn -y fs01vg hdisk38 hdisk42 hdisk45
Note: run this command on only one of the cluster nodes, and continue working on this cluster node for now for the next steps.
Next, create a logical volume (adjust per your situation and preference):
# mklv -y fs01lv -t jfs2 -u 1 -x 1278 fs01vg 1278 hdisk38
This command will create logical volume fs01lv in volume group fs01vg, for the purpose of using it for a JFS2 type file system, with an upper bound of 1 (use just 1 disk), and allocate 1278 partitions on disk hdisk38.
Then create a file system on the previously defined logical volume:
# crfs -v jfs2 -d fs01lv -a logname=INLINE -a options=noatime -m /fs01 -A no
This command will create file system /fs01 on top of logical volume fs01lv, and will use an inline log (recommended for optimal performance), will not record access times (options=noatime) to avoid unnecessary writes to the file system, and will tell AIX not to automatically mount the file system at system start (-A no), as PowerHA will mount the file system instead.
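If you'd like to verify the new file system before handing it over to PowerHA, you can mount it by hand, check it, and unmount it again; for example:
# mount /fs01
# df -g /fs01
# umount /fs01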
At this point, the volume group, logical volume and file system have been created. You can create additional volume groups, logical volumes and file systems as it pertains to your situation. Once done, ensure that the volume group is varied off, for example for volume group fs01vg:
# varyoffvg fs01vg
Should the varyoffvg command fail at this point, check whether the file system(s) are still mounted, and unmount them before running the varyoffvg command again.
Now run "lspv" on both nodes of the cluster, and look at the PVIDs listed. Pick one of the disks on which you configured the volume group on the first node, and take that disk's PVID, for example 00fac78651b28b53. On the other node, run the importvg command to import the volume group (which also imports the information about any logical volumes and file systems configured in that volume group), using the PVID of one of the disks, and the major number previously defined, for example:
# importvg -n -V 57 -y fs01vg 00fac78651b28b53
Repeat this for any additional volume groups in your situation. Do make sure to select the correct PVID of one of the disks in the volume group on the first node when importing this volume group on the second node, as you'll want to make sure both nodes of a PowerHA cluster use the same volume group names.
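To confirm the import on the second node, you can check that the disks are now assigned to the volume group and that the volume group device was created with the expected major number (57 in this example):
# lspv | grep fs01vg
# ls -l /dev/fs01vg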
At this time, the storage is properly configured on both nodes, and it is time to add the volume group(s) to the PowerHA resource group.
First, verify that the cluster is in a good state by running:
# smitty hacmp
Problem Determination Tools
PowerHA Verification
Verify PowerHA Configuration
Once confirmed, allow PowerHA to discover the new disks in the cluster:
# smitty cm_discover_nw_interfaces_and_disks
Now add the volume group(s) to the resource group of the cluster:
# smitty hacmp
Cluster Applications and Resources
Resource Groups
Change/Show Resources and Attributes for a Resource Group
Select your resource group, and add the volume group(s) to the "Volume Groups" entry.
Next, sync the cluster, which should bring the volume group(s) online:
# smitty hacmp
Cluster Applications and Resources
Verify and Synchronize Cluster Configuration
Once this is complete, you should be able to see the new volume group(s) online, by running:
# lsvg -o
And you should be able to see that any new file systems have been mounted by PowerHA:
# df
Although not required, it is good practice to now schedule a failover test for your PowerHA cluster, to ensure everything still works as it should in a fail-over scenario.
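One controlled way to do such a test is to move the resource group to the other node and back with clmgr. A sketch, assuming a resource group named rg01 and a standby node named node2 (substitute your own names):
# clmgr move resource_group rg01 NODE=node2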
PowerHA version 7 (starting with version 7.1.3) includes a new feature that allows you to generate an HTML report with a lot of very useful information about your PowerHA cluster. There are no external requirements; it is included in the base PowerHA product.
The command needed to generate the report is clmgr. For example:
# clmgr view report cluster TYPE=html FILE=/tmp/powerha.report
You may also include a company logo and name, for example by running:
# clmgr view report cluster TYPE=html FILE=/tmp/powerha.report COMPANY_LOGO="mylogo.jpg" COMPANY_NAME="MY COMPANY"
Tip: You may schedule this command through cron to get a regularly updated version of the report.
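For example, a crontab entry along these lines (the schedule and output file name are just an assumption; adjust them to your environment) regenerates the report every Monday morning:
0 6 * * 1 /usr/es/sbin/cluster/utilities/clmgr view report cluster TYPE=html FILE=/tmp/powerha.report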
This HTML report is officially only supported on Internet Explorer and Firefox, although other browsers might work just fine.
IBM has implemented a new feature for JFS2 filesystems to prevent simultaneous mounting within PowerHA clusters.
While PowerHA can give multiple systems concurrent access to volume groups, mounting a JFS2 filesystem on multiple nodes simultaneously will cause filesystem corruption. These simultaneous mount events can also cause a system crash, when the system detects a conflict between data or metadata in the filesystem and the in-memory state of the filesystem. The only exception to this is mounting the filesystem read-only, in which case files and directories can't be changed.
In AIX 7100-01 and 6100-07 a new feature called "Mount Guard" has been added to prevent simultaneous or concurrent mounts. If a filesystem appears to be mounted on another server, and the feature is enabled, AIX will prevent mounting on any other server. Mount Guard is not enabled by default, but is configurable by the system administrator. The option is not allowed to be set on base OS filesystems such as /, /usr, /var etc.
To turn on Mount Guard on a filesystem you can permanently enable it via /usr/sbin/chfs:
# chfs -a mountguard=yes /mountpoint
/mountpoint is now guarded against concurrent mounts.
The same option is used with crfs when creating a filesystem.
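For example, to enable Mount Guard right at creation time (a hypothetical file system /fs01 on logical volume fs01lv):
# crfs -v jfs2 -d fs01lv -a mountguard=yes -m /fs01 -A no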
To turn off mount guard:
# chfs -a mountguard=no /mountpoint
/mountpoint is no longer guarded against concurrent mounts.
To determine the mount guard state of a filesystem:
# lsfs -q /mountpoint
Name Nodename Mount Pt VFS Size Options Auto Accounting
/dev/fslv -- /mountpoint jfs2 4194304 rw no no
(lv size: 4194304, fs size: 4194304, block size: 4096, sparse files: yes,
inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI:
no, VIX: yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: yes)
The /usr/sbin/mount command will not show the mount guard state.
When a filesystem is protected against concurrent mounting and a second mount attempt is made, you will see this error:
# mount /mountpoint
mount: /dev/fslv on /mountpoint:
Cannot mount guarded filesystem.
The filesystem is potentially mounted on another node
After a system crash the filesystem may still have mount flags enabled and refuse to be mounted. In this case the guard state can be temporarily overridden by the "noguard" option to the mount command:
# mount -o noguard /mountpoint
mount: /dev/fslv on /mountpoint:
Mount guard override for filesystem.
The filesystem is potentially mounted on another node.
Reference:
http://www-01.ibm.com/support/docview.wss?uid=isg3T1018853
The standard tool for cluster monitoring is clstat, which comes with PowerHA SystemMirror/HACMP. Clstat is rather slow with its updates, and sometimes the required clinfo daemon needs restarting to get it operational, so this is, well, not perfect. There's an alternative script, called qha, which is also easy to use. It is written by PowerHA/HACMP guru Alex Abderrazag. This script shows you the correct PowerHA/HACMP status, along with adapter and volume group information. It works fine on HACMP 5.2 through PowerHA 7.2. You can download it here: qha. This is version 9.06. For the latest version, check www.lpar.co.uk.
This tiny but effective tool accepts the following flags:
- -n (show network interface info)
- -N (show interface info and active HBOD)
- -v (show shared online volume group info)
- -l (log to /tmp/qha.out)
- -e (show running events if cluster is unstable)
- -m (show status of monitor app servers if present)
- -1 (exit after first iteration)
- -c (CAA SAN / Disk Comms)
For example, run:
# qha -nev
It's useful to put "qha" in /usr/es/sbin/cluster/utilities, as that path is usually already defined in $PATH, and thus you can run qha from anywhere.
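For example, assuming you downloaded qha to /tmp:
# cp /tmp/qha /usr/es/sbin/cluster/utilities/qha
# chmod 755 /usr/es/sbin/cluster/utilities/qha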
A description of the possible cluster states:
- ST_INIT: cluster configured and down
- ST_JOINING: node joining the cluster
- ST_VOTING: Inter-node decision state for an event
- ST_RP_RUNNING: cluster running recovery program
- ST_BARRIER: clstrmgr waiting at the barrier statement
- ST_CBARRIER: clstrmgr is exiting recovery program
- ST_UNSTABLE: cluster unstable
- NOT_CONFIGURED: HA installed but not configured
- RP_FAILED: event script failed
- ST_STABLE: cluster services are running with managed resources (stable cluster) or cluster services have been "forced" down with resource groups potentially in the UNMANAGED state (HACMP 5.4 only)
How do you monitor multiple HACMP/PowerHA clusters? You're probably familiar with the clstat or xclstat commands. These are nice, but not sufficient when you have many HACMP/PowerHA clusters, as clstat can't be configured to monitor more than 8 clusters. It's also difficult to get an overview of ALL clusters in a SINGLE look with clstat. IBM included a clstat.cgi in HACMP 5 to show the cluster status on a web page. This still doesn't provide an overview in a single look, as clstat.cgi shows a long listing of all clusters, and, just like clstat, it is limited to monitoring only 8 clusters.
The HACMP/PowerHA cluster status can be retrieved via SNMP (this is actually what clstat does too). Using the IP addresses of a cluster and the snmpinfo command, you can remotely retrieve cluster status information, and use that information to build a webpage. We've written a script for this purpose. By using colors for the status of the clusters and the nodes (green = ok, yellow = something is happening, red = error), you can get a quick overview of the status of all the HACMP/PowerHA clusters.
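As a minimal sketch of the idea, assuming the SNMP community is public and one of the cluster's IP addresses is 10.1.1.10, the cluster MIB can be queried remotely like this (the hacmp.defs MIB definitions file ships with PowerHA):
# snmpinfo -m dump -v -c public -h 10.1.1.10 -o /usr/es/sbin/cluster/hacmp.defs cluster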
Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active.
You can download the script here. This is version 1.6. Untar the file that you download. There is a README in the package that will tell you how to configure the script. This script has been tested with HACMP version 4, 5, 6, and up to PowerHA version 7.1.3.4.
If clstat is not working, you may get the following error, when running clstat:
# clstat
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not
enabled in snmpdv3.conf file. This behavior is usually seen in
AIX 6.1 onwards where this internet MIB entry was intentionally
disabled as a security issue. This internet MIB entry is required to
view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree
which is used by clstat or cldump functionality.
There are two ways to enable this MIB sub tree (risc6000clsmuxpd). They
are:
1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file:
VACM_VIEW defaultView internet - included -
But doing so is not recommended, as it unlocks the entire MIB tree.
2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling
the main MIB tree by adding this line in /etc/snmpdv3.conf file.
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
Note: After enabling the MIB entry above snmp daemon must be restarted
with the following commands as shown below:
# stopsrc -s snmpd
# startsrc -s snmpd
After snmp is restarted leave the daemon running for about two minutes
before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. Make sure that a COMMUNITY entry is present in /etc/snmpdv3.conf:
COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -
The next thing may sound silly, but edit the /etc/snmpdv3.conf file and take out the comments on the smux lines. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # HACMP/ES for AIX ...
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd -a "-c public"
# chssys -s snmpmibd -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work.
If you run into the following error:
cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group.
This may be caused by the volume group being varied on, on the other node. If it should not be varied on, on the other node, run:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, then stop HACMP on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster.
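A rough outline of that last-resort procedure, assuming a volume group named vg with major number 57 and hdisk38 as one of its disks (these names are just placeholders), with cluster services already stopped on both nodes; run the exportvg/importvg pair on each node:
# varyoffvg vg
# exportvg vg
# importvg -V 57 -y vg hdisk38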
In order to keep the system time synchronized with other nodes in an HACMP cluster or across the enterprise, Network Time Protocol (NTP) should be implemented. In its default configuration, NTP will periodically update the system time to match a reference clock by resetting the system time on the node. If the time on the reference clock is behind the time of the system clock, the system clock will be set backwards causing the same time period to be passed twice. This can cause internal timers in HACMP and Oracle databases to wait longer periods of time under some circumstances. When these circumstances arise, HACMP may stop the node or the Oracle instance may shut itself down.
Oracle will log an ORA-29740 error when it shuts down the instance due to inconsistent timers. The hatsd daemon utilized by HACMP will log a TS_THREAD_STUCK_ER error in the system error log just before HACMP stops a node due to an expired timer.
To avoid this issue, system managers should configure the NTP daemon to increment time on the node slower until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large increment. The behavior is configured with the -x flag for the xntpd daemon.
To check the current running configuration of xntpd for the -x flag:
# ps -aef | grep xntpd | grep -v grep
root 409632 188534 0 11:46:45 - 0:00 /usr/sbin/xntpd
To update the current running configuration of xntpd to include the -x flag:
# chssys -s xntpd -a "-x"
0513-077 Subsystem has been changed.
# stopsrc -s xntpd
0513-044 The /usr/sbin/xntpd Subsystem was requested to stop.
# startsrc -s xntpd
0513-059 The xntpd Subsystem has been started. Subsystem PID is 40932.
# ps -aef | grep xntpd | grep -v grep
root 40932 188534 0 11:46:45 - 0:00 /usr/sbin/xntpd -x