Whenever you need to upgrade SDD (and it is wise to keep it up to date), make sure you check the SDD documentation first. Here are the quick steps to perform the update.
- Check the error log for any entries that could interfere with the upgrade:
# errpt -a | more
- Check if previously installed packages are OK:
# lppchk -v
- Commit any previously installed packages:
# installp -c all
- Make sure you have a recent mksysb image of the server, and run an incremental TSM backup before starting the updates to the rootvg. It is also a good idea to prepare an alt_disk_install on the second boot disk.
- For HACMP nodes: check the cluster status and log files to make sure the cluster is stable and ready for the upgrades.
- Update fileset devices.fcp.disk.ibm to the latest level using smitty update_all.
- For ESS environments: Update host attachment script ibm2105 and ibmpfe.essutil to the latest available levels using smitty update_all.
- Enter the lspv command to find out all the SDD volume groups.
- Enter the lsvgfs command for each SDD volume group to find out which file systems are mounted, e.g.:
# lsvgfs vg_name
- Enter the umount command to unmount all file systems belonging to the SDD volume groups.
- Enter the varyoffvg command to vary off the volume groups.
- If you are upgrading to an SDD version earlier than 1.6.0.0, or if you are upgrading to SDD 1.6.0.0 or later and your host is in an HACMP environment with nonconcurrent volume groups that are varied on (that is, reserved) by another host, run the vp2hd volume_group_name script to convert the volume groups from SDD vpath devices to supported storage hdisk devices. Otherwise, skip this step.
- Stop the SDD server:
# stopsrc -s sddsrv
- Remove all the SDD vpath devices:
# rmdev -dl dpo -R
- Use the smitty command to uninstall SDD: enter smitty deinstall and press Enter, then complete the uninstallation process.
- If you need to upgrade the AIX operating system, perform that upgrade now. If required, reboot the system after the operating system upgrade.
- Use the smitty command to install the newer version of SDD. Note: it is also possible to run smitty update_all to simply update the SDD fileset without uninstalling it first, but IBM recommends uninstalling first, then patching the OS, and then installing the new SDD fileset.
- Use the smitty device command to configure all the SDD vpath devices to the Available state.
- Enter the lsvpcfg command to verify the SDD configuration.
- If you are upgrading to an SDD version earlier than 1.6.0.0, run the hd2vp volume_group_name script for each SDD volume group to convert the physical volumes from supported storage hdisk devices back to the SDD vpath devices.
- Enter the varyonvg command for each volume group that was previously varied offline.
- Enter the lspv command to verify that all physical volumes of the SDD volume groups are SDD vpath devices.
- Check for any errors:
# errpt | more
# lppchk -v
# errclear 0
- Enter the mount command to mount all file systems that were unmounted.
Attention: If an SDD volume group's physical volumes are a mix of hdisk devices and SDD vpath devices, you must run the dpovgfix utility to fix this problem. Otherwise, SDD will not function properly:
# dpovgfix vg_name
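The unmount / vary off / convert / remove sequence from the steps above can be sketched as a small dry-run helper. The volume group names are placeholders, and the helper only prints the commands it would run; drop the echo prefix only after you have verified the list on your own system:

```shell
# Dry-run sketch of the SDD pre-upgrade teardown steps (placeholder vg names).
DRY_RUN="echo WOULD RUN:"

sdd_preupgrade_teardown() {
    for vg in "$@"; do
        # unmount all file systems that belong to this SDD volume group
        for fs in $(lsvgfs "$vg" 2>/dev/null); do
            $DRY_RUN umount "$fs"
        done
        $DRY_RUN varyoffvg "$vg"     # vary off the volume group
        $DRY_RUN vp2hd "$vg"         # only when required (see the step above)
    done
    $DRY_RUN stopsrc -s sddsrv       # stop the SDD server
    $DRY_RUN rmdev -dl dpo -R        # remove all SDD vpath devices
}

sdd_preupgrade_teardown datavg01 datavg02   # placeholder volume groups
```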
With HACMP clusters, documentation is probably the most important issue. You cannot properly manage an HACMP cluster if you do not document it. Document the precise configuration of the complete cluster and document any changes you carry out. Also document all management procedures and stick to them! The cluster snapshot facility is an excellent way of documenting your cluster.
Next step: get educated. You have to know exactly what you're doing on an HACMP cluster. If you have to manage a production cluster, getting certified is a necessity. Don't ever let non-HACMP-educated UNIX administrators on your HACMP cluster nodes. They don't have a clue of what's going on and will probably destroy your carefully laid-out configuration.
Geographically separated nodes are important! Too many cluster nodes just sit on top of each other in the same rack. What if there's a fire? Or a power outage? Having an HACMP cluster won't help you if both nodes are in a single location, use the same power source, or share the same network switches.
Put your HACMP logs in a sensible location. Don't put them in /tmp knowing that /tmp gets purged every night....
Test, test, test, and then test your cluster again. Doing take-over tests every six months is best practice. Document your tests and your test results.
Don't assume that your cluster is highly available just because you've installed the cluster software. There are a lot of other things to consider in your infrastructure to avoid single points of failure, such as: no two nodes sharing the same I/O drawer; power redundancy; no two storage or network adapters on the same SCSI backplane or bus; redundancy in SAN HBAs; application monitoring in place.
Official IBM sites:
Other PowerHA / HACMP related sites:
IBM's Redbooks on HACMP
lpar.co.uk (Alex Abderrazag)

PowerHA is the new name for HACMP, which is short for High Availability Cluster Multi-Processing, a product of IBM. PowerHA / HACMP runs on AIX (and also on Linux) and its purpose is to provide high availability to systems, mainly against hardware failures. It can automatically detect system or network failures and can recover system hardware, applications, data and users while keeping recovery time to an absolute minimum. This is useful for systems that need to be online 24 hours a day, 365 days per year, and for organizations that can't afford to have systems down for longer than 15 minutes. It's not completely fault-tolerant, but it is highly available.
Compared to other cluster software, PowerHA / HACMP is highly robust, allows for large distances between nodes of a single cluster, and supports up to 32 nodes in a cluster. Previous versions of PowerHA / HACMP had a reputation for having a lot of bugs. From version 5.4 onward, PowerHA / HACMP has seen a lot of improvements.
IBM's HACMP has existed for over 15 years. It's not originally an IBM product; IBM bought it from CLAM, which was later renamed Availant, then LakeViewTech, and is nowadays called Vision Solutions. Until August 2006, all development of HACMP was done by CLAM. Nowadays, IBM does its own development of PowerHA / HACMP in Austin, Poughkeepsie and Bangalore.
Competitors of PowerHA / HACMP are Veritas Cluster and Echo Cluster. The latter, Echo Cluster, is a product of the Vision Solutions mentioned above; it tends to be easier to set up and is meant for simpler clusters. Veritas is mostly used by customers that already use it on other operating systems, like Sun Solaris and Windows Server environments, and don't want to invest in yet another clustering technology.
If you need to exclude a specific file system from the TSM backup, then add the following line to the dsm.sys file:
DOMAIN ALL-LOCAL -/opt/archive
This example will prevent file system /opt/archive from being backed up.
Now, what if you wish to exclude a certain directory within a file system from the backup? Create the following entry in the dsm.sys file:
INCLExcl /usr/tivoli/tsm/client/ba/bin/inclexcl
Then create the inclexcl file and add the following line:
Exclude.dir /opt/archive/tmp
This will only exclude the tmp folder in file system /opt/archive.
You can check with the following command:
# dsmc q inclexcl
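Putting the directory-exclusion pieces together, the two files might look like this (the inclexcl path is the default one used above; the dsm.sys lines belong inside your server stanza):

```
* dsm.sys (inside the server stanza)
INCLExcl    /usr/tivoli/tsm/client/ba/bin/inclexcl

* /usr/tivoli/tsm/client/ba/bin/inclexcl
Exclude.dir /opt/archive/tmp
```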

How do you test if Oracle TDP (RMAN) is working properly?
# tdpoconf showenv
If you have a TSM database spread across multiple database volumes on disk, be very careful about how you use them. It is best to assign multiple database volumes on different disks, all equal in size. This way, TSM is able to use the performance of several disks equally.
Also, take a look at the assigned capacity of the TSM database. It may well be that not the full capacity of the database is assigned to TSM; in that case, some database volumes are more heavily used than others. Good commands to check whether the database volumes are equally used are "q dbspace" and "q db f=d":
q dbspace
q db f=d
These commands can be used to find out how your database capacity is assigned. If it isn't fully assigned, do it now.
Check the output of the TSM commands against the location of your database volumes on the operating system, to see if the database volumes are equally spread across multiple disks. The AIX commands iostat and vmstat will give you a good idea of whether your disks are used equally.
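As a sketch of what "used equally" looks like, the snippet below picks the busiest disk out of canned iostat-style output. The sample numbers are made up; on a live system you would feed it real `iostat -d` output instead:

```shell
# Canned AIX 'iostat -d'-style sample (hypothetical values).
iostat_sample='Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0          12.0      512.0     64.0      10240      20480
hdisk1          11.5      498.0     61.0       9980      19900
hdisk2          48.0     2048.0    256.0      81920     163840'

# Pick the disk with the highest "% tm_act" (percent of time the disk was busy).
busiest=$(echo "$iostat_sample" | awk 'NR > 1 && $2 > max { max = $2; disk = $1 } END { print disk }')
echo "busiest disk: $busiest"
# prints: busiest disk: hdisk2
```

A disk whose activity sits far above the others, as hdisk2 does here, is a sign the database volumes are not spread equally.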
When using TSM on AIX, JFS file systems are preferred for storage of the database volumes, not raw logical volumes. This subject tends to cause discussion, so here are the reasons for using JFS instead of raw logical volumes:
When TSM has JFS files open, they are locked by JFS and other applications cannot write to them. However, raw logical volumes are not locked and any application can write to them. TSM tries to prevent starting more than one instance of the same server from the same directory, but it can be done. If you are using raw logical volumes, multiple server instances can simultaneously update the same information. This could cause errors in the database, recovery log, or storage pool raw logical volumes. Auditing a corrupted TSM database and fixing corruptions can take up to a day of downtime.
After a database, recovery log, or storage pool volume is defined to TSM, you cannot change its size. TSM uses size information to determine where data is placed and whether volumes have been modified by other applications or utilities. However, if you use raw logical volumes, smit lets you increase their sizes. If the volume is defined to TSM before its size is increased, TSM cannot use the volume or its data.
The use of JFS file systems for database, recovery log, and storage pool volumes requires slightly more CPU than raw volumes do. However, JFS read-ahead caching improves performance. Lab tests have shown that raw logical volumes tend to give better performance, so as long as you have enough CPU to spare, you may still use JFS.
How many times can the tape drives be cleaned?
# mtlib -l /dev/lmcp0 -qL
Look for "avail xxxx cleaner cycles" at the bottom.
Which cleaning tapes are in the library?
# mtlib -l /dev/lmcp0 -qC -s FFFD
The first column in the output is the volume serial number of the cleaning tapes.
When was the cleaning tape last used?
# mtlib -l /dev/lmcp0 -qE -V [tape-volume-serial-number] -u
Look for "last used" at the bottom of the output.
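The "look for" steps above are easy to script. The canned output below is hypothetical and only mimics the "avail ... cleaner cycles" line mentioned above; on a live system you would pipe the real mtlib command instead:

```shell
# Canned sample of 'mtlib -l /dev/lmcp0 -qL' output (hypothetical lines).
qL_sample='Library Data:
   operational state: Automated Operational State
   avail 37 cleaner cycles'

# Extract the number of remaining cleaner cycles.
cycles=$(echo "$qL_sample" | awk '/cleaner cycles/ { print $2 }')
echo "available cleaner cycles: $cycles"
# prints: available cleaner cycles: 37
```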
How are my tape drives doing (from a TSM viewpoint)?
# dsmadmc -comma -id=readonly -password=readonly q dr f=d
Look for "On-Line" and "Drive State" in the output. Also check if the paths to your tape drives are on-line.
# query path
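A quick way to spot drive problems is to filter the `q dr f=d` output for the two fields mentioned above. The sample output below is canned and hypothetical:

```shell
# Canned fragment of 'q dr f=d' output for one drive (hypothetical).
q_dr_sample='             Drive Name: DRIVE01
            Device Type: LTO
                On-Line: Yes
            Drive State: EMPTY'

# Show only the two fields of interest.
echo "$q_dr_sample" | grep -E 'On-Line|Drive State'

# Or pull out just the on-line status.
online=$(echo "$q_dr_sample" | awk -F': *' '/On-Line/ { print $2 }')
echo "on-line: $online"
# prints: on-line: Yes
```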