- GPFS on IBM.com
- Search www.redbooks.ibm.com on GPFS (Especially for the GPFS design and principles redbook)
- IBM Spectrum Scale - Frequently Asked Questions
The traditional method for making an Oracle database capable of 24x7 operation is to create an HACMP cluster in an active-standby configuration. In case of a failure of the active system, HACMP lets the standby system take over the resources and start Oracle, thus resuming operation. This takeover is done with a downtime of approximately 5 to 15 minutes; however, the impact on the business applications is more severe and can lead to interruptions of up to one hour.
Another way to achieve high availability of databases is to use a special version of the Oracle database software called Real Application Clusters (RAC). In a RAC cluster, multiple systems (instances) are active and share the workload, providing near always-on database operation. On AIX, the Oracle RAC software relies on IBM's HACMP software to achieve high availability for the hardware and the operating system. For storage it utilizes a concurrent filesystem called GPFS (General Parallel File System), a product of IBM. Oracle RAC 9 uses GPFS and HACMP; with RAC 10 you no longer need HACMP and GPFS.
HACMP is used for network down notifications. Put all network adapters of one node on a single switch, and put every node on a different switch. HACMP only manages the public and private network service adapters. There are no standby, boot or management adapters in a RAC HACMP cluster. It just uses a single hostname; Oracle RAC and GPFS do not support hostname takeover or IPAT (IP Address Takeover). There are no disks, volume groups or resource groups defined in an HACMP RAC cluster. In fact, HACMP is only necessary for event handling for Oracle RAC.
Name your HACMP RAC clusters in such a way that you can easily recognize a cluster as a RAC cluster, by using a naming convention that starts with RAC_.
On every GPFS node of an Oracle RAC cluster a GPFS daemon (mmfs) is active. These daemons need to communicate with each other. This is done via the public network, not via the private network.
Cache Fusion
Via SQL*Net an Oracle block is read into memory. If a second node in an HACMP RAC cluster requests the same block, it will first check if it already has the block stored locally in its own cache. If not, it will use a private dedicated network to ask if another node has the block in cache. If no node does, the block is read from disk. This mechanism is called Cache Fusion, and the network is the Oracle RAC interconnect.
This is why on RAC HACMP clusters, each node uses an extra private network adapter to communicate with the other nodes, for Cache Fusion purposes only. All other communication, including the communication between the GPFS daemons on every node and the communication from Oracle clients, is done via the public network adapter. The throughput on the private network adapter can be twice as high as on the public network adapter.
Oracle RAC will use its own private network for Cache Fusion. If this network is not available, or if one node is unable to access the private network, the public network is used instead. Once the private network returns to normal operation, a fallback to the private network occurs. Oracle RAC uses HACMP's cllsif for this purpose.
EMC Grab is a utility that is run locally on each host and gathers storage-specific information (driver versions, storage-technical details, etc.). The EMC Grab report is created as a zip file, which can be used by EMC support.
You can download the "Grab Utility" from the following locations:
# tar -xvf *tar
Then run:
# /tmp/emc/emcgrab/emcgrab.sh
The script is interactive and finishes after a couple of minutes.
If you are unable to access an hdiskpowerX disk, you may need to reset the reservation bit on it:
# /usr/lpp/EMC/Symmetrix/bin/emcpowerreset fscsiX hdiskpowerX
There is a known bug on AIX with Solutions Enabler, the software responsible for BCV backups: hdiskpower devices disappear when a server is rebooted, and you need to run the following command to make them come back. BCV devices are only visible on the target servers.
# /usr/lpp/EMC/Symmetrix/bin/mkbcv -a ALL
hdisk2 Available
hdisk3 Available
hdisk4 Available
hdisk5 Available
hdisk6 Available
hdisk7 Available
hdisk8 Available
hdiskpower1 Available
hdiskpower2 Available
hdiskpower3 Available
hdiskpower4 Available
Topics: AIX, Backup & restore, Monitoring, Red Hat / Linux, Spectrum Protect↑
Report the end result of a TSM backup
A very easy way of getting a report from a backup is by using the POSTSchedulecmd entry in the dsm.sys file. Add the following entry to your dsm.sys file (which is usually located in /usr/tivoli/tsm/client/ba/bin or /opt/tivoli/tsm/client/ba/bin):
POSTSchedulecmd "/usr/local/bin/RunTsmReport"
This entry tells the TSM client to run the script /usr/local/bin/RunTsmReport as soon as it has completed its scheduled command. Now all you need is a script that creates a report from the dsmsched.log file, the file that is written to by the TSM scheduler:
#!/bin/bash
# Location of the TSM scheduler log and the working directory.
TSMLOG=/tmp/dsmsched.log
WRKDIR=/tmp

echo "TSM Report from `hostname`" >> ${WRKDIR}/tsmc
# Take the last 100 lines of the scheduler log.
tail -100 ${TSMLOG} > ${WRKDIR}/tsma
# Find the line number of the backup summary marker line.
grep -n "Elapsed processing time:" ${WRKDIR}/tsma > ${WRKDIR}/tsmb
CT2=`cat ${WRKDIR}/tsmb | awk -F":" '{print $1}'`
# Report the 14 lines before the marker through 1 line after it.
((CT3 = $CT2 - 14))
((CT5 = $CT2 + 1 ))
CT4=1
while read Line1 ; do
   if [ ${CT3} -gt ${CT4} ] ; then
      ((CT4 = ${CT4} + 1 ))
   else
      echo "${Line1}" >> ${WRKDIR}/tsmc
      ((CT4 = ${CT4} + 1 ))
      if [ ${CT4} -gt ${CT5} ] ; then
         break
      fi
   fi
done < ${WRKDIR}/tsma
# Mail the report and clean up the temporary files.
mail -s "`hostname` Backup" email@address.com < ${WRKDIR}/tsmc
rm ${WRKDIR}/tsma ${WRKDIR}/tsmb ${WRKDIR}/tsmc
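For reference, the same extraction can be done more compactly with awk. This is a sketch, not the original script: the sample log it generates is fabricated for illustration, and it keys off the same "Elapsed processing time:" marker used above.

```shell
# Sketch of an awk alternative to the read loop above: print the 14
# lines before through 1 line after the "Elapsed processing time:"
# marker. The sample log generated here is fabricated for illustration.
TSMLOG=/tmp/dsmsched.sample.log
{
    i=1
    while [ $i -le 18 ]; do
        echo "ANR0000I statistic line $i"
        i=$((i + 1))
    done
    echo "Elapsed processing time:       00:01:23"
    echo "Scheduled event completed successfully."
} > "$TSMLOG"

REPORT=$(tail -100 "$TSMLOG" | awk '
    { buf[NR] = $0 }                          # buffer the excerpt
    /Elapsed processing time:/ { hit = NR }   # remember the marker line
    END {
        if (hit) {
            start = (hit > 14) ? hit - 14 : 1
            end   = (hit + 1 <= NR) ? hit + 1 : NR
            for (i = start; i <= end; i++) print buf[i]
        }
    }')
echo "$REPORT"
```

Because the whole excerpt is buffered in awk, there is no need for the line-counter bookkeeping of the bash loop.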
Topics: EMC, Installation, SAN, Storage↑
EMC and MPIO
You can run into an issue with EMC storage on AIX systems using MPIO (no PowerPath) for your boot disks:
After installing the ODM definitions of EMC Symmetrix on your client system, the system won't boot any more and hangs with LED 554 (unable to find boot disk).
The boot hang (LED 554) is not caused by the EMC ODM package itself, but by the boot process not detecting a path to the boot disk if the first MPIO path does not correspond to the fscsiX driver instance where all hdisks are configured. Let me explain that in more detail:
Let's say we have an AIX system with four HBAs configured in the following order:
# lscfg -v | grep fcs
Looking at the MPIO path configuration, here is what we have for the rootvg disk:
fcs2 (wwn 71ca) -> no devices configured behind this fscsi2 driver instance (path only configured in CuPath ODM table)
fcs3 (wwn 71cb) -> no devices configured behind this fscsi3 driver instance (path only configured in CuPath ODM table)
fcs0 (wwn 71e4) -> no devices configured behind this fscsi0 driver instance (path only configured in CuPath ODM table)
fcs1 (wwn 71e5) -> ALL devices configured behind this fscsi1 driver instance
# lspath -l hdisk2 -H -F"name parent path_id connection status"
name parent path_id connection status
hdisk2 fscsi0 0 5006048452a83987,33000000000000 Enabled
hdisk2 fscsi1 1 5006048c52a83998,33000000000000 Enabled
hdisk2 fscsi2 2 5006048452a83986,33000000000000 Enabled
hdisk2 fscsi3 3 5006048c52a83999,33000000000000 Enabled
The fscsi1 driver instance is the second path (path_id 1). Remove the three other paths, keeping only the path corresponding to fscsi1:
# rmpath -l hdisk2 -p fscsi0 -d
# rmpath -l hdisk2 -p fscsi2 -d
# rmpath -l hdisk2 -p fscsi3 -d
# lspath -l hdisk2 -H -F"name parent path_id connection status"
Afterwards, do a savebase to update the boot logical volume hd5, set the bootlist to hdisk2, and reboot the host.
It will come up successfully, without hanging at LED 554.
When checking the status of the rootvg disk, a new hdisk10 has been configured with the correct ODM definitions, as shown below:
# lspv
hdisk10 0003027f7f7ca7e2 rootvg active
# lsdev -Cc disk
hdisk2 Defined 00-09-01 MPIO Other FC SCSI Disk Drive
hdisk10 Available 00-08-01 EMC Symmetrix FCP MPIO Raid6
To summarize: it is recommended to set up ONLY ONE path when installing AIX on a SAN disk, then install the EMC ODM package, reboot the host, and only after that is complete, add the other paths. By doing that, we ensure that the fscsiX driver instance used for the boot process has the hdisk configured behind it.
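The path pruning shown above can also be derived programmatically from the lspath listing, instead of typing each rmpath by hand. A sketch follows; DISK and KEEP are example values, the path table is a copy of the listing above, and the sketch only prints the commands so they can be reviewed before running:

```shell
# Sketch: generate the rmpath commands that keep only one path.
# The path table below is copied from the lspath output above; on a
# live system you would pipe real "lspath -l $DISK ..." output instead.
DISK=hdisk2
KEEP=fscsi1

cat <<'EOF' > /tmp/paths.txt
hdisk2 fscsi0 0 5006048452a83987,33000000000000 Enabled
hdisk2 fscsi1 1 5006048c52a83998,33000000000000 Enabled
hdisk2 fscsi2 2 5006048452a83986,33000000000000 Enabled
hdisk2 fscsi3 3 5006048c52a83999,33000000000000 Enabled
EOF

# Print (not execute) one rmpath per path whose parent is not $KEEP.
CMDS=$(awk -v disk="$DISK" -v keep="$KEEP" \
    '$1 == disk && $2 != keep { print "rmpath -l " disk " -p " $2 " -d" }' \
    /tmp/paths.txt)
echo "$CMDS"
```

Printing instead of executing makes the operation safe to dry-run; on a live system you would inspect the output and then run the commands.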
This is a procedure for replacing a failing HBA (fibre channel adapter) when used in combination with SDD storage:
- Determine which adapter is failing (0, 1, 2, etcetera):
# datapath query adapter
- Check if there are dead paths for any vpaths:
# datapath query device
- Try to set a "degraded" adapter back to online, by taking it offline and then online again:
# datapath set adapter 1 offline
# datapath set adapter 1 online
(that is, if adapter "1" is failing; replace it with the correct adapter number).
- If the adapter is still in a "degraded" status, open a call with IBM. They will most likely require you to take a snap from the system and send the snap file to IBM for analysis; they will conclude whether or not the adapter needs to be replaced.
- Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter when the adapter is replaced for a new one with a new WWN.
- If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team.
- Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the correct adapter that is failing). - Check if the vpaths now all have one less path:
# datapath query device | more
- De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
- Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
- Have the IBM CE replace the adapter.
- Close any events on the failing adapter on the HMC.
- Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
- Check the adapter firmware level using:
# lscfg -vl fcs1
(replace this with the actual adapter name).
And if required, update the adapter firmware microcode. Validate if the adapter is still functioning correctly by running:
# errpt
# lsdev -Cc adapter
- Have the SAN admin update the WWN.
- Run:
# cfgmgr -S
- Check the adapter and the child devices:
# lsdev -Cc adapter
# lsdev -p fcs1
# lsdev -p fscsi1
(replace these with the correct adapter names).
- Add the paths to the device:
# addpaths
- Check if the vpaths have all paths again:
# datapath query device | more
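When checking for dead paths in the steps above, the datapath output can be grepped instead of paged through. The sample output below is fabricated for illustration, and the exact column layout differs between SDD versions, so treat this as a sketch:

```shell
# Sketch: count DEAD paths in saved "datapath query device" output.
# The sample below is fabricated; real SDD output formatting may differ,
# so verify the "DEAD" state column against your own output first.
cat <<'EOF' > /tmp/datapath.out
DEV#:   0  DEVICE NAME: vpath0  TYPE: 2105800  POLICY: Optimized
   Path#      Adapter/Hard Disk      State   Mode    Select  Errors
       0      fscsi0/hdisk2          OPEN    NORMAL  1045    0
       1      fscsi1/hdisk6          DEAD    NORMAL  0       2
EOF

DEADS=$(grep -c ' DEAD ' /tmp/datapath.out)
echo "dead paths: $DEADS"
```

A count greater than zero means at least one path needs attention before or after the adapter replacement.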
Topics: Monitoring, PowerHA / HACMP, Security↑
HACMP 5.4: How to change SNMP community name from default "public" and keep clstat working
HACMP 5.4 supports changing the default community name from "public" to something else. SNMP is used for clinfoES communications. Using the "public" SNMP community name can be a security vulnerability, so changing it is advisable.
First, find out what version of SNMP you are using:
# ls -l /usr/sbin/snmpd
lrwxrwxrwx 1 root system 9 Sep 08 2008 /usr/sbin/snmpd -> snmpdv3ne
(In this case, it is using version 3.)
Make a copy of your configuration file. It is located in /etc:
/etc/snmpd.conf <- Version 1
/etc/snmpdv3.conf <- Version 3
Edit the file and replace "public", wherever it is mentioned, with your new community name. Make sure to use no more than 8 characters for the new community name.
Change the subsystems and restart them:
# chssys -s snmpmibd -a "-c new"
# chssys -s hostmibd -a "-c new"
# chssys -s aixmibd -a "-c new"
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Test using your localhost:
# snmpinfo -m dump -v -h localhost -c new -o /usr/es/sbin/cluster/hacmp.defs nodeTable
If the command hangs, something is wrong. Check the changes you made.
If everything works fine, perform the same change in the other node and test again. Now you can test from one server to the other using the snmpinfo command above.
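The change-and-restart sequence lends itself to a small script. The sketch below only writes the commands to a file for review rather than executing them; COMMUNITY and the output path are example values:

```shell
# Sketch: generate the chssys/stopsrc/startsrc sequence from the text
# above into a reviewable file. COMMUNITY and PLAN are example values.
COMMUNITY=new
PLAN=/tmp/snmp_change.sh

{
    # Point the three MIB daemons at the new community name.
    for s in snmpmibd hostmibd aixmibd; do
        echo "chssys -s $s -a \"-c $COMMUNITY\""
    done
    # Stop and start order as documented above.
    for s in snmpd aixmibd snmpmibd hostmibd; do
        echo "stopsrc -s $s"
    done
    for s in snmpd hostmibd snmpmibd aixmibd; do
        echo "startsrc -s $s"
    done
} > "$PLAN"
cat "$PLAN"
```

Generating the commands keeps the restart ordering consistent across both cluster nodes and lets you review it before running it on either.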
If you need to back out, restore the original configuration file and restart the subsystems. Note that in this case we use double-quotes with no space in between:
# chssys -s snmpmibd -a ""
# chssys -s hostmibd -a ""
# chssys -s aixmibd -a ""
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Okay, now make the change to clinfoES and restart it on both nodes:
# chssys -s clinfoES -a "-c new"
# stopsrc -s clinfoES
# startsrc -s clinfoES
Wait a few minutes and you should be able to use clstat again with the new community name.
Disclaimer: If you have any application other than clinfoES that uses snmpd with the default community name, you should make changes to it as well. Check with your application team or software vendor.
You can tweak the Dead Man Switch settings for HACMP. First have a look at the current settings by running:
# lssrc -ls topsvcs
A system usually has at least 2 heartbeats. One goes through the network, net_ether_01, with a sensitivity of 10 missed beats x 1 second interval x 2 = 20 seconds for it to fail. The other heartbeat is usually the disk heartbeat, diskhb_0, with a sensitivity of 4 missed beats x 2 second interval x 2 = 16 seconds.
Basically, if the other node has failed, HACMP will know once all heartbeating has failed, thus after 20 seconds.
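The detection times quoted above follow from a simple formula: missed heartbeats x interval x 2. A quick sketch using the figures from the text:

```shell
# Failure detection time in seconds: sensitivity (missed beats)
# times the heartbeat interval, times 2, per the figures above.
detect_time() {
    beats=$1
    interval=$2
    echo $(( beats * interval * 2 ))
}

ETHER=$(detect_time 10 1)   # net_ether_01: 10 beats x 1s x 2
DISKHB=$(detect_time 4 2)   # diskhb_0: 4 beats x 2s x 2
echo "ether: ${ETHER}s, diskhb: ${DISKHB}s"
```

Plugging in the sensitivity for the slow or fast rates gives the corresponding detection times mentioned below.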
You can play around with the HACMP detection rates. Set it to normal:
# /usr/es/sbin/cluster/utilities/claddnim -oether -r2
Ethernet heartbeating then fails after 20 seconds. If you want to set it to slow, use "-r3" instead of "-r2", and it fails after 48 seconds. To set it to fast, use "-r1", which will fail it after 10 seconds.
To give yourself some more time, you can use a grace period:
# claddnim -oether -g 15
This will give you 15 seconds of grace time, which is the time within which a network fallover must be handled.
You will have to synchronize the cluster after making any changes using claddnim:
# /usr/es/sbin/cluster/utilities/cldare -rt -V 'normal'