Topics: PowerHA / HACMP

Tweaking the deadman switch

You can tweak the deadman switch settings for HACMP. First, have a look at the current settings by running:

# lssrc -ls topsvcs
A system usually has at least two heartbeats. One goes through the network, net_ether_01, with a sensitivity of 10 missed beats x 1 second interval x 2 = 20 seconds before it is considered failed. The other is usually the disk heartbeat, diskhb_0, with a sensitivity of 4 missed beats x 2 second interval x 2 = 16 seconds.

Basically, HACMP only concludes that the other node has failed once all heartbeating has failed. In this example that is after 20 seconds, the longer of the two detection times.
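The arithmetic above can be sketched as a small shell script. This is only an illustration: the function name detection_time is made up for this example, and the missed-beat and interval values are hard-coded from the text rather than read from the cluster.

```shell
#!/bin/sh
# Failure detection time for one heartbeat network, using the formula
# from the text: missed beats x interval (seconds) x 2.
# detection_time is a hypothetical helper, not an HACMP command.
detection_time() {
    echo $(( $1 * $2 * 2 ))
}

# Values quoted in the text; on a real cluster, check "lssrc -ls topsvcs".
ether=$(detection_time 10 1)    # net_ether_01: 10 missed beats, 1 second interval
diskhb=$(detection_time 4 2)    # diskhb_0: 4 missed beats, 2 second interval

# The node counts as failed only when every heartbeat has failed,
# so the slowest heartbeat determines the overall detection time.
if [ "$ether" -gt "$diskhb" ]; then
    node_detect=$ether
else
    node_detect=$diskhb
fi

echo "net_ether_01: ${ether}s"
echo "diskhb_0: ${diskhb}s"
echo "node failure detected after: ${node_detect}s"
```

With the values from the text, this reports 20 seconds for the network, 16 seconds for the disk heartbeat, and 20 seconds overall.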

You can change the HACMP failure detection rate. To set it to normal:
# /usr/es/sbin/cluster/utilities/claddnim -oether -r2
With this setting, Ethernet heartbeating fails after 20 seconds. To set it to slow, use "-r3" instead of "-r2", and it fails after 48 seconds. To set it to fast, use "-r1", which fails it after 10 seconds.
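To avoid mixing up the numeric values, the mapping between rate names and the -r option can be wrapped in a small helper. The sketch below is a dry run: it only prints the claddnim command and does not execute it. The function names rate_flag and show_claddnim are made up for this example.

```shell
#!/bin/sh
# Map a detection rate name to the claddnim -r value described in the text.
# rate_flag and show_claddnim are hypothetical helpers, not HACMP utilities.
rate_flag() {
    case "$1" in
        fast)   echo "-r1" ;;
        normal) echo "-r2" ;;
        slow)   echo "-r3" ;;
        *)      echo "unknown rate: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of running it, so it can be reviewed first.
show_claddnim() {
    flag=$(rate_flag "$1") || return 1
    echo "/usr/es/sbin/cluster/utilities/claddnim -oether $flag"
}

show_claddnim slow
```

Once the printed command looks right, run it by hand on the cluster node.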

To give you some more time, you can use a grace period:
# /usr/es/sbin/cluster/utilities/claddnim -oether -g 15
This will give you 15 seconds of grace time, which is the time within which a network fallover must be taken care of.

You will have to synchronize the cluster after making any changes using claddnim:
# /usr/es/sbin/cluster/utilities/cldare -rt -V 'normal'



