How UNIX Health Check works

Downloading UNIX Health Check

Please visit the following URL to download UNIX Health Check:

Installing UNIX Health Check

Transfer the UNIX Health Check tar image file to the UNIX system you want to check, and unpack the file.

For the AIX version of UNIX Health Check, the tar image file is called ahc_latest.tar; for the Red Hat Enterprise Linux version, it is called rhc_latest.tar. Any destination directory will do: transfer the tar image into the directory of your choice and unpack it there.

For example, for the AIX version:
root@(testaix1) /uhc # ls *tar
ahc_latest.tar
root@(testaix1) /uhc # tar -xvf *tar
x checkactivatedrpcservices.ksh, 1625 bytes, 4 media blocks.
x checkadaptersdefined.ksh, 297 bytes, 1 media blocks.
x checkadapters.ksh, 319 bytes, 1 media blocks.
x checkaiooa.ksh, 334 bytes, 1 media blocks.
x checkaiostatus.ksh, 1765 bytes, 4 media blocks.
x checkall.ksh, 26286 bytes, 52 media blocks.
x checkaudit.ksh, 235 bytes, 1 media blocks.
...
...
...
x checkxntpd.ksh, 2557 bytes, 5 media blocks.
x checkzombies.ksh, 407 bytes, 1 media blocks.
x COPYRIGHT, 930 bytes, 2 media blocks.
x DESCRIPTIONS, 124444 bytes, 244 media blocks.
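The choice of tar image can also be scripted. The sketch below is only an illustration in a POSIX shell: the helper names uhc_tarball and uhc_unpack are hypothetical, and mapping every Linux system (as reported by uname -s) to the Red Hat Enterprise Linux image is an assumption, since only the AIX and Red Hat versions are described here.

```shell
# Hypothetical helper: print the name of the UNIX Health Check tar image for
# an OS name as reported by `uname -s`. Mapping "Linux" to the Red Hat
# Enterprise Linux image is an assumption.
uhc_tarball() {
    case "$1" in
        AIX)   echo "ahc_latest.tar" ;;
        Linux) echo "rhc_latest.tar" ;;
        *)     echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: unpack the image for the current OS inside directory
# $1. Assumes the tar image has already been transferred into that directory.
uhc_unpack() {
    uhc_dir=$1
    uhc_tar=$(uhc_tarball "$(uname -s)") || return 1
    (cd "$uhc_dir" && tar -xvf "$uhc_tar")
}
```

For example, after copying the tar image into /uhc, running uhc_unpack /uhc would extract it there.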


Running checks

Once the tar image has been unpacked, you will find many files in the chosen directory. Most of them are individual check scripts whose names start with "check". Each script checks a specific function or configuration setting of your system, and you can run any of them individually. For example, to check whether wget is installed, run checkwget.ksh (for AIX) or checkwget.sh (for Red Hat Enterprise Linux). To determine the model name of the server, run checkmodelname.ksh (for AIX) or checkmodelname.sh (for Red Hat Enterprise Linux).
root@(testaix1) /uhc # ./checkwget.ksh
wget-1.9.1-1
root@(testaix1) /uhc # ./checkmodelname.ksh
9117-MMB

Note 1: UNIX Health Check is designed to run as user root only. Many scripts will still run under a different user account, but UNIX Health Check is only supported when run as root. Root access is required because UNIX Health Check runs several root-level commands.
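
A wrapper script could enforce the root requirement before calling any checks. This minimal sketch assumes a POSIX shell and the standard id -u command; the function name is_root is hypothetical.

```shell
# Hypothetical guard: succeed only if the user ID given as $1 (default: the
# current effective user ID from `id -u`) is 0, i.e. root.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Example use at the top of a wrapper script:
#   is_root || { echo "Please run UNIX Health Check as root." >&2; exit 1; }
```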

Note 2: UNIX Health Check never changes anything on the UNIX system; it only reports. It is not designed to resolve issues automatically, because the right configuration can depend on your environment or infrastructure. From the output of the check script(s), you can determine which issue was found (if any) and what action could be taken to remediate it.

Note 3: The default shell for AIX systems is the Korn Shell (ksh), and all check scripts of the AIX version of UNIX Health Check use the extension "ksh". The default shell for Red Hat Enterprise Linux is the Bourne Again Shell (Bash), and all check scripts of the Red Hat Enterprise Linux version of UNIX Health Check use the extension "sh".


Return codes

Each script exits with a return code: zero means the script completed successfully, one means an error was found, and two means a warning situation occurred:

0
The script completed successfully.
1
The script found an ERROR.
2
The script ended with a WARNING.
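
These return codes can be turned into readable results with a small wrapper. The sketch below assumes a POSIX shell; the function names rc_label and run_check are hypothetical, and reporting any other return code as UNKNOWN is an assumption, since only 0, 1 and 2 are documented.

```shell
# Map a check script's return code to the result names used in this article.
rc_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "ERROR" ;;
        2) echo "WARNING" ;;
        *) echo "UNKNOWN" ;;   # not documented; treated as unexpected
    esac
}

# Run one check script (its own output is printed first), then print the
# script name with the interpreted result, and pass the return code on.
run_check() {
    "$1"
    rc=$?
    echo "$1: $(rc_label "$rc") (return code $rc)"
    return "$rc"
}
```

For example, run_check ./checktmpsize.ksh prints the script's output (if any), followed by a line ending in OK, ERROR or WARNING.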

For example, script checktmpsize.ksh (for AIX) or checktmpsize.sh (for Red Hat Enterprise Linux) will check if file system /tmp is at least 1 GB in size:
root@(testaix1) /uhc # grep -i purpose checktmpsize.ksh
# Purpose:     Check if the size of /tmp is at least 1 GB.
root@(testaix1) /uhc # ./checktmpsize.ksh
root@(testaix1) /uhc # echo $?
0
root@(testaix1) /uhc # df -g /tmp
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd3           2.00      1.99    1%       44     1% /tmp
As you can see in the example above, file system /tmp is indeed at least 1 GB; in fact, it is 2 GB, which meets the best practice for sizing /tmp. A /tmp file system that is large enough is unlikely to fill up any time soon. The script therefore produces no output and returns zero, meaning it completed successfully.
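
To illustrate the kind of test such a script performs, here is a rough sketch of a /tmp size check. It is not the shipped checktmpsize script: it assumes that df supports the POSIX -P and -k flags (portable output in 1024-byte blocks, so the size is field 2 of the second line), and the names MIN_TMP_KB, tmp_size_kb and tmp_size_ok are hypothetical.

```shell
# Hypothetical sketch of a /tmp size check (not the shipped checktmpsize
# script). `df -P -k` prints POSIX-format output with sizes in 1024-byte
# blocks; field 2 of the data line is the total file system size.
MIN_TMP_KB=1048576    # 1 GB expressed in KB

tmp_size_kb() {
    df -P -k "${1:-/tmp}" | awk 'NR == 2 { print $2 }'
}

# Return 0 if a size in KB meets the 1 GB minimum, 1 (ERROR) otherwise.
tmp_size_ok() {
    if [ "$1" -ge "$MIN_TMP_KB" ]; then
        return 0
    fi
    return 1
}
```

Usage: tmp_size_ok "$(tmp_size_kb /tmp)"; echo $?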

Here's an example of when a check script returns an error code (return code 1):

Script checkpgspminsize.ksh (for AIX) makes sure that the defined paging space(s) are at least the same size as the available memory in the system:
root@(testaix1) /uhc # grep -i purpose checkpgspminsize.ksh
# Purpose: Check if paging space is at least the same size as memory.
root@(testaix1) /uhc # ./checkpgspminsize.ksh
Paging space smaller than memory, requires additional 12288 MB.
root@(testaix1) /uhc # echo $?
1
root@(testaix1) /uhc # lsattr -El mem0 -a goodsize
goodsize 16384 Amount of usable physical memory in Mbytes False
root@(testaix1) /uhc # lsps -s
Total Paging Space   Percent Used
      4096MB               1%
As you can see in the example above, the script returns an error (return code 1), because the system has 16,384 MB (16 GB) of memory while only 4,096 MB (4 GB) is assigned to paging space, so it lacks 12,288 MB of paging space. The best practice is for AIX systems with less than 32 GB of memory to have a paging space at least the same size as the available memory.
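
The arithmetic behind this check is simple: memory minus configured paging space. The sketch below is a hypothetical illustration of the rule as stated above, not the shipped checkpgspminsize.ksh; both arguments are in MB, the function name pgsp_shortfall is invented, and skipping systems with 32 GB or more is an assumption, because the rule above only covers smaller systems.

```shell
# Hypothetical sketch: $1 = memory in MB, $2 = total paging space in MB.
# Prints the missing MB and returns 1 (ERROR) when paging space is smaller
# than memory on a system with less than 32 GB; otherwise returns 0.
# Systems with 32 GB or more are skipped (assumption, not stated above).
pgsp_shortfall() {
    mem_mb=$1
    pgsp_mb=$2
    if [ "$mem_mb" -lt 32768 ] && [ "$pgsp_mb" -lt "$mem_mb" ]; then
        echo $((mem_mb - pgsp_mb))
        return 1
    fi
    return 0
}
```

With the values above, pgsp_shortfall 16384 4096 prints 12288 and returns 1. On AIX, the inputs could be taken from lsattr -El mem0 -a goodsize -F value and from the first column of lsps -s (with the MB suffix removed).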


Running several or all checks

Running all the check scripts individually can be cumbersome, as UNIX Health Check includes hundreds of check scripts. Therefore, there is also a checkall.ksh (for AIX) or checkall.sh (for Red Hat Enterprise Linux) script, which can be used to run several or all of the check scripts. The checkall script does not run all scripts at once, but one at a time; it does, however, do so very quickly.
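
Conceptually, running checks one at a time looks like the loop below. This is a hypothetical sketch for illustration, not the checkall script itself: run_all_checks is an invented name, it assumes the check scripts are executable and sit in one directory, and it counts any return code other than 0 or 2 as an error.

```shell
# Hypothetical sketch: run every check*.EXT script in DIR sequentially and
# count the results. Usage: run_all_checks DIR EXT (ksh on AIX, sh on RHEL).
run_all_checks() {
    dir=$1
    ext=$2
    ok=0; warn=0; err=0; total=0
    for script in "$dir"/check*."$ext"; do
        [ -f "$script" ] || continue              # glob matched nothing
        case "$(basename "$script")" in
            checkall."$ext") continue ;;          # skip the runner itself
        esac
        total=$((total + 1))
        "$script" >/dev/null 2>&1
        case $? in
            0) ok=$((ok + 1)) ;;
            2) warn=$((warn + 1)) ;;
            *) err=$((err + 1)) ;;                # 1, or anything unexpected
        esac
    done
    echo "total=$total ok=$ok warning=$warn error=$err"
}
```

For example, run_all_checks /uhc ksh on an AIX system.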

For example, to run all scripts, and to produce a log file within the same directory on a Red Hat Enterprise Linux system, run:
root@(testrhel1) /uhc # ./checkall.sh
You will not see any output on the screen when you run the command shown above, because that is the default behavior of UNIX Health Check (note: with the "-v" option, UNIX Health Check displays verbose information while it is running). A log file, however, is produced in the same folder, with the extension ".log":
root@(testrhel1) /uhc # ls *log
checkall_testrhel1.log
You can review the log file for the result of each script run. For example, on an AIX system, for script checkexcluderootvg.ksh, which makes sure that at least the /tmp file system is excluded from a mksysb backup, you may find the following entry in the log file:
Running check 78 of 378: checkexcluderootvg.ksh

^./tmp

Check checkexcluderootvg.ksh completed successfully: return code 0
20% complete - 300 checks to go.
At the end of the log file, a summary is included:
Finished checking host testrhel1.

Run time for all checks              : 319 seconds
Total number of checks               : 478
# Checks with result OK              : 437
# Checks with result WARNING         : 29
# Checks with result ERROR           : 12
Score [Percentage OK]                : 91.40 %

For details see logfile              : checkall_testrhel1.log


Scoring the system

As you can see in the example above, the Red Hat Enterprise Linux system testrhel1 receives a score based on the results of all scripts. In general, a well-maintained UNIX system should receive a score of 95% or higher. A score lower than 95% indicates that there are issues to be remediated on the system.
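
Assuming the score is simply the percentage of checks with result OK, as the summary label suggests, it can be approximated with awk. This is a sketch, not the tool's formula: for the counts above it yields 91.42, while the report shows 91.40 %, so the tool evidently rounds or calculates slightly differently. The function names uhc_score and needs_attention are hypothetical.

```shell
# Hypothetical sketch: percentage of OK checks, with two decimals.
# $1 = number of checks with result OK, $2 = total number of checks.
uhc_score() {
    awk -v ok="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "0.00"; exit }
        printf "%.2f\n", ok * 100 / total
    }'
}

# Return 0 (true) if a score is below the 95% guideline.
needs_attention() {
    awk -v s="$1" 'BEGIN { exit !(s < 95) }'
}
```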


Other useful functions

UNIX Health Check is a great tool for discovering possible performance bottlenecks, because the log file it produces includes all kinds of performance metrics. For example, it lists the top 20 memory-consuming processes:
Running check 332 of 378: checktop20memoryusers.ksh

     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit 
 3997924 java             67395     8771        0    41672      N
 1966558 cimserver        29036     8687        0    29023      N
 2883854 rmcd             24620     8667        0    24337      N
 2556206 clstrmgr         24462     8660        0    22899      N
 5767338 topas_nmon       24042     8660        0    23338      N
  655452 cimlistener      23651     8663        0    23637      N
 7340286 sshd             23472     8660        0    22934      N
 3735800 sshd             23378     8660        0    22842      N
 2359760 sendmail         23333     8660        0    22945      N
 5177352 IBM.CSMAgentR    23321     8675        0    22655      N
 1835476 tier1slp         23178     8660        0    23160      N
 3145776 clcomd           23156     8663        0    22925      N
 2228672 snmpdv3ne        23043     8660        0    22952      N
 3080464 clcomd           22974     8662        0    22783      N
 2687464 cron             22932     8660        0    22898      N
 3342612 xntpd            22872     8660        0    22795      N
 4194544 diagd            22846     8660        0    22834      N
 4063460 nonstop_aix      22728     8662        0    22700      N
 2163140 slp_srvreg       22727     8660        0    22720      N
 4915212 IBM.ServiceRM    22473     8676        0    22386      N

Check checktop20memoryusers.ksh completed successfully: return code 0
87% complete - 46 checks to go.
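A rough, portable approximation of such a list can be produced with the POSIX ps command. This is a different technique from the one the check uses (the columns above resemble AIX svmon output): the sketch simply ranks processes by virtual size (vsz, in KB), and the function names are hypothetical.

```shell
# Sort lines on stdin by their second field, numerically and descending,
# and keep the first N lines (default 20).
rank_by_second_field() {
    sort -k2,2nr | head -n "${1:-20}"
}

# Hypothetical approximation: top N processes by virtual size. Empty
# headers ("pid=" etc.) make ps omit its header line.
top_by_vsz() {
    ps -e -o pid= -o vsz= -o comm= | rank_by_second_field "${1:-20}"
}
```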
UNIX Health Check is also a great tool for gathering inventory information, which is very useful during Disaster Recovery (exercises) or when doing a quick scan of a new UNIX environment. For example, UNIX Health Check for AIX generates a list of the commands needed to re-create all the logical volumes and file systems, and to set the correct permissions, in case a server needs to be recovered:
root@(testaix1) /uhc # ./checklvfscreate.ksh
mklv -e x -y exportlv -t jfs2 nimvg 40960M
chlv -U root -G system -P 660 exportlv
mklv -e x -y sysadmlv -t jfs2 nimvg 204800M
chlv -U root -G system -P 660 sysadmlv
crfs -v jfs2 -m /export -d exportlv -a logname=INLINE -A yes 
crfs -v jfs2 -m /sysadm -d sysadmlv -a logname=INLINE -A yes
mkdir -p /export 2>/dev/null
mkdir -p /sysadm 2>/dev/null
mount /export 2>/dev/null
mount /sysadm 2>/dev/null
chmod 755 /export
chown root:system /export
chmod 755 /sysadm
chown root:system /sysadm
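Because this output is only useful if it survives the server, it makes sense to keep a copy elsewhere. The helper below is a hypothetical sketch: the name save_lvfs_commands and the file naming scheme are assumptions, and the output directory should ideally be on another system.

```shell
# Hypothetical helper: save the commands printed by a script such as
# checklvfscreate.ksh into OUTDIR, named after the host and the date,
# and print the path of the saved file.
save_lvfs_commands() {
    lvfs_script=$1
    lvfs_outdir=$2
    lvfs_out="$lvfs_outdir/lvfscreate_$(uname -n)_$(date +%Y%m%d).ksh"
    mkdir -p "$lvfs_outdir" || return 1
    "$lvfs_script" > "$lvfs_out"
    echo "$lvfs_out"
}
```

For example: save_lvfs_commands ./checklvfscreate.ksh /path/to/dr_copies (the directory is a placeholder).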
If you would rather run only some of the scripts instead of all of them, use the -s option. For example, to run two check scripts on a Red Hat Enterprise Linux system, checkmodelname.sh and checkcpumodel.sh, run:
root@(testrhel1) /uhc # ./checkall.sh -v -s checkcpumodel.sh,checkmodelname.sh
The -v option produces verbose output on the screen, and the -s option specifies which check scripts to run (separated by commas).
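
When the list of scripts is long, it can be built from a glob instead of typed by hand. This minimal sketch joins its arguments with commas using the shell's "$*" expansion (which uses the first character of IFS as separator); the function name join_commas is hypothetical.

```shell
# Join all arguments with commas; the subshell keeps the IFS change local.
join_commas() {
    (IFS=,; printf '%s\n' "$*")
}
```

For example: ./checkall.sh -v -s "$(join_commas checkcpu*.sh)"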

The output of the command above will look like this:
##########################################################################

2017-04-26 22:57:12: UNIX HEALTH CHECK FOR RED HAT ENTERPRISE LINUX

Copyright (c) 2004-2017 UNIX Health Check - All Rights Reserved

www.unixhealthcheck.com

This is confidential and unpublished work of authorship subject to
limited use license agreements and is a trade secret, which is the
property of UNIX Health Check (www.unixhealthcheck.com). All use,
disclosure and/or reproduction not specifically authorized in writing by
UNIX Health Check is strictly prohibited.

Any expressed or implied warranties are disclaimed. In no event shall
UNIX Health Check be liable for any direct, indirect, incidental,
special, exemplary, or consequential damages (including, but not limited
to, loss of use, data, profits, or business interruption) however caused
and on any theory of liability, whether in contract, strict liability, or
tort (including negligence or otherwise) arising in any way out of the
use of these scripts, even if advised of the possibility of such damage.

This report is generated by a demo version of UNIX Health Check
for Red Hat Enterprise Linux. It is an overview of check scripts run on
a Red Hat system, and depending on the options selected when the
checkall.sh script was run, it will list each check script, the
returncode of the check script, the output of the check script and a
description. At the end of this report is an overview of all scripts run
and a score report.

Any individual implementing changes should completely understand the
change and deem each change appropriate before it is applied to the
system. As a standard rule, please take into consideration the impact on
other components before implementing the change. Also, we encourage all
to conduct a peer review of all changes before implementation. Most
importantly, if the effect of a change is not fully understood, do not
implement that change until a satisfactory explanation can be given as to
what the effects of the change are. We recommend implementation of one
change at a time. The application of many changes all at once will
increase the likelihood of confusion, if issues arise.

For more information, check website https://www.unixhealthcheck.com.
For support, email to support@unixhealthcheck.com.

##########################################################################

2017-04-26 22:57:12: OPTIONS SELECTED
-------------------------------------

2017-04-26 22:57:12: Version:         17.04.26
2017-04-26 22:57:12: Start at:        04/26/2017 22:57:11 CDT
2017-04-26 22:57:12: Options:         -v -s
checkcpumodel.sh,checkmodelname.sh
2017-04-26 22:57:12: Output file:     checkall_testrhel1.log
2017-04-26 22:57:12: Width:           74
2017-04-26 22:57:12: Display:         All checks
2017-04-26 22:57:12: Descriptions:    No
2017-04-26 22:57:12: Output type:     TEXT
2017-04-26 22:57:12: # Checks:        2
2017-04-26 22:57:12: Scripts:         checkcpumodel.sh checkmodelname.sh

##########################################################################

2017-04-26 22:57:12: SYSTEM CONFIGURATION
-----------------------------------------

2017-04-26 22:57:12: Hostname:        testrhel1
2017-04-26 22:57:12: IP Address:      192.168.0.206 on interface em1
2017-04-26 22:57:12: IP Assignment:   Static
2017-04-26 22:57:12: Subnet Mask:     255.255.255.0
2017-04-26 22:57:12: Default Gateway: 192.168.0.1
2017-04-26 22:57:12: Name Server(s):  192.168.0.206 8.8.8.8
2017-04-26 22:57:12: OS Level:        CentOS Linux release 7.3.1611
(Core)
2017-04-26 22:57:12: Model:           Dell Inc. PowerEdge R820
2017-04-26 22:57:12: Serial Number:   8R349W1
2017-04-26 22:57:12: Kernel:          64 bit
2017-04-26 22:57:12: Processor Type:  Intel(R) Core(TM) i5-3320M CPU @
2.60GHz
2017-04-26 22:57:12: # Sockets:       1
2017-04-26 22:57:12: # Cores/socket:  2
2017-04-26 22:57:12: # Cores:         2
2017-04-26 22:57:12: # Threads/core:  2
2017-04-26 22:57:12: Hyper-Threading: Enabled
2017-04-26 22:57:12: CPUs:            4
2017-04-26 22:57:12: Memory:          4096 MB
2017-04-26 22:57:12: Paging Space:    3967 (54% in use)
2017-04-26 22:57:12: Uptime:          22:57:12 up 61 days, 23:15, 2
users, load average: 0.17, 0.18, 0.14

##########################################################################

2017-04-26 22:57:12: CHECK SCRIPTS
----------------------------------

--------------------------------------------------------------------------
2017-04-26 22:57:12: Running check script 1 of 2: checkcpumodel.sh

Output:
-------

Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz

2017-04-26 22:57:12: Check checkcpumodel.sh completed successfully:
returncode 0
2017-04-26 22:57:12: Runtime: 0 second(s)
2017-04-26 22:57:12: 50% complete - 1 checks to go.
--------------------------------------------------------------------------
2017-04-26 22:57:12: Running check script 2 of 2: checkmodelname.sh

Output:
-------

Dell Inc. Latitude PowerEdge R820

2017-04-26 22:57:12: Check checkmodelname.sh completed successfully:
returncode 0
2017-04-26 22:57:12: Runtime: 0 second(s)
2017-04-26 22:57:12: 100% complete - 0 checks to go.
--------------------------------------------------------------------------

##########################################################################

2017-04-26 22:57:12: RESULTS
----------------------------

2017-04-26 22:57:12: Run time for all checks              : 1 second
2017-04-26 22:57:12: Total number of checks               : 2
2017-04-26 22:57:12: # Checks with result OK              : 2
2017-04-26 22:57:12: # Checks with result WARNING         : 0
2017-04-26 22:57:12: # Checks with result ERROR           : 0
2017-04-26 22:57:12: Score [Percentage OK/WARNING]        : 100 %

2017-04-26 22:57:12: For details see logfile              :
/mnt/rhc/scripts/checkall_testrhel1.log

##########################################################################


Adding descriptions to checks

If you would like to include descriptions in the output, add the -d option. The descriptions of all checks can be found in the file DESCRIPTIONS, which you can view with an editor; however, the -d option adds the description of each script to the output automatically:
root@(testaix1) /uhc # ./checkall.ksh -v -s checkwget.ksh -d
The output of the command above will look like this:
Running check 1 of 1: checkwget.ksh

Description:
------------

Check if wget is installed, and if so, if the correct version is
installed. The latest available version in the AIX Toolbox for
Linux Applications is version 1.9.1.

Output:
------------

wget-1.9.1-1

Check checkwget.ksh completed successfully: returncode 0
100% complete - 0 checks to go.


Sending output through email

Using the -m option, you can email the output to an email address:
root@(testaix1) /uhc # ./checkall.ksh -v -s checkwget.ksh -d -m my@email.com
Note: Please make sure to specify a valid email address to send the report to.
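
The -m option also makes UNIX Health Check suitable for scheduled runs, for example from cron. The wrapper below is a hypothetical sketch: the function name, directory and address are placeholders, and it combines -m with the -g option (which suppresses successful checks) so that only warnings and errors are reported.

```shell
# Hypothetical wrapper: run all checks from the install directory $1,
# report only warnings and errors (-g), and email the report to $2 (-m).
uhc_mail_report() {
    uhc_dir=$1
    address=$2
    (cd "$uhc_dir" && ./checkall.ksh -g -m "$address")
}
```

For example: uhc_mail_report /uhc admin@example.com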


Comma separated output

Using the -c option, you can produce CSV style output instead of the default log file output:
root@(testaix1) /uhc # ./checkall.ksh -cs checkwget.ksh,checkuptime.ksh
root@(testaix1) /uhc # cat *csv
Hostname,Date-Time,Check,Returncode,Output
testaix1,2011-04-21 22:25:48,checkuptime.ksh,0,
testaix1,2011-04-21 22:25:48,checkwget.ksh,0,wget-1.9.1-1
CSV (comma-separated values) output can be very useful for loading the results of the check scripts into a database.
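
Once in CSV form, the results are also easy to filter. The sketch below lists every check that did not return 0; it assumes the column order shown above and that only the last column (Output) may contain commas, so the first four fields can be read positionally. The function name failed_checks is hypothetical.

```shell
# Print hostname, check name and return code for every non-zero result.
# Reads the CSV files given as arguments, or stdin if there are none;
# the header line (NR == 1) is skipped.
failed_checks() {
    awk -F, 'NR > 1 && $4 != 0 { print $1, $3, $4 }' "$@"
}
```

For example: failed_checks checkall_testaix1.csv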


HTML output

The -h option can be used to create an HTML style output:
root@(testaix1) /uhc # ./checkall.ksh -hs checkwget.ksh,checkuptime.ksh
root@(testaix1) /uhc # ls *html
checkall_testaix1.html
The HTML file created will look like this:


As you can see, colors are used to indicate the return code of the scripts that have been run. For a full HTML sample report (including other sample reports), click the following link:

Other options

Many other options exist as well. For example, the -l option specifies the location of the output file, and the -w option sets the width of the output, which is especially useful when creating log file output. The -g option suppresses the output of all successful checks, resulting in a report of only those scripts that generated a Warning or an Error.

The different options can all be combined. You can create a CSV style output (-c option) and have it emailed to you (-m option). Or you can generate an HTML style report (-h option) that shows only the non-successful checks (-g option), with added descriptions (-d option), and have it emailed to you (-m option).

For a detailed overview of all available options, please see the documentation of UNIX Health Check:

Conclusion

As you can see, UNIX Health Check is a versatile and, at the same time, very easy-to-use tool. It is intuitive and doesn't take long to get used to. Use it in whatever way you see fit, with the options you prefer.