Incron is an interesting piece of software for Linux that can monitor a specific folder for file changes, and act upon those changes. For example, it's possible to wait for files to be written to a folder, and have a command run to process these files.
Incron is not installed by default and is part of the EPEL repository. For Red Hat and CentOS 7, it's also possible to just download the RPM package from https://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/i/incron-0.5.12-11.el7.x86_64.rpm, for example using wget.
To install incron, run:
# yum -y install /path/to/incron*rpm
Four files are important for incron:
- /etc/incron.conf - The main configuration file for incron; this file can usually be left at its defaults.
- /usr/sbin/incrond - This is the incron daemon, which has to run for incron to work. You can start it by simply executing /usr/sbin/incrond, and it will automatically run in the background. When it's no longer needed, you can simply kill the incrond process. However, it's better to enable the service at system boot time and start the service:
# systemctl enable incrond.service
# service incrond start
- /var/log/cron - This is the default location where the incron daemon will log its activities (through rsyslog). The file is also used by the cron daemon, so you may see other messages in this file. By using the tail command on this file, you can monitor what the incron daemon is doing. For example:
# tail -f /var/log/cron
- The incrontab file - You can edit this file by running:
# incrontab -e
This command will automatically load the incrontab file in an editor such as vi, and you can add/modify/remove entries this way. Once you save the file, its contents will be automatically activated by the incron daemon. To list the entries in the incrontab file, run:
# incrontab -l
Entries in the incrontab file follow a specific format, which looks like this:
[path] [mask] [command]
Where:
- [path] is the folder that the incron daemon will be monitoring for any new files (only in the folder itself, not in any sub-folders).
- [mask] is the activity that the incron daemon should respond to. There are several different activities to choose from; for a list of options, see https://linux.die.net/man/5/incrontab. One option that can be used is "IN_CLOSE_WRITE", which triggers when a file opened for writing is closed, meaning that writing to a file in the folder has completed.
- [command] is the command to be run by the incron daemon when a file activity takes place in the monitored path. For this command you can use available wildcards, such as:
- $@ : watched filesystem path
- $# : event-related file name
An example of the incrontab file can be:
/path/to/my/folder IN_CLOSE_WRITE /path/to/script.bash $@ $#
You can have multiple entries in the incrontab file, each on a separate line. In the example above, the incron daemon will start script /path/to/script.bash with two parameters (the path of the monitored folder, and the name of the file that was written to the folder), for each file that has been closed for writing in folder /path/to/my/folder.
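For illustration, a minimal version of the /path/to/script.bash handler could look like this (the log file, destination folder, and processing step are hypothetical; note the file is moved out of the watched folder, so the handling itself doesn't trigger new IN_CLOSE_WRITE events):
#!/bin/bash
# Hypothetical incron handler: $1 is the watched folder ($@), $2 the file name ($#).
dir="$1"
file="$2"
echo "$(date): processing ${dir}/${file}" >> /var/log/incron-handler.log
# Move the file out of the watched folder before doing any further work,
# so the processing itself doesn't generate more events in that folder.
mv "${dir}/${file}" /data/processed/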
To monitor the status of the incron daemon, run:
# service incrond status
To restart the incron daemon, run:
# service incrond stop
# service incrond start
Or shorter:
# service incrond restart
There is a downside to using incron: there is no way to limit the number of processes that the incron daemon can start. If a thousand files are written to the folder monitored by the incron daemon, it will kick off the process defined in the incrontab file for that folder a thousand times. This may place serious CPU load on a system (or even hang the system), especially if the command being run is CPU and/or memory intensive.
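One way to soften this is to serialize the expensive work inside the invoked script, for example with flock. A minimal sketch, continuing the handler example above (the lock file path is arbitrary):
#!/bin/bash
# Take an exclusive lock on a lock file via file descriptor 9;
# concurrent invocations will block here until the lock is released.
exec 9>/var/lock/incron-handler.lock
flock 9
# ... CPU/memory intensive processing of "$1/$2" goes here ...
Note that this doesn't reduce the number of processes incron starts; it only prevents them from doing the heavy lifting simultaneously.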
IBM has implemented a new feature for JFS2 filesystems to prevent simultaneous mounting within PowerHA clusters.
While PowerHA can give concurrent access to volume groups to multiple systems, mounting a JFS2 filesystem on multiple nodes simultaneously will cause filesystem corruption. These simultaneous mount events can also cause a system crash, when the system detects a conflict between data or metadata in the filesystem and the in-memory state of the filesystem. The only exception to this is mounting the filesystem read-only, in which case files and directories can't be changed.
In AIX 7100-01 and 6100-07 a new feature called "Mount Guard" has been added to prevent simultaneous or concurrent mounts. If a filesystem appears to be mounted on another server, and the feature is enabled, AIX will prevent mounting on any other server. Mount Guard is not enabled by default, but is configurable by the system administrator. The option is not allowed to be set on base OS filesystems such as /, /usr, /var etc.
To turn on Mount Guard on a filesystem you can permanently enable it via /usr/sbin/chfs:
# chfs -a mountguard=yes /mountpoint
/mountpoint is now guarded against concurrent mounts.
The same option is used with crfs when creating a filesystem.
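For example, creating a new guarded filesystem could look like this (the volume group, size, and mount point are illustrative):
# crfs -v jfs2 -g datavg -m /mountpoint -a size=1G -a mountguard=yes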
To turn off mount guard:
# chfs -a mountguard=no /mountpoint
/mountpoint is no longer guarded against concurrent mounts.
To determine the mount guard state of a filesystem:
# lsfs -q /mountpoint
Name Nodename Mount Pt VFS Size Options Auto Accounting
/dev/fslv -- /mountpoint jfs2 4194304 rw no no
(lv size: 4194304, fs size: 4194304, block size: 4096, sparse files: yes,
inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI:
no, VIX: yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: yes)
The /usr/sbin/mount command will not show the mount guard state.
When a filesystem is protected against concurrent mounting, and a second mount attempt is made, you will see this error:
# mount /mountpoint
mount: /dev/fslv on /mountpoint:
Cannot mount guarded filesystem.
The filesystem is potentially mounted on another node
After a system crash the filesystem may still have mount flags enabled and refuse to be mounted. In this case the guard state can be temporarily overridden by the "noguard" option to the mount command:
# mount -o noguard /mountpoint
mount: /dev/fslv on /mountpoint:
Mount guard override for filesystem.
The filesystem is potentially mounted on another node.
Reference:
http://www-01.ibm.com/support/docview.wss?uid=isg3T1018853
On Linux, you can use the watch command to run a specific command repeatedly, and monitor the output.
Watch is a command-line tool, part of the Linux procps and procps-ng packages, that runs the specified command repeatedly and displays the results on standard output, so you can watch the output change over time. You may need to enclose the command in quotes for it to run correctly.
For example, you can run:
# watch "ps -ef | grep bash"
The "-d" argument can be used to highlight the differences between each iteration, for example to highlight the time changes in the ntptime command:
# watch -d ntptime
By default, the command is run every two seconds, although this is adjustable with the "-n" argument. For example, to run the uptime command every second:
# watch -n 1 uptime
Iperf is a command-line tool that can be used to diagnose network speed related issues, or just simply determine the available network throughput.
Iperf measures the maximum network throughput a server can handle. It is particularly useful when experiencing network speed issues, as you can use Iperf to determine what the maximum throughput is for a server.
First, you'll need to install iperf.
For AIX:
Iperf is available from http://www.perzl.org/aix/index.php?n=Main.iperf. Download the RPM file, for example iperf-2.0.9-1.aix5.1.ppc.rpm to your AIX system. Next install it:
# rpm -ihv iperf-2.0.9-1.aix5.1.ppc.rpm
For Red Hat Enterprise Linux:
You'll first need to install EPEL, as Iperf is not available in the standard Red Hat repositories. For example for Red Hat 7 systems:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Next, you'll have to install Iperf itself:
# yum -y install iperf
Now that you have Iperf installed, you can start testing the connection between two servers. So, you'll need to have at least two servers with Iperf installed.
On the server you wish to test, launch Iperf in server mode:
# iperf -s
That will put the server in listening mode; besides that, nothing happens yet. Once a client connects and runs a test, the output will look something like this:
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 4] local 198.51.100.5 port 5001 connected with 198.51.100.6 port 59700
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 9.76 GBytes 8.38 Gbits/sec
On the other server, connect to the first server. For example, if your first server is at IP address 198.51.100.5, run:
# iperf -c 198.51.100.5
After about 10 seconds, you'll see output on your screen showing the amount of data transferred, and the available bandwidth. The output may look something like this:
# iperf -c 198.51.100.5
------------------------------------------------------------
Client connecting to 198.51.100.5, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 198.51.100.6 port 59700 connected with 198.51.100.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 9.76 GBytes 8.38 Gbits/sec
You can run multiple tests while the Iperf server process is listening on the first server. When you've completed your tests, you can stop the listening Iperf process with Ctrl-C.
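The client also accepts options to vary the test. For example, "-t" sets the test duration in seconds (instead of the default 10), and "-P" runs multiple parallel streams, which can help saturate faster links:
# iperf -c 198.51.100.5 -t 30 -P 4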
For more information, see the official Iperf site at iperf.fr.
The standard tool for cluster monitoring is clstat, which comes along with PowerHA SystemMirror/HACMP. Clstat is rather slow with its updates, and sometimes the required clinfo daemon needs restarting to get it operational, so it is, well, not perfect. There's an alternative script, qha, written by PowerHA/HACMP guru Alex Abderrazag, which is also easy to use. This script shows you the correct PowerHA/HACMP status, along with adapter and volume group information. It works fine on HACMP 5.2 through 7.2. The version described here is 9.06; for the latest version, check www.lpar.co.uk.
This tiny but effective tool accepts the following flags:
- -n (show network interface info)
- -N (show interface info and active HBOD)
- -v (show shared online volume group info)
- -l (log to /tmp/qha.out)
- -e (show running events if cluster is unstable)
- -m (show status of monitor app servers if present)
- -1 (exit after first iteration)
- -c (CAA SAN / Disk Comms)
For example, run:
# qha -nev
It's useful to put "qha" in /usr/es/sbin/cluster/utilities, as that path is usually already defined in $PATH, and thus you can run qha from anywhere.
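For example, assuming the downloaded script is in the current directory:
# cp qha /usr/es/sbin/cluster/utilities/qha
# chmod 755 /usr/es/sbin/cluster/utilities/qha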
A description of the possible cluster states:
- ST_INIT: cluster configured and down
- ST_JOINING: node joining the cluster
- ST_VOTING: Inter-node decision state for an event
- ST_RP_RUNNING: cluster running recovery program
- ST_BARRIER: clstrmgr waiting at the barrier statement
- ST_CBARRIER: clstrmgr is exiting recovery program
- ST_UNSTABLE: cluster unstable
- NOT_CONFIGURED: HA installed but not configured
- RP_FAILED: event script failed
- ST_STABLE: cluster services are running with managed resources (stable cluster) or cluster services have been "forced" down with resource groups potentially in the UNMANAGED state (HACMP 5.4 only)
This is a quick and dirty method of setting up an LPP source and SPOT of AIX 5.3 TL10 SP2, without having to swap DVDs into the AIX host machine. What you basically need is the actual AIX 5.3 TL10 SP2 DVDs from IBM, a Windows host, and access to your NIM server. This process works for basically every AIX level, and has been tested with versions up to AIX 7.2.
If you have actual AIX DVDs that IBM sent to you, create ISO images of the DVDs on Windows, e.g. by using MagicISO. Or, go to Entitled Software Support and download the ISO images there.
SCP these ISO image files over to the AIX NIM server, e.g. by using WinSCP.
We need a way to access the data in the ISO images on the NIM server, and to extract the filesets from it (see IBM Wiki).
For AIX 5 systems and older:
Create a logical volume that is big enough to hold the data of one DVD. Check with "lsvg rootvg" whether you have enough space in rootvg and what the PP size is. In our example it is 64 MB. Thus, to hold an ISO image of roughly 4.7 GB, we need roughly 80 LPs of 64 MB.
# /usr/sbin/mklv -y testiso -t jfs rootvg 80
Create filesystem on it:
# /usr/sbin/crfs -v jfs -d testiso -m /testiso -An -pro -tn -a frag=4096 -a nbpi=4096 -a ag=8
Create a location where to store all of the AIX filesets on the server:
# mkdir /sw_depot/5300-10-02-0943-full
Copy the ISO image to the logical volume, and change the filesystem type to cdrfs, so the ISO contents can be mounted:
# /usr/bin/dd if=/tmp/aix53-tl10-sp2-dvd1.iso of=/dev/rtestiso bs=1m
# chfs -a vfs=cdrfs /testiso
Mount the testiso filesystem and copy the data:
# mount /testiso
# bffcreate -d /testiso -t /sw_depot/5300-10-02-0943-full all
# umount /testiso
Repeat the above 5 steps for both DVDs. You'll end up with a folder of at least 4 GB.
Delete the iso logical volume:
# rmfs -r /testiso
# rmlv testiso
When you're using AIX 7 / AIX 6.1:
Significant changes have been made in AIX 7 and AIX 6.1 that add new support for NIM. In particular, there is now the capability to use the loopmount command to mount ISO images into filesystems. As an example:
# loopmount -i aixv7-base.iso -m /aix -o "-V cdrfs -o ro"
The above mounts the AIX 7 base ISO as a filesystem called /aix.
So instead of going through the trouble of creating a logical volume, creating a file system, copying the ISO image to the logical volume, and mounting it (which is what you would have done on AIX 5 and before), you can do all of this with a single loopmount command.
Make sure to delete any left-over ISO images:
# rm -rf /tmp/aix53-tl10-sp2-dvd*iso
Define the LPP source (from the NIM A to Z redbook):
# mkdir /export/lpp_source/LPPaix53tl10sp2
# nim -o define -t lpp_source -a server=master -a location=/export/lpp_source/LPPaix53tl10sp2 -a source=/sw_depot/5300-10-02-0943-full LPPaix53tl10sp2
Check with:
# lsnim -l LPPaix53tl10sp2
Rebuild the .toc:
# nim -Fo check LPPaix53tl10sp2
For newer AIX releases, e.g. AIX 7.1 and AIX 7.2, you may get a warning like:
Warning: 0042-354 c_mk_lpp_source: The lpp_source is missing a
bos.vendor.profile which is needed for the simages attribute. To add
a bos.vendor.profile to the lpp_source run the "update" operation
with "-a recover=yes" and specify a "source" that contains a
bos.vendor.profile such as the installation CD. If your master is not
at level 5.2.0.0 or higher, then manually copy the bos.vendor.profile
into the installp/ppc directory of the lpp_source.
If this happens, you can either do exactly what it says: copy the installp/ppc/bos.vendor.profile file from your source DVD ISO image into the installp/ppc directory of the LPP source. Or, you can remove the entire LPP source, copy the installp/ppc/bos.vendor.profile from the DVD ISO image into the directory that contains the full AIX software set (in the example above: /sw_depot/5300-10-02-0943-full), and then re-create the LPP source. That should avoid the warning.
If you ignore this warning, you'll notice that the next step (creating a SPOT from the LPP source) will fail.
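Alternatively, following the warning text literally, an update operation with recover=yes should also restore the profile, for example (assuming the base ISO is still loopmounted on /aix, as shown earlier):
# nim -o update -a recover=yes -a source=/aix LPPaix53tl10sp2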
Define a SPOT from the LPP source:
# nim -o define -t spot -a server=master -a location=/export/spot/SPOTaix53tl10sp2 -a source=LPPaix53tl10sp2 -a installp_flags=-aQg SPOTaix53tl10sp2
Check the SPOT:
# nim -o check SPOTaix53tl10sp2
# nim -o lppchk -a show_progress=yes SPOTaix53tl10sp2
Within TSM (nowadays known as IBM Spectrum Protect), filespaces may exist that are no longer backed up: file systems that were once backed up, but aren't anymore.
This may occur if someone deletes a file system from a client, so it is no longer backed up, or if a file system was added to the exclude list, so it's no longer included in any backup runs.
These old filespaces may use up quite a lot of storage, and because they're never backed up anymore, their data remains on the TSM server for restore purposes.
It's good practice to review these filespaces and to determine if they can be deleted from TSM to free up storage space. It's therefore a good idea to put this check in a script, and have that script run automatically from time to time, for example by scheduling it in the crontab on a weekly or monthly basis (an example crontab entry follows the script below).
Here's a sample script. You can run it on your AIX TSM server. It assumes you have a "readonly" user account configured within TSM, with a password of "readonly". It will send out an email if any obsolete filespaces are present. Update the email variable at the beginning of the script to whatever email address the report should be sent to.
#!/bin/ksh
# Email address that will receive the report.
email="my@emailaddress.com"
# Yesterday's date in mm/dd/yy format, as used in the dsmadmc output.
y=$(perl -MPOSIX -le 'print strftime "%D",localtime(time-(60*60*24))')
mytempfile=/tmp/myadmintempfile.$$
rm -f ${mytempfile}
# Query all filespaces in detailed, comma-separated output; filter out
# lines mentioning today's or yesterday's date, and keep only lines
# containing an empty field (",,").
dsmadmc -comma -id=readonly -password=readonly q filespace \* \* f=d | \
 grep -v $(date +"%m/%d/%y") | grep -v "${y}" | grep ",," > ${mytempfile}
# If anything is left over, mail the list.
if [ -s ${mytempfile} ] ; then
   cat ${mytempfile} | mailx -s "Filespaces not backed up during last 24 \
hours." ${email} >/dev/null 2>&1
fi
rm -f ${mytempfile}
exit 0
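To run this automatically, as mentioned above, save the script somewhere (for example as /usr/local/bin/check_filespaces.ksh, a path chosen just for illustration), make it executable, and add an entry to root's crontab (crontab -e), e.g. to run it every Monday at 6 AM:
0 6 * * 1 /usr/local/bin/check_filespaces.ksh >/dev/null 2>&1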
The script will send an email with a list of any filespaces not backed up in the last 24 hours, if any are found.
The next thing you'll have to do is to investigate why the file system is not backed up. If you've determined that the filespace is no longer required in TSM, then you can move forward by deleting the filespace from TSM.
For example, for a UNIX file system:
delete filespace hostname /file/system
For file systems of Windows clients, deleting a filespace may be a bit more challenging. TSM might not allow you to remove a filespace, and may exit with an error message like "No matching file space found".
You might attempt to delete the filespace like this:
delete filespace nodename \\nodename\d$ nametype=uni
Or, you can remove Windows filespaces in TSM by filespace number. In that case, first list the known filespaces in TSM of a specific nodename, for example:
q filespace nodename *
This will list all the filespaces known in TSM for host "nodename". Replace nodename with whatever client you're searching for. In the output you'll see the hostname, the filespace name, and a filespace number, for example, 1.
To delete filespace with filespace number "1" for host "nodename", you can run:
delete filespace nodename 1 nametype=fsid
An easy way to check on the Spectrum Protect / TSM server what the backup status of all the Spectrum Protect / TSM clients is, is by using the "q event" command. For example:
q event * * begind=-1 begint=09:00 endd=today endt=09:00
The command above will display the status of all the backup jobs in the last 24 hours, between 9 AM yesterday and 9 AM today.
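To limit the output to just the problem cases, such as missed or failed backups, the exceptionsonly option can be added:
q event * * begind=-1 begint=09:00 endd=today endt=09:00 exceptionsonly=yes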
There are numerous show commands available for IBM Spectrum Protect / TSM that will display information about the environment. Many of them aren't well documented, probably because IBM intends these commands for its own support use.
Quite a lot of these commands have been documented by Spectrum Protect / TSM users, and an example can be found on the following web site: http://www.mm-it.at/de/TSM_Show_Commands.html.
A very interesting show command, which can be used to display the amount of duplicate bytes pending removal, is the following:
tsm: TSM>show deduppending file_disk
ANR1015I Storage pool FILE_DISK has 7,733,543,532,121 duplicate bytes pending removal.
The command above shows the number of bytes for storage pool "FILE_DISK" still to be removed by the dedupe processes.
The command may take quite some time to run, up to 10 minutes, so please be patient when issuing this command.
Creating a snapshot of a logical volume is an easy way to create a point-in-time backup of a file system, while still allowing changes to occur to the file system. Basically, by creating a snapshot, you get a frozen (snapshot) file system that can be backed up without having to worry about any changes to the file system.
Many applications these days allow for options to "freeze" and "thaw" the application (as in, telling the application to not make any changes to the file system while frozen, and also telling it to continue normal operations when thawed). This functionality of an application can be really useful for creating snapshot backups. One can freeze the application, create a snapshot file system (literally in just seconds), and thaw the application again, allowing the application to continue. Then, the snapshot can be backed up, and once the backup has been completed, the snapshot can be removed.
Let's give this a try.
In the following process, we'll create a file system /original, using a logical volume called originallv, in volume group "extern". We'll keep it relatively small (just 1 Gigabyte - or 1G), as it is just a test:
# lvcreate -L 1G -n originallv extern
Logical volume "originallv" created.
Next, we'll create a file system of type XFS on it, and we'll mount it.
# mkfs.xfs /dev/mapper/extern-originallv
# mkdir /original
# mount /dev/mapper/extern-originallv /original
# df -h | grep original
/dev/mapper/extern-originallv 1014M 33M 982M 4% /original
At this point, we have a file system /original available, and we can start creating a snapshot of it. For the purpose of testing, first, create a couple of files in the /original file system:
# touch /original/file1 /original/file2 /original/file3
# ls /original
file1 file2 file3
Creating a snapshot of a logical volume is done using the "-s" option of lvcreate:
# lvcreate -s -L 1G -n originalsnapshotlv /dev/mapper/extern-originallv
In the command example above, a size of 1 GB is specified (-L 1G). The snapshot logical volume doesn't have to be the same size as the original logical volume: it only needs to hold the changes made to the original logical volume while the snapshot exists. So, if there are very few changes to the original logical volume, the snapshot can be quite small; it's not uncommon for a snapshot to be just 10% of the size of the original logical volume. If a lot of changes are made to the original logical volume while the snapshot exists, you may need to specify a larger size. Please note that large databases, in which lots of changes are being made, are generally not good candidates for snapshot-style backups. You'll probably have to test in your environment whether this works for your application, and what a good size for the snapshot logical volume is.
The name of the snapshot logical volume in the command example above is set to originalsnapshotlv, using the -n option. And "/dev/mapper/extern-originallv" specifies the device name of the original logical volume.
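While the snapshot exists, you can check how much of its space is in use with the lvs command; the Data% column shows the fill percentage of the snapshot. If a snapshot fills up completely, it becomes invalid, so this is worth monitoring during long-running backups:
# lvs extern/originalsnapshotlv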
We can now mount the snapshot:
# mkdir /snapshot
# mount -o nouuid /dev/mapper/extern-originalsnapshotlv /snapshot
# df -h | grep snapshot
/dev/mapper/extern-originalsnapshotlv 1014M 33M 982M 4% /snapshot
And at this point, we can see the same files in the /snapshot folder, as in the /original folder:
# ls /snapshot
file1 file2 file3
To prove that the /snapshot file system remains untouched, even when the /original file system is being changed, let's create a file in the /original file system:
# touch /original/file4
# ls /original
file1 file2 file3 file4
# ls /snapshot
file1 file2 file3
As you can see, the /original file system now holds 4 files, while the /snapshot file system only holds the original 3 files. The snapshot file system remains untouched.
To remove the snapshot, a simple umount and lvremove will do:
# umount /snapshot
# lvremove -y /dev/mapper/extern-originalsnapshotlv
So, if you want to run backups of your file systems, while ensuring no changes are being made, here's the logical order of steps that can be scripted (a sketch follows this list):
- Freeze the application
- Create the snapshot (lvcreate -s ...)
- Thaw the application
- Mount the snapshot (mkdir ... ; mount ...)
- Run the backup of the snapshot file system
- Remove the snapshot (umount ... ; lvremove ... ; rmdir ...)
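Put together, a backup along these lines could be scripted as follows. This is just a sketch: the freeze/thaw commands ("myapp freeze" / "myapp thaw") and the backup destination are placeholders for whatever your application and environment provide.
#!/bin/bash
# Sketch of a snapshot-based backup, using the volume names from this article.
myapp freeze                                  # placeholder: tell the application to quiesce
lvcreate -s -L 1G -n originalsnapshotlv /dev/mapper/extern-originallv
myapp thaw                                    # placeholder: let the application continue
mkdir -p /snapshot
mount -o nouuid /dev/mapper/extern-originalsnapshotlv /snapshot
# Back up the frozen snapshot; a tar archive is used here as an example.
tar -czf /backup/original-$(date +%Y%m%d).tar.gz -C /snapshot .
umount /snapshot
lvremove -y /dev/mapper/extern-originalsnapshotlv
rmdir /snapshot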