Whenever you need to upgrade SDD (and it is wise to keep it up to date), make sure you check the SDD documentation first. Here are the quick steps to perform the update.
- Check the error log for any entries that could interfere with the upgrade:
# errpt -a | more
- Check if previously installed packages are OK:
# lppchk -v
- Commit any previously installed packages:
# installp -c all
- Make sure you have a recent mksysb image of the server, and run an incremental TSM backup before starting the updates to the rootvg. It is also a good idea to prepare an alt_disk_install on the second boot disk.
- For HACMP nodes: check the cluster status and log files to make sure the cluster is stable and ready for the upgrades.
- Update fileset devices.fcp.disk.ibm to the latest level using smitty update_all.
- For ESS environments: Update host attachment script ibm2105 and ibmpfe.essutil to the latest available levels using smitty update_all.
- Enter the lspv command to find out all the SDD volume groups.
- Enter the lsvgfs command for each SDD volume group to find out which file systems are mounted, e.g.:
# lsvgfs vg_name
- Enter the umount command to unmount all file systems belonging to the SDD volume groups.
- Enter the varyoffvg command to vary off the volume groups.
- If you are upgrading to an SDD version earlier than 1.6.0.0, or if you are upgrading to SDD 1.6.0.0 or later and your host is in an HACMP environment with nonconcurrent volume groups that are varied on (that is, reserved) by another host, run the vp2hd volume_group_name script to convert the volume groups from SDD vpath devices to supported storage hdisk devices. Otherwise, skip this step.
- Stop the SDD server:
# stopsrc -s sddsrv
- Remove all the SDD vpath devices:
# rmdev -dl dpo -R
- Use the smitty command to uninstall SDD: enter smitty deinstall and press Enter, then complete the uninstallation process.
- If you need to upgrade the AIX operating system, perform that upgrade now. If required, reboot the system after the operating system upgrade.
- Use the smitty command to install the newer version of SDD. Note: it is also possible to run smitty update_all to simply update the SDD fileset without uninstalling it first, but IBM recommends uninstalling first, then patching the OS, and then installing the new SDD fileset.
- Use the smitty device command to configure all the SDD vpath devices to the Available state.
- Enter the lsvpcfg command to verify the SDD configuration.
- If you are upgrading to an SDD version earlier than 1.6.0.0, run the hd2vp volume_group_name script for each SDD volume group to convert the physical volumes from supported storage hdisk devices back to the SDD vpath devices.
- Enter the varyonvg command for each volume group that was previously varied offline.
- Enter the lspv command to verify that all physical volumes of the SDD volume groups are SDD vpath devices.
- Check for any errors:
# errpt | more
# lppchk -v
# errclear 0
- Enter the mount command to mount all file systems that were unmounted.
Attention: If an SDD volume group's physical volumes are a mix of hdisk devices and SDD vpath devices, you must run the dpovgfix utility to fix this problem. Otherwise, SDD will not function properly:
# dpovgfix vg_name
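The unmount / vary off / convert / remove sequence from the steps above can be sketched as a small dry-run helper. The volume group names are placeholders, and the helper only prints the commands it would run; drop the echo prefix only after you have verified the list on your own system:

```shell
# Dry-run sketch of the SDD pre-upgrade teardown steps (placeholder vg names).
DRY_RUN="echo WOULD RUN:"

sdd_preupgrade_teardown() {
    for vg in "$@"; do
        # unmount all file systems that belong to this SDD volume group
        for fs in $(lsvgfs "$vg" 2>/dev/null); do
            $DRY_RUN umount "$fs"
        done
        $DRY_RUN varyoffvg "$vg"     # vary off the volume group
        $DRY_RUN vp2hd "$vg"         # only when required (see the step above)
    done
    $DRY_RUN stopsrc -s sddsrv       # stop the SDD server
    $DRY_RUN rmdev -dl dpo -R        # remove all SDD vpath devices
}

sdd_preupgrade_teardown datavg01 datavg02   # placeholder volume groups
```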
With HACMP clusters, documentation is probably the most important issue. You cannot properly manage an HACMP cluster if you do not document it. Document the precise configuration of the complete cluster and document any changes you carry out. Also document all management procedures and stick to them! The cluster snapshot facility is an excellent way of documenting your cluster.
Next step: get educated. You have to know exactly what you're doing on an HACMP cluster. If you have to manage a production cluster, getting certified is a necessity. Don't ever let non-HACMP-educated UNIX administrators on your HACMP cluster nodes. They don't have a clue of what's going on and will probably destroy your carefully laid-out configuration.
Geographically separated nodes are important! Too many cluster nodes just sit on top of each other in the same rack. What if there's a fire? Or a power outage? Having an HACMP cluster won't help you if both nodes are in a single location, use the same power source, or share the same network switches.
Put your HACMP logs in a sensible location. Don't put them in /tmp knowing that /tmp gets purged every night....
Test, test, test, and then test your cluster again. Doing take-over tests every six months is best practice. Document your tests and your test results.
Don't assume that your cluster is highly available just because you've installed the cluster software. There are a lot of other things to consider in your infrastructure to avoid single points of failure, such as: no two nodes sharing the same I/O drawer; power redundancy; no two storage or network adapters on the same SCSI backplane or bus; redundancy in SAN HBAs; application monitoring in place.
Official IBM sites:
Other PowerHA / HACMP related sites:
IBM's Redbooks on HACMP
lpar.co.uk (Alex Abderrazag)

PowerHA is the new name for HACMP, which is short for High Availability Cluster Multi-Processing, a product of IBM. PowerHA / HACMP runs on AIX (and also on Linux) and its purpose is to provide high availability to systems, mainly against hardware failures. It can automatically detect system or network failures and can recover system hardware, applications, data and users while keeping recovery time to an absolute minimum. This is useful for systems that need to be online 24 hours a day, 365 days per year, and for organizations that can't afford to have systems down for longer than 15 minutes. It's not completely fault-tolerant, but it is highly available.
Compared to other cluster software, PowerHA / HACMP is highly robust, allows for large distances between nodes of a single cluster, and supports up to 32 nodes in a cluster. Previous versions of PowerHA / HACMP had a reputation for having a lot of bugs. From version 5.4 onward, PowerHA / HACMP has seen a lot of improvements.
IBM's HACMP has existed for over 15 years. It's not originally an IBM product; IBM bought it from CLAM, which was later renamed Availant, then LakeViewTech, and is nowadays called Vision Solutions. Until August 2006, all development of HACMP was done by CLAM. Nowadays, IBM does its own development of PowerHA / HACMP in Austin, Poughkeepsie and Bangalore.
Competitors of PowerHA / HACMP are Veritas Cluster and Echo Cluster. The latter, Echo Cluster, is a product of the Vision Solutions mentioned above; it tends to be easier to set up and is meant for simpler clusters. Veritas is mostly used by customers that already use it on other operating systems, like Sun Solaris and Windows Server environments, and don't want to invest in yet another clustering technology.
If you need to exclude a specific file system from the TSM backup, then add the following line to the dsm.sys file:
DOMAIN ALL-LOCAL -/opt/archive
This example will prevent file system /opt/archive from being backed up.
Now, what if you wish to exclude a certain directory within a file system from the backup? Create the following entry in the dsm.sys file:
INCLExcl /usr/tivoli/tsm/client/ba/bin/inclexcl
Then create the inclexcl file and add the following line:
Exclude.dir /opt/archive/tmp
This will only exclude the tmp folder in file system /opt/archive.
You can check with the following command:
# dsmc q inclexcl
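Putting the directory-exclusion pieces together, the two files might look like this (the inclexcl path is the default one used above; the dsm.sys lines belong inside your server stanza):

```
* dsm.sys (inside the server stanza)
INCLExcl    /usr/tivoli/tsm/client/ba/bin/inclexcl

* /usr/tivoli/tsm/client/ba/bin/inclexcl
Exclude.dir /opt/archive/tmp
```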

How do you test if Oracle TDP (RMAN) is working properly?
# tdpoconf showenv
If you have a TSM database spread across multiple database volumes on disk, be very careful about how you use them. It is best to assign multiple database volumes on different disks, all equal in size. This way, TSM is able to use the performance of several disks equally.
Also, take a look at the assigned capacity of the TSM database. It may well be that not the full capacity of the database is assigned to TSM; in that case, some database volumes are more heavily used than others. Good commands to check whether the database volumes are equally used are "q dbspace" and "q db f=d":
q dbspace
q db f=d
These commands can be used to find out how your database capacity is assigned. If it isn't fully assigned, do it now.
Check the output of the TSM commands against the location of your database volumes on the operating system, to see if the database volumes are equally spread across multiple disks. The AIX commands iostat and vmstat will give you a good idea of whether your disks are used equally.
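As a sketch of what "used equally" looks like, the snippet below picks the busiest disk out of canned iostat-style output. The sample numbers are made up; on a live system you would feed it real `iostat -d` output instead:

```shell
# Canned AIX 'iostat -d'-style sample (hypothetical values).
iostat_sample='Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0          12.0      512.0     64.0      10240      20480
hdisk1          11.5      498.0     61.0       9980      19900
hdisk2          48.0     2048.0    256.0      81920     163840'

# Pick the disk with the highest "% tm_act" (percent of time the disk was busy).
busiest=$(echo "$iostat_sample" | awk 'NR > 1 && $2 > max { max = $2; disk = $1 } END { print disk }')
echo "busiest disk: $busiest"
# prints: busiest disk: hdisk2
```

A disk whose activity sits far above the others, as hdisk2 does here, is a sign the database volumes are not spread equally.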
When using TSM on AIX, JFS file systems are preferred for storage of the database volumes, not raw logical volumes. This subject tends to cause discussion, so here are the reasons for using JFS instead of raw logical volumes:
When TSM has JFS files open, they are locked by JFS and other applications cannot write to them. However, raw logical volumes are not locked and any application can write to them. TSM tries to prevent starting more than one instance of the same server from the same directory, but it can be done. If you are using raw logical volumes, multiple server instances can simultaneously update the same information. This could cause errors in the database, recovery log, or storage pool raw logical volumes. Auditing a corrupted TSM database and fixing corruptions can take up to a day of downtime.
After a database, recovery log, or storage pool volume is defined to TSM, you cannot change its size. TSM uses size information to determine where data is placed and whether volumes have been modified by other applications or utilities. However, if you use raw logical volumes, smit lets you increase their sizes. If the volume is defined to TSM before its size is increased, TSM cannot use the volume or its data.
The use of JFS file systems for database, recovery log, and storage pool volumes requires slightly more CPU than raw volumes do. However, JFS read-ahead caching improves performance. Lab tests have shown that raw logical volumes tend to give better performance, so as long as you have enough CPU to spare, you may still use JFS.
How many times can the tape drives be cleaned?
# mtlib -l /dev/lmcp0 -qL
Look for "avail xxxx cleaner cycles" at the bottom.
Which cleaning tapes are in the library?
# mtlib -l /dev/lmcp0 -qC -s FFFD
The first column in the output is the volume serial number of the cleaning tapes.
When was the cleaning tape last used?
# mtlib -l /dev/lmcp0 -qE -V [tape-volume-serial-number] -u
Look for "last used" at the bottom of the output.
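The "look for" steps above are easy to script. The canned output below is hypothetical and only mimics the "avail ... cleaner cycles" line mentioned above; on a live system you would pipe the real mtlib command instead:

```shell
# Canned sample of 'mtlib -l /dev/lmcp0 -qL' output (hypothetical lines).
qL_sample='Library Data:
   operational state: Automated Operational State
   avail 37 cleaner cycles'

# Extract the number of remaining cleaner cycles.
cycles=$(echo "$qL_sample" | awk '/cleaner cycles/ { print $2 }')
echo "available cleaner cycles: $cycles"
# prints: available cleaner cycles: 37
```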
How are my tape drives doing (from a TSM viewpoint)?
# dsmadmc -comma -id=readonly -password=readonly q dr f=d
Look for "On-Line" and "Drive State" in the output. Also check if the paths to your tape drives are on-line.
# query path
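A quick way to spot drive problems is to filter the `q dr f=d` output for the two fields mentioned above. The sample output below is canned and hypothetical:

```shell
# Canned fragment of 'q dr f=d' output for one drive (hypothetical).
q_dr_sample='             Drive Name: DRIVE01
            Device Type: LTO
                On-Line: Yes
            Drive State: EMPTY'

# Show only the two fields of interest.
echo "$q_dr_sample" | grep -E 'On-Line|Drive State'

# Or pull out just the on-line status.
online=$(echo "$q_dr_sample" | awk -F': *' '/On-Line/ { print $2 }')
echo "on-line: $online"
# prints: on-line: Yes
```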