UNIX Health Check

Topics: Backup & restore, EMC, EMC Networker

Restart Networker

This is how to stop EMC Networker:

# /bin/nsr_shutdown

And this is how you start it (taken from /etc/inittab):

# echo "sh /etc/rc.nsr" | at now

Topics: Backup & restore, EMC, EMC Networker

EMC/Legato Networker: Performing recoveries from the command line

To perform recoveries from EMC (or Legato) Networker on the command line, you can use the recover command. The recover command runs in two modes:

Interactive mode: Interactive mode is the default mode for the recover command. This mode places you in a shell-like environment that allows you to use subcommands. These commands let you navigate the client file index to select and recover files and directories.

Non-interactive mode: In non-interactive mode, the files specified on the command line are recovered automatically without browsing. To activate non-interactive mode, use the -a option.

Using recover in Interactive Mode:

Log in to the server whose files you need to recover, and then type recover. This places you in the recover shell environment. You can also type recover [pathname] to set your initial working directory (for example, recover /etc); the default is the current working directory.

# recover /etc 
Current working directory is /etc
recover>

Note: If you do not get a recover prompt when you type recover, add the -s option with the name of your Networker server:

# recover -s servername

The following commands let you navigate a client file index to select and recover files and directories:

ls
Lists information about the given files and directories. When no name argument is provided, ls lists the contents of the current directory. When you specify a directory as name, the content of that directory is displayed.
cd
Changes the current working directory. The default is the directory in which you executed recover.
pwd
Prints the full pathname of the current working directory.
add [name ...]
Adds the current directory or the named files or directories to the recover list. If a directory is specified, it is added to the recover list with all of its subordinate files.
delete [name ...]
Deletes the current directory or the named files or directories from the recover list. If a directory is specified, that directory and all of its subordinate files are deleted from the recover list.
versions [name ...]
Lists all available versions of a file or directory. If no name is given, the current working directory is used.
changetime
Changes the backup browse time, so you can recover files from before the last backup. You can give the new time as an argument, or you will be prompted for it. The time can be entered as December 15, 2009 or 12/15/2009.
list
Displays the files on the recover list.
recover
Recovers all files on the recover list from the Networker server. Upon completion, the recover list is empty.
exit
Exits immediately from recover.
quit
Exits immediately from recover. Files on the recover list are not recovered.

Using recover in Non-interactive mode:

To activate non-interactive mode, use the -a option. For example, to recover the /etc/hosts file from the most recent backup:

# recover -a /etc/hosts

Using the recover Command in Directed Recoveries:

To recover files to a different location, use the -d destination option with the recover command:

# recover -a -d /restore /etc/hosts
Recovering 1 file from /etc/ into /restore
Requesting 1 file(s), this may take a while...
./hosts
Received 1 file(s) from NSR server `networker'
Recover completion time: Thu Nov 18 14:39:15 2009

Using the recover Command to Recover a File from a Specific Date:

Enter the recover shell by typing recover. Locate the file you need to restore using the ls and cd commands. List the versions of the file using the versions command, and use the changetime command to change the browse time to the day the file was backed up. Add the file to the recover list using the add command, and then run recover to restore it.

# recover 
Current working directory is /
recover> versions /etc/hosts

Versions of `/etc/hosts':

   4 -rw-rw-r-- root     system       2006 Mar 31 16:32 hosts
     save time:  Mon Aug  9 20:02:53 EDT 2010
      location:  004049

   4 -rw-rw-r-- root     system       2006 Mar 31 16:32 hosts
     save time:  Fri Aug  6 21:11:07 EDT 2010
      location:  DD0073 at DDVTL

   4 -rw-rw-r-- root     system       2006 Mar 31 16:32 hosts
     save time:  Mon Aug  2 20:06:48 EDT 2010
      location:  004242 at rd=ntwrkrstgnd1:ATL

   4 -rw-rw-r-- root     system       2006 Mar 31 16:32 hosts
     save time:  Fri Jul 30 21:09:15 EDT 2010
      location:  DD0054 at DDVTL

   4 -rw-rw-r-- root     system       2006 Mar 31 16:32 hosts
     save time:  Mon Jul 26 20:10:20 EDT 2010
      location:  004095

recover> changetime 8/1/2010
6497:recover: time changed to Sun Aug  1 23:59:59 EDT 2010
recover> add /etc/hosts
/etc
1 file(s) marked for recovery
recover> recover
Recovering 1 file into its original location
Volumes needed (all on-line):
        DD0054 at \\.\Tape20
Total estimated disk space needed for recover is 4 KB.
Requesting 1 file(s), this may take a while...
./hosts
./hosts file exists, overwrite (n, y, N, Y) or rename (r, R) [n]? y
Overwriting ./hosts
Received 1 file(s) from NSR server `networker'
Recover completion time: Thu Aug 12 17:34:06 EDT 2010

The -f option makes recovered files overwrite existing files, so you do not have to answer any overwrite questions. For example, to recover the entire /etc directory into /tmp:

# recover -f -d /tmp/ -a /etc/

All the files will be recovered to /tmp/etc.

The -c option can be used to recover files from a different client. For example, to recover the entire /etc directory of client "otherclient" into /tmp:

# recover -f -c otherclient -d /tmp/ -a /etc/
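If you need the same directory from several clients, these options can be combined in a loop. The following is a minimal sketch, not a Networker tool: the function name recover_etc_from_clients and the per-client /tmp/<client> target directories are illustrative assumptions, and only the -f, -s, -c, -d and -a options shown above are used. Setting RUN=echo prints the commands instead of running them.

```shell
# Sketch: recover /etc from several Networker clients, each into its own
# directory under /tmp. recover_etc_from_clients and the /tmp/<client>
# layout are illustrative assumptions, not part of Networker.
# Usage: recover_etc_from_clients SERVER CLIENT...
# Set RUN=echo to print the commands instead of running them.
recover_etc_from_clients() {
    server=$1
    shift
    for client in "$@"; do
        # Create a separate target directory per client.
        ${RUN:-} mkdir -p "/tmp/$client"
        # -f: overwrite without prompting, -c: source client,
        # -d: relocation directory, -a: non-interactive mode.
        ${RUN:-} recover -f -s "$server" -c "$client" -d "/tmp/$client/" -a /etc/
    done
}
```

For example, RUN=echo recover_etc_from_clients networker client1 client2 shows what would be run; leave out RUN=echo to actually start the recoveries.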

The -t option can be used to do a point-in-time recovery of a file or file system. For example, to recover the /etc/hosts file as it was on 09/05/2010 at noon:

# recover -s networkerserver -t "09/05/2010 12:00" -a /etc/hosts

Recovering multiple files is also possible. For example, to recover two mksysb images:

# recover -f -c client -s server -a mksysb.image1 mksysb.image2

Topics: AIX, EMC, Installation, PowerHA / HACMP, SAN, System Admin

Quick setup guide for HACMP

Use this procedure to quickly configure an HACMP cluster, consisting of 2 nodes and disk heartbeating.

Prerequisites:

Make sure you have the following in place:

Have the IP addresses and host names of both nodes, and of a service IP label, ready. Add these to the /etc/hosts file on both nodes of the new HACMP cluster.
Make sure you have the HACMP software installed on both nodes. Just install all the filesets from the HACMP CD-ROM, and you should be good.
Make sure you have this entry in /etc/inittab (as one of the last entries):
clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow the EMC ODM cleanup procedure.

Steps:

Create the cluster and its nodes:
```
# smitty hacmp
Initialization and Standard Configuration
Configure an HACMP Cluster and Nodes
```
Enter a cluster name and select the nodes you're going to use. It is vital here to have the host names and IP addresses correctly entered in the /etc/hosts file of both nodes.

Create an IP service label:

# smitty hacmp
Initialization and Standard Configuration
Configure Resources to Make Highly Available
Configure Service IP Labels/Addresses
Add a Service IP Label/Address

Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one).

Set up a resource group:
```
# smitty hacmp
Initialization and Standard Configuration
Configure HACMP Resource Groups
Add a Resource Group
```
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node.

Note: The order of the nodes is determined by the order in which you select them here. If you enter "node01 node02", then "node01" is the primary node. If you want a different node to be the primary, now is the time to enter the nodes in the correct order of priority.

Add the Service IP Label/Address to the resource group:

# smitty hacmp
Initialization and Standard Configuration
Configure HACMP Resource Groups
Change/Show Resources for a Resource Group (standard)

Select the resource group you created earlier, and add the Service IP Label/Address.

Run a verification/synchronization:
```
# smitty hacmp
Extended Configuration
Extended Verification and Synchronization
```
Just hit [ENTER] here. Resolve any issues that come up during this verification/synchronization attempt, and repeat the process until it returns "OK". It's a good idea to enable the option to automatically correct errors.

Start the HACMP cluster:
```
# smitty hacmp
System Management (C-SPOC)
Manage HACMP Services
Start Cluster Services
```
Select both nodes to start. Make sure to also start the Cluster Information Daemon.

Check the status of the cluster:
```
# clstat -o
# cldump
```
Wait until the cluster is stable and both nodes are up.

Basically, the cluster is now up-and-running. However, the Verification & Synchronization step will have complained that the cluster has no non-IP network. The next part sets up a disk heartbeat network, which allows the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We assume here that you're using EMC storage. The process on other types of SAN storage is more or less similar, apart from some differences; for example, SAN disks on EMC storage are called "hdiskpower" devices, whereas they're called "vpath" devices on IBM SAN storage.

First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, only for the disk heartbeat. It is a good habit to ask your SAN storage admin to zone a small LUN as a disk heartbeat device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device; for example, if you choose to use device hdiskpower4:

# lspv | grep hdiskpower4
hdiskpower4   000a807f6b9cc8e5    None
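Device names are assigned per node, so the same LUN may show up as a different hdiskpower device on the other node; the PVID is what identifies it. The helper below is a sketch: the name disk_for_pvid is made up for this example, and it only reports hdiskpower devices because this section assumes EMC PowerPath. It reads lspv output and prints the device that carries a given PVID.

```shell
# Sketch: print the hdiskpower device that carries a given PVID.
# Reads lspv output on stdin. disk_for_pvid is a hypothetical name.
# Only hdiskpower devices are reported, since this section assumes
# EMC PowerPath.
disk_for_pvid() {
    # Appending "" forces a string comparison, so all-digit PVIDs are
    # not compared as (possibly rounded) numbers.
    awk -v p="$1" '($2 "") == (p "") && $1 ~ /^hdiskpower/ { print $1 }'
}
```

Run lspv | disk_for_pvid 000a807f6b9cc8e5 on each node. If nothing is printed on a node, the LUN is not visible there yet.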

So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5:

Create a concurrent volume group:
```
# smitty hacmp
System Management (C-SPOC)
HACMP Concurrent Logical Volume Management
Concurrent Volume Groups
Create a Concurrent Volume Group
```
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg".

Set up the disk heartbeat network:

# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Networks
Add a Network to the HACMP Cluster

Select "diskhb" and accept the default Network Name.

Run a discovery:

# smitty hacmp
Extended Configuration
Discover HACMP-related Information from Configured Nodes

Add the disk device:

# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Communication Interfaces/Devices
Add Communication Interfaces/Devices
Add Discovered Communication Interface and Devices
Communication Devices

Select the same disk device on each of the two nodes by pressing F7.

Run a Verification & Synchronization again, as described above. Then run clstat and/or cldump again to check whether the disk heartbeat network comes online.

Topics: AIX, EMC, SAN, Storage, System Admin

Unable to remove hdiskpower devices due to a method error

If you get a method error when trying to remove your hdiskpower devices with rmdev -dl, follow this procedure.

Symptom: rmdev cannot remove the hdiskpower devices and fails with the error "method error (/etc/methods/ucfgpowerdisk):".

The fix is to uninstall and reinstall PowerPath, but you won't be able to do that until you remove the hdiskpower devices with this procedure:

# odmdelete -q name=hdiskpowerX -o CuDv
(for every hdiskpower device)
# odmdelete -q name=hdiskpowerX -o CuAt
(for every hdiskpower device)
# odmdelete -q name=powerpath0 -o CuDv
# odmdelete -q name=powerpath0 -o CuAt
# rm /dev/powerpath0
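With many hdiskpower devices, typing these commands for each device is error-prone. The sketch below (purge_powerpath_odm is a hypothetical name) runs the same odmdelete commands for every device name passed to it, followed by the powerpath0 cleanup. It uses rm -f instead of rm, so a missing /dev/powerpath0 is not an error. Run it with RUN=echo first to review the commands.

```shell
# Sketch: run the odmdelete steps above for every hdiskpower device
# given as an argument, then clean up powerpath0.
# purge_powerpath_odm is a hypothetical name. Set RUN=echo for a dry run.
purge_powerpath_odm() {
    for dev in "$@"; do
        ${RUN:-} odmdelete -q "name=$dev" -o CuDv
        ${RUN:-} odmdelete -q "name=$dev" -o CuAt
    done
    ${RUN:-} odmdelete -q name=powerpath0 -o CuDv
    ${RUN:-} odmdelete -q name=powerpath0 -o CuAt
    # rm -f instead of rm: no error if the device file is already gone.
    ${RUN:-} rm -f /dev/powerpath0
}
```

For example, RUN=echo purge_powerpath_odm $(lsdev -Cc disk -F name | grep '^hdiskpower') lists the commands for all hdiskpower devices in the ODM; run it again without RUN=echo to execute them.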

You must remove the modified files installed by PowerPath and then reboot the server. After the reboot, you will be able to uninstall PowerPath with the "installp -u EMCpower" command. The files to be removed are listed below. Their paths are relative to the root directory, so run these commands from /.

(Do not be concerned if some of the removals fail, as PowerPath may not be fully configured.)

# rm ./etc/PowerPathExtensions
# rm ./etc/emcp_registration
# rm ./usr/lib/boot/protoext/disk.proto.ext.scsi.pseudo.power
# rm ./usr/lib/drivers/pnext
# rm ./usr/lib/drivers/powerdd
# rm ./usr/lib/drivers/powerdiskdd
# rm ./usr/lib/libpn.a
# rm ./usr/lib/methods/cfgpower
# rm ./usr/lib/methods/cfgpowerdisk
# rm ./usr/lib/methods/chgpowerdisk
# rm ./usr/lib/methods/power.cat
# rm ./usr/lib/methods/ucfgpower
# rm ./usr/lib/methods/ucfgpowerdisk
# rm ./usr/lib/nls/msg/en_US/power.cat
# rm ./usr/sbin/powercf
# rm ./usr/sbin/powerprotect
# rm ./usr/sbin/pprootdev
# rm ./usr/lib/drivers/cgext
# rm ./usr/lib/drivers/mpcext
# rm ./usr/lib/libcg.so
# rm ./usr/lib/libcong.so
# rm ./usr/lib/libemcp_mp_rtl.so
# rm ./usr/lib/drivers/mpext
# rm ./usr/lib/libmp.a
# rm ./usr/sbin/emcpreg
# rm ./usr/sbin/powermt
# rm ./usr/share/man/man1/emcpreg.1
# rm ./usr/share/man/man1/powermt.1
# rm ./usr/share/man/man1/powerprotect.1

Reinstall PowerPath.

Topics: AIX, EMC, PowerHA / HACMP, SAN, Storage, System Admin

Missing disk method in HACMP configuration

Symptom: a resource group cannot be brought up, and the hacmp.out log file contains entries like the following:

cl_disk_available[187] cl_fscsilunreset fscsi0 hdiskpower1 false
cl_fscsilunreset[124]: openx(/dev/hdiskpower1, O_RDWR, 0, SC_NO_RESERVE): Device busy
cl_fscsilunreset[400]: ioctl SCIOLSTART id=0X11000 lun=0X1000000000000 : Invalid argument

To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage:

Make sure that /usr/lpp/EMC/Symmetrix/bin/emcpowerreset is present.

Then add a new custom disk method:

Run the SMIT fastpath for HACMP: "smitty hacmp".
Select Extended Configuration.
Select Extended Resource Configuration.
Select HACMP Extended Resources Configuration.
Select Configure Custom Disk Methods.
Select Add Custom Disk Methods.

      Change/Show Custom Disk Methods

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Disk Type (PdDvLn field from CuDv)             disk/pseudo/power
* New Disk Type                                  [disk/pseudo/power]
* Method to identify ghost disks                 [SCSI3]
* Method to determine if a reserve is held       [SCSI_TUR]
* Method to break reserve [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                     true
* Method to make the disk available              [MKDEV]

Topics: EMC, SAN, Storage

EMC PowerPath key installation

This describes how to configure the EMC PowerPath registration keys.

First, check the current configuration of PowerPath:

# powermt config
Warning: all licenses for storage systems support are missing or expired.

Then install the keys:

# emcpreg -install

=========== EMC PowerPath Registration ===========
Do you have a new registration key or keys to enter?[n] y
Enter the registration keys(s) for your product(s),
one per line, pressing Enter after each key.
After typing all keys, press Enter again.

Key (Enter if done): P6BV-4KDB-QET6-RF9A-QV9D-MN3V
1 key(s) successfully added.
Key successfully installed.

Key (Enter if done):
1 key(s) successfully registered.

(Note: the license key used in this example is not valid).

Topics: EMC, SAN, Storage

EMC Grab

EMC Grab is a utility that runs locally on each host and gathers storage-specific information (driver versions, storage configuration details, etc.). The EMC Grab report is written to a zip file, which can be used by EMC support.

You can download the "Grab Utility" from the following location:

ftp://ftp.emc.com/pub/emcgrab/Unix/

When you've downloaded EMC Grab and stored it in a temporary location on the server, such as /tmp/emc, untar it:

# cd /tmp/emc
# tar -xvf *tar

Then run:

# /tmp/emc/emcgrab/emcgrab.sh

The script is interactive and finishes after a couple of minutes.

Topics: EMC, SAN, Storage

Reset reservation bit

If you cannot access an hdiskpowerX disk, you may need to reset the reservation bit on it:

# /usr/lpp/EMC/Symmetrix/bin/emcpowerreset fscsiX hdiskpowerX
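If several disks behind the same fscsi device are affected, the command can be repeated in a loop. A minimal sketch follows; reset_reservations is a hypothetical name, and the emcpowerreset path and argument order are the ones used above.

```shell
# Sketch: reset the reservation on several hdiskpower devices reached
# through the same fscsi device. reset_reservations is a hypothetical name.
# Usage: reset_reservations fscsiX hdiskpowerA hdiskpowerB ...
# Set RUN=echo to print the commands instead of running them.
reset_reservations() {
    adapter=$1
    shift
    for disk in "$@"; do
        ${RUN:-} /usr/lpp/EMC/Symmetrix/bin/emcpowerreset "$adapter" "$disk"
    done
}
```

RUN=echo reset_reservations fscsi0 hdiskpower1 hdiskpower2 prints the two commands; without RUN=echo they are executed.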

Topics: EMC, SAN, Storage

BCV issue with Solutions Enabler

There is a known bug on AIX with Solutions Enabler, the software responsible for BCV backups: after a server is rebooted, the hdiskpower devices disappear, and you need to run the following command to bring them back. BCV devices are only visible on the target servers.

# /usr/lpp/EMC/Symmetrix/bin/mkbcv -a ALL
hdisk2 Available
hdisk3 Available
hdisk4 Available
hdisk5 Available
hdisk6 Available
hdisk7 Available
hdisk8 Available
hdiskpower1 Available
hdiskpower2 Available
hdiskpower3 Available
hdiskpower4 Available

Topics: EMC, Installation, SAN, Storage

EMC and MPIO

You can run into an issue with EMC storage on AIX systems using MPIO (no PowerPath) for your boot disks:

After installing the EMC Symmetrix ODM definitions on your client system, the system no longer boots and hangs with LED 554 (unable to find boot disk).

The boot hang (LED 554) is not caused by the EMC ODM package itself, but by the boot process not detecting a path to the boot disk if the first MPIO path does not correspond to the fscsiX driver instance where all hdisks are configured. Let me explain this in more detail:

Let's say we have an AIX system with four HBAs configured in the following order:

# lscfg -v | grep fcs
fcs2 (wwn 71ca) -> no devices configured behind this fscsi2 driver instance (path only configured in CuPath ODM table)
fcs3 (wwn 71cb) -> no devices configured behind this fscsi3 driver instance (path only configured in CuPath ODM table)
fcs0 (wwn 71e4) -> no devices configured behind this fscsi0 driver instance (path only configured in CuPath ODM table)
fcs1 (wwn 71e5) -> ALL devices configured behind this fscsi1 driver instance

Looking at the MPIO path configuration, here is what we have for the rootvg disk:

# lspath -l hdisk2 -H -F"name parent path_id connection status"
name   parent path_id connection                      status
hdisk2 fscsi0 0       5006048452a83987,33000000000000 Enabled
hdisk2 fscsi1 1       5006048c52a83998,33000000000000 Enabled
hdisk2 fscsi2 2       5006048452a83986,33000000000000 Enabled
hdisk2 fscsi3 3       5006048c52a83999,33000000000000 Enabled

The fscsi1 driver instance is the second path (path_id 1), so remove the other three paths, keeping only the path that corresponds to fscsi1:

# rmpath -l hdisk2 -p fscsi0 -d
# rmpath -l hdisk2 -p fscsi2 -d
# rmpath -l hdisk2 -p fscsi3 -d
# lspath -l hdisk2 -H -F"name parent path_id connection status"
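The rmpath commands can also be generated from the lspath output, which helps when a disk has more paths. This sketch (rmpath_cmds is a hypothetical name) reads lspath -F"name parent" output and prints one rmpath command, in the form used above, for each parent other than the one to keep. Review the output before piping it to sh.

```shell
# Sketch: print rmpath commands that delete every path of a disk except
# those through the fscsi parent to keep. rmpath_cmds is a hypothetical name.
# Input: output of  lspath -l DISK -F"name parent"  on stdin.
rmpath_cmds() {
    # Skip the parent to keep; print each other parent only once, since
    # the generated rmpath command selects paths by parent only.
    awk -v keep="$1" '$2 != keep && !seen[$2]++ { printf "rmpath -l %s -p %s -d\n", $1, $2 }'
}
```

For the example above, lspath -l hdisk2 -F"name parent" | rmpath_cmds fscsi1 prints the three rmpath commands for fscsi0, fscsi2 and fscsi3.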

Afterwards, run savebase to update the boot logical volume hd5. Set the bootlist to hdisk2 and reboot the host.
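These steps can be wrapped in a small function. It is a sketch (update_boot_disk is a hypothetical name) that runs savebase, sets the normal-mode boot list to the given disk, and then displays the boot list with bootlist -m normal -o so you can check it before rebooting. With RUN=echo it only prints the commands.

```shell
# Sketch: update the boot image and boot list for the remaining rootvg
# disk, then display the boot list so it can be checked before rebooting.
# update_boot_disk is a hypothetical name. Set RUN=echo for a dry run.
update_boot_disk() {
    ${RUN:-} savebase
    ${RUN:-} bootlist -m normal "$1"
    ${RUN:-} bootlist -m normal -o
}
```

For example, run update_boot_disk hdisk2, check the displayed boot list, and then reboot the host (for example with shutdown -Fr).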

The host now comes up successfully, without the LED 554 hang.

When checking the status of the rootvg disk, a new hdisk10 has been configured with the correct ODM definitions as shown below:

# lspv
hdisk10 0003027f7f7ca7e2 rootvg active
# lsdev -Cc disk
hdisk2 Defined 00-09-01 MPIO Other FC SCSI Disk Drive
hdisk10 Available 00-08-01 EMC Symmetrix FCP MPIO Raid6

To summarize, it is recommended to set up ONLY ONE path when installing AIX on a SAN disk, then install the EMC ODM package and reboot the host, and only after that is complete, add the other paths. By doing that, we ensure that the fscsiX driver instance used for the boot process has the hdisk configured behind it.
