UNIX Health Check

Unconfiguring child objects

When removing a device on AIX, you may run into a message saying that a child device is not in a correct state. For example:

# rmdev -dl fcs3
Method error (/usr/lib/methods/ucfgcommo):
0514-029 Cannot perform the requested function because a
child device of the specified device is not in a correct state.

To determine what the child devices are, use the -p option of the lsdev command. From the man page of the lsdev command:

-p Parent
     Specifies the device logical name from the Customized Devices
     object class for the parent of devices to be displayed. The 
     -p Parent flag can be used to show the child devices of the 
     given Parent. The Parent argument to the -p flag may contain
     the same wildcard characters that can be used with the odmget 
     command. This flag cannot be used with the -P flag.

For example:

# lsdev -p fcs3          
fcnet3 Defined   07-01-01 Fibre Channel Network Protocol Device
fscsi3 Available 07-01-02 FC SCSI I/O Controller Protocol Device

To remove the device, and all child devices, use the -R option. From the man page for the rmdev command:

-R
     Unconfigures the device and its children.
     When used with the -d or -S flags, the 
     children are undefined or stopped, respectively.

The command to remove adapter fcs3 and all of its child devices is:

# rmdev -Rdl fcs3
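
Before removing everything with -R, it can help to see which children are still configured. The following is a minimal sketch, not part of the original procedure: the helper name busy_children is hypothetical, and it only assumes the lsdev -p output layout shown above (device name in the first column, state in the second).

```shell
# busy_children: read "lsdev -p <parent>" output on stdin and print the
# names of child devices whose state (second column) is not "Defined".
# The function name is made up for this example.
busy_children() {
    awk '$2 != "Defined" { print $1 }'
}

# Possible usage on AIX:
#   lsdev -p fcs3 | busy_children
```

With the sample lsdev output above, this would print only fscsi3, the child that is still in the Available state.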

Topics: AIX, Security, System Admin ↑

An interesting open source project is Expect. It's a tool that can be used to automate interactive applications.

The RPM for Expect can be downloaded from http://www.perzl.org/aix/index.php?n=Main.Expect, and the home page for Expect is http://www.nist.gov/el/msid/expect.cfm.

A very interesting tool that is part of the Expect RPM is "mkpasswd". It is a little Tcl script that uses Expect to work with the passwd program to generate a random password and set it immediately. A somewhat adjusted version of "mkpasswd" can be downloaded here. The adjusted version of mkpasswd will generate a random password for a user, with a length of 8 characters (the maximum password length by default for AIX), if you run for example:

# /usr/local/bin/mkpasswd username
sXRk1wd3

To see the interactive work performed by Expect for mkpasswd, use the -v option:

# /usr/local/bin/mkpasswd -v username
spawn /bin/passwd username
Changing password for "username"
username's New password:
Enter the new password again:
password for username is s8qh1qWZ

By using mkpasswd, you'll never have to come up with a random password yourself again, and it will prevent Unix system admins from assigning new passwords that are easily guessable, such as "changeme" or "abc1234".

Now, what if you want to let "other" users (non-root users) run this utility, and at the same time prevent them from resetting the password of user root?

Let's say you want user pete to be able to reset other users' passwords. Add the following entries to the /etc/sudoers file by running visudo:

# visudo

Cmnd_Alias MKPASSWD = /usr/local/bin/mkpasswd, \
                      ! /usr/local/bin/mkpasswd root
pete ALL=(ALL) NOPASSWD:MKPASSWD

This will allow pete to run the /usr/local/bin/mkpasswd utility, which he can use to reset passwords.

First, to check what he can run, use the "sudo -l" command:

# su - pete
$ sudo -l
User pete may run the following commands on this host:
(ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root

Then, an attempt, using pete's account, to reset another user's password (which is successful):

$ sudo /usr/local/bin/mkpasswd mark
oe09'ySMj

Then another attempt, to reset the root password (which fails):

$ sudo /usr/local/bin/mkpasswd root
Sorry, user pete is not allowed to execute 
'/usr/local/bin/mkpasswd root' as root.
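
One caveat worth testing on your own system: sudoers matches a command with arguments only when the arguments match exactly, so the exclusion in the sudoers entry above would presumably not catch a call such as "mkpasswd -v root". A more robust approach could be to let sudo run a small wrapper that inspects every argument. The sketch below is an assumption, not part of the original setup: the wrapper name mkpasswd_safe, the PROTECTED_USERS list and the function names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, for example /usr/local/bin/mkpasswd_safe.
# It refuses to continue if any argument names a protected account.

PROTECTED_USERS="root"   # assumption: extend this list as needed

# is_protected NAME: succeed if NAME is in the protected list.
is_protected() {
    for p in $PROTECTED_USERS; do
        [ "$1" = "$p" ] && return 0
    done
    return 1
}

# check_args ARG...: fail if any argument is a protected user name.
check_args() {
    for arg in "$@"; do
        if is_protected "$arg"; then
            echo "refusing to change the password of $arg" >&2
            return 1
        fi
    done
    return 0
}

# In the real wrapper, the last line could be something like:
#   check_args "$@" && exec /usr/local/bin/mkpasswd "$@"
```

The sudoers entry would then grant pete the wrapper instead of mkpasswd itself. This only checks literal user names, so other accounts with UID 0 would need to be added to the list by hand.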

Topics: AIX, Security, System Admin ↑

Migrating users from one AIX system to another

Since the files involved in the following procedure are flat ASCII files and their format has not changed from V4 to V5, the users can be migrated between systems running the same or different versions of AIX (for example, from V4 to V5).

Files that can be copied over:

/etc/group
/etc/passwd
/etc/security/group
/etc/security/limits
/etc/security/passwd
/etc/security/.ids
/etc/security/environ
/etc/security/.profile

NOTE: Edit the passwd file so the root entry is as follows:

root:!:0:0::/:/usr/bin/ksh

When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions.

Listed specifically as users are the following:
root, daemon, bin, sys, adm, uucp, guest, nobody, lpd

Listed specifically as groups are the following:
system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr
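
A quick way to confirm that a copied file still contains these entries could look like the sketch below. It is not part of the IBM procedure; the function name missing_names is hypothetical, and it only assumes the usual colon-separated layout of /etc/passwd and /etc/group.

```shell
# missing_names FILE NAME...: print every NAME that has no entry in FILE.
# Works for passwd-style and group-style files, where the name is the
# first colon-separated field.
missing_names() {
    file=$1
    shift
    for name in "$@"; do
        if ! grep -q "^${name}:" "$file"; then
            echo "$name"
        fi
    done
}

# Possible usage:
#   missing_names /etc/passwd root daemon bin sys adm uucp guest nobody lpd
#   missing_names /etc/group system staff bin sys adm uucp mail security \
#       cron printq audit ecs nobody usr
```

No output means that all listed users or groups are present.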

If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located as mkuser.default in the /usr/lib/security directory. If you copy over mkuser.defaults, changes must be made to the stanzas. Replace group with pgrp, and program with shell. A proper stanza should look like the following:

    user: 
            pgrp = staff 
            groups = staff 
            shell = /usr/bin/ksh 
            home = /home/$USER

The following files may also be copied over, as long as the AIX version in the new machine is the same:

/etc/security/login.cfg
/etc/security/user

NOTE: If you decide to copy these two files, open the /etc/security/user file and make sure that variables such as tty, registry, auth1 and so forth are set properly for the new machine. Otherwise, do not copy these two files, and just add all the user stanzas to the newly created files on the new machine.

Once the files are moved over, execute the following:

# usrck -t ALL 
# pwdck -t ALL 
# grpck -t ALL

This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally this should be run on the source system before copying over the files as well as after porting these files to the new system.

NOTE: It is possible to find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new user IDs in different release cycles. These are reserved IDs and should not be deleted. If your old user IDs conflict with the newer AIX system user IDs, it is advised that you assign new user IDs to these older IDs.
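
To spot such conflicts before copying anything, the two passwd files can be compared by user ID. The following is only a sketch and not part of the IBM procedure: the function name uid_conflicts is hypothetical, and it assumes you have a copy of the target system's /etc/passwd available next to the source file.

```shell
# uid_conflicts SOURCE_PASSWD TARGET_PASSWD
# Print every UID that belongs to one user name in the source file and
# to a different user name in the target file.
uid_conflicts() {
    awk -F: '
        # First file read (the target): remember which name owns each UID.
        NR == FNR { owner[$3] = $1; next }
        # Second file read (the source): report UIDs owned by another name.
        ($3 in owner) && owner[$3] != $1 {
            print $3 ": " $1 " (source) vs " owner[$3] " (target)"
        }
    ' "$2" "$1"
}

# Possible usage:
#   uid_conflicts /tmp/passwd.old /etc/passwd
```

Each line of output names a user ID that would need to be reassigned on the source side.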

From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231

Topics: AIX, SAN, System Admin ↑

AIX fibre channel error - FCS_ERR6

This error can occur if the fibre channel adapter is extremely busy. The AIX FC adapter driver is trying to map an I/O buffer for DMA access, so the FC adapter can read or write into the buffer. The DMA mapping is done by making a request to the PCI bus device driver.

The PCI bus device driver is saying that it can't satisfy the request right now. There was simply too much I/O at that moment, and the adapter couldn't handle all of it. When the FC adapter is configured, the adapter driver tells the PCI bus driver how much resource to set aside for it, and the adapter may have gone over that limit. It is therefore recommended to increase the max_xfer_size on the fibre channel devices.

It depends on the type of fibre channel adapter, but usually the possible sizes are:

0x100000, 0x200000, 0x400000, 0x800000, 0x1000000

To view the current setting, type the following command:

# lsattr -El fcsX -a max_xfer_size

Replace the X with the fibre channel adapter number.

You should get an output similar to the following:

max_xfer_size 0x100000 Maximum Transfer Size True
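
The hexadecimal values are byte counts, so the sizes listed above correspond to 1, 2, 4, 8 and 16 MB. A small sketch for reading the value and converting it follows; the function names xfer_size_value and to_mb are made up for this example, and the parsing only assumes the lsattr output layout shown above.

```shell
# xfer_size_value: read "lsattr -El fcsX -a max_xfer_size" output on
# stdin and print the value column.
xfer_size_value() {
    awk '$1 == "max_xfer_size" { print $2 }'
}

# to_mb BYTES: convert a byte count (hex such as 0x100000, or decimal)
# to whole megabytes using shell arithmetic.
to_mb() {
    echo $(( $1 / 1048576 ))
}

# Possible usage on AIX:
#   to_mb "$(lsattr -El fcs0 -a max_xfer_size | xfer_size_value)"
```

For the default value 0x100000, to_mb prints 1.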

The value can be changed as follows, after which the server needs to be rebooted:

# chdev -l fcsX -a max_xfer_size=0x1000000 -P

Topics: AIX, System Admin, Virtualization ↑

Set up private network between 2 VIO clients

The following is a description of how you can set up a private network between two VIO clients on one hardware frame.

Servers to set up connection: server1 and server2
Purpose: To be used for Oracle interconnect (for use by Oracle RAC/CRS)

IP Addresses assigned by network team:

192.168.254.141 (server1priv)
192.168.254.142 (server2priv)
Subnetmask: 255.255.255.0

VLAN to be set up: PVID 4. This number is basically randomly chosen; it could have been 23 or 67 or whatever, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks is therefore important.

Steps to set this up:

Log in to HMC GUI as hscroot.
Change the default profile of server1, and add a new virtual Ethernet adapter. Set the port virtual Ethernet to 4 (PVID 4). Select "This adapter is required for virtual server activation". Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter -> Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close.
Do the same for server2.
Now do the same for both VIO clients, but this time do "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (as we previously have only updated the default profiles of both servers), and still get the virtual adapter.
Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1.
Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC.
Now configure the IP address on this interface on both nodes.
Add the entries for server1priv and server2priv in /etc/hosts on both nodes.
Run a ping: ping server2priv (from server1) and vice versa.
Done!
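
The /etc/hosts step above can be sketched as a small helper that adds an entry only when the name is not listed yet, so it is safe to run more than once. The function name add_hosts_entry is hypothetical, and this is just one way of doing it.

```shell
# add_hosts_entry FILE IP NAME: append "IP<tab>NAME" to FILE unless NAME
# is already listed as a host name or alias on a non-comment line.
add_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if awk -v n="$name" '
            /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" 2>/dev/null; then
        return 0    # name already present, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Possible usage on both nodes:
#   add_hosts_entry /etc/hosts 192.168.254.141 server1priv
#   add_hosts_entry /etc/hosts 192.168.254.142 server2priv
```

The interface itself could be configured on server1 with something like "chdev -l en1 -a netaddr=192.168.254.141 -a netmask=255.255.255.0 -a state=up", assuming the new adapter ent1 shows up as interface en1, and likewise with the .142 address on server2.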

Steps to throw it away:

On each node: deconfigure the en1 interface:
# ifconfig en1 detach
Rmdev the devices on each node:
```
# rmdev -dl en1
# rmdev -dl ent1
```
Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2.
DLPAR the adapter with ID 5 out of server1 and server2.
Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
# lsdev -Cc adapter
Done!

Topics: AIX, PowerHA / HACMP, System Admin ↑

clstat: Failed retrieving cluster information

If clstat is not working, you may get the following error when running it:

# clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE

To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:

Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree which is used by clstat or cldump functionality.

There are two ways to enable this MIB sub tree (risc6000clsmuxpd). They are:

1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file:

VACM_VIEW defaultView internet - included -

But doing so is not recommended, as it unlocks the entire MIB tree.

2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file.

VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

Note: After enabling the MIB entry above, the snmp daemon must be restarted with the following commands:

# stopsrc -s snmpd
# startsrc -s snmpd

After snmp is restarted leave the daemon running for about two minutes before attempting to start clstat or cldump.

Sometimes, even after doing this, clstat or cldump still don't work. Make sure that a COMMUNITY entry is present in /etc/snmpdv3.conf:

COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -

The next thing may sound silly, but edit the /etc/snmpdv3.conf file, and take out the comments. Change this:

smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password  # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # HACMP/ES for AIX ...

To:

smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
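
The three checks described so far (the VACM_VIEW entry, the COMMUNITY entry and the smux lines without trailing comments) can be sketched as one function. This is an illustration only: the function name check_snmpdv3_conf is hypothetical, and the patterns simply mirror the lines shown above.

```shell
# check_snmpdv3_conf FILE: report which of the settings discussed above
# are missing from FILE. Returns 0 if everything looks fine.
check_snmpdv3_conf() {
    conf=$1
    rc=0
    # Either the whole internet tree or only the clsmuxpd subtree.
    if ! grep -E '^VACM_VIEW[[:space:]]+defaultView[[:space:]]+(internet|1\.3\.6\.1\.4\.1\.2\.3\.1\.2\.1\.5)[[:space:]]+-[[:space:]]+included' "$conf" >/dev/null; then
        echo "missing VACM_VIEW entry for internet or risc6000clsmuxpd"
        rc=1
    fi
    # A COMMUNITY entry for the community name "public".
    if ! grep -E '^COMMUNITY[[:space:]]+public[[:space:]]' "$conf" >/dev/null; then
        echo "missing COMMUNITY public entry"
        rc=1
    fi
    # smux lines should not carry a trailing comment.
    if grep -E '^smux[[:space:]].*#' "$conf" >/dev/null; then
        echo "smux line with trailing comment"
        rc=1
    fi
    return $rc
}

# Possible usage:
#   check_snmpdv3_conf /etc/snmpdv3.conf
```

No output and a zero exit status mean that none of the three problems was found.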

Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:

# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd  -a "-c public"
# chssys -s snmpmibd  -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120

Now, to verify that it works, run either clstat or cldump, or the following command:

# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster

Still not working at this point? Then run an Extended Verification and Synchronization:

# smitty cm_ver_and_sync.select

After that, clstat, cldump and snmpinfo should work.

Topics: AIX, System Admin ↑

Too many open files

To determine if the number of open files is growing over a period of time, issue lsof to report the open files against a PID on a periodic basis. For example:

# lsof -p process_pid -r interval > lsof.out

Note: The interval is in seconds, 1800 for 30 minutes.
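
To see the trend in lsof.out at a glance, the entries can be counted per interval. The sketch below assumes that lsof, in repeat mode, ends each listing with a line of "=" characters and starts each listing with a header line beginning with COMMAND; check this against your lsof version. The function name count_per_snapshot is made up for this example.

```shell
# count_per_snapshot FILE: print the number of open-file lines in each
# listing of an "lsof -r" output file.
count_per_snapshot() {
    awk '
        /^=======/ {                 # end-of-listing marker
            n++
            print "snapshot " n ": " (count + 0) " open files"
            count = 0
            next
        }
        /^COMMAND/ { next }          # skip the header line
        NF > 0     { count++ }       # count every other non-empty line
    ' "$1"
}

# Possible usage:
#   count_per_snapshot lsof.out
```

A count that keeps rising from one snapshot to the next suggests that the process is leaking file handles.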

This output does not give the actual file names to which the handles are open. It provides only the name of the file system (directory) in which they are contained. The lsof command indicates if the open file is associated with an open socket or a file. When it references a file, it identifies the file system and the inode, not the file name.

Run the following command to determine the file name:

# df -kP filesystem_from_lsof | awk '{print $6}' | tail -1

Now note the filesystem name. And then run:

# find filesystem_name -inum inode_from_lsof -print

This will show the actual file name.
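
The find step can be wrapped in a small function. This sketch adds the -xdev option, which is an addition to the command above: it keeps find from descending into other file systems mounted below the starting point, where the same inode number could belong to an unrelated file. The function name file_by_inode is hypothetical.

```shell
# file_by_inode MOUNT_POINT INODE: print the path of the file with the
# given inode number, searching only the file system at MOUNT_POINT.
file_by_inode() {
    find "$1" -xdev -inum "$2" -print
}

# Possible usage, with the values reported by lsof:
#   file_by_inode /home 4711
```

If the file has several hard links, all of its names are printed.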

To increase the number, change or add the nofiles=XXXXX parameter in the /etc/security/limits file, or run:

# chuser nofiles=XXXXX user_id

You can also use svmon:

# svmon -P java_pid -m | grep pers

This lists open files in the format: filesystem_device:inode. Use the same procedure as above for finding the actual file name.

Topics: AIX, Security, System Admin ↑

DSH fails with host key verification failed

If you try to establish a dsh session with a remote node, you may sometimes get an error message like this:

# dsh -n server date
server.domain.com: Host key verification failed.
dsh:  2617-009 server.domain.com remote shell had exit code 255

Connecting with ssh works well with key authentication:

# ssh server

The difference between the two connections is that dsh uses the FQDN, and the FQDN has to be present in the known_hosts file for SSH. Therefore, you must first make an ssh connection to the host using the FQDN:

# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) 
to the list of known hosts.
Tue Sep  6 11:56:34 EDT 2011

Now try to use dsh again, and you'll see it will work:

# dsh -n server date
server.domain.com: Tue Sep  6 11:56:38 EDT 2011
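
To find out up front which name forms are already known, the known_hosts file can be checked for both the short name and the FQDN. The sketch below assumes a known_hosts file with plain (non-hashed) host names; for hashed files, "ssh-keygen -F hostname" is the tool to use instead. The function name host_known is made up for this example.

```shell
# host_known FILE HOST: succeed if HOST appears in the comma-separated
# host list (first field) of any line in a non-hashed known_hosts file.
host_known() {
    awk -v h="$2" '
        {
            n = split($1, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == h) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Possible usage:
#   host_known ~/.ssh/known_hosts server.domain.com || echo "FQDN not known yet"
```

In the situation described above, the check would succeed for "server" but fail for "server.domain.com" until the first ssh connection with the FQDN has been made.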

Topics: AIX, Backup & restore, System Admin ↑

Restoring individual files from a mksysb image

Sometimes, you just need that one single file from a mksysb image backup. It's really not that difficult to accomplish this.

First of all, go to the directory that contains the mksysb image file:

# cd /sysadm/iosbackup

In this example, we're using the mksysb image of a Virtual I/O server, created using backupios. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb.

First, try to locate the file you're looking for. For example, if you're looking for file nimbck.ksh:

# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun  9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x- 10   staff  May 23  08:37  1801 ./home/padmin/nimbck.ksh

Here you can see the original file was located in /home/padmin.

Now recover that one single file:

# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh
x ./home/padmin/nimbck.ksh

Note that it is important to add the dot before the filename that needs to be recovered. Otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is a relative folder from the current directory you're in right now:

# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x    1 10  staff  1801 May 23 08:37 nimbck.ksh
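
Because the leading dot is easy to forget, a tiny helper can turn an absolute path into the form that restore expects. This is only a convenience sketch; the function name mksysb_path is hypothetical.

```shell
# mksysb_path PATH: print PATH in the "./relative" form in which files
# are stored inside a mksysb image.
mksysb_path() {
    case $1 in
        ./*) printf '%s\n' "$1" ;;     # already in the right form
        /*)  printf '.%s\n' "$1" ;;    # absolute path: prepend a dot
        *)   printf './%s\n' "$1" ;;   # bare relative path: prepend ./
    esac
}

# Possible usage:
#   restore -x -q -f vio1.mksysb "$(mksysb_path /home/padmin/nimbck.ksh)"
```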

Topics: AIX, Backup & restore, LVM, System Admin ↑

Use dd to backup raw partition

The savevg command can be used to back up user volume groups. All logical volume information is archived, as well as JFS and JFS2 mounted file systems. However, this command cannot be used to back up raw logical volumes.

Save the contents of a raw logical volume onto a file using:

# dd if=/dev/lvname of=/file/system/lvname.dd

This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system. Make sure that wherever you write your output file to (in the example above to /file/system) has enough disk space available to hold a full copy of the logical volume. If the logical volume is 100 GB, you'll need 100 GB of file system space for the copy.

If you want to test how this works, you can create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above.

Then, throw away the file system using "rmfs -r", and after that has been completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:

# dd if=/file/system/lvname.dd of=/dev/lvname

Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back.
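
For a real backup, it can be worth verifying the copy before relying on it. The sketch below combines the dd command with a checksum comparison. It is an illustration under a few assumptions: the function name dd_copy_verified is hypothetical, bs=1024k is added only to speed up the copy (the commands above use the dd default block size), and the source must not change while it is being read.

```shell
# dd_copy_verified SOURCE TARGET: copy SOURCE to TARGET with dd and
# succeed only if both have the same checksum and size afterwards.
dd_copy_verified() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=1024k 2>/dev/null || return 1
    # cksum prints a CRC and a byte count; both must match.
    [ "$(cksum < "$src")" = "$(cksum < "$dst")" ]
}

# Possible usage:
#   dd_copy_verified /dev/lvname /file/system/lvname.dd && echo "backup verified"
```

Note that the verification reads the logical volume a second time, so it roughly doubles the read load on the source.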
