Topics: AIX, Security, System Admin

Migrating users from one AIX system to another

Since the files involved in the following procedure are flat ASCII files and their format has not changed from V4 to V5, the users can be migrated between systems running the same or different versions of AIX (for example, from V4 to V5).

Files that can be copied over:

  • /etc/group
  • /etc/passwd
  • /etc/security/group
  • /etc/security/limits
  • /etc/security/passwd
  • /etc/security/.ids
  • /etc/security/environ
  • /etc/security/.profile
NOTE: Edit the passwd file so the root entry is as follows:
root:!:0:0::/:/usr/bin/ksh
When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions.

Listed specifically as users are the following:
root, daemon, bin, sys, adm, uucp, guest, nobody, lpd

Listed specifically as groups are the following:
system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr
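A quick way to check the copied files for the names listed above is a small helper like the one below. This is only a sketch: missing_names is a made-up function name, not an AIX command, and it assumes the standard colon-separated format with the name in the first field.

```shell
# Report which required names are missing from a colon-separated file
# such as /etc/passwd or /etc/group (the name is the first field).
# missing_names is a hypothetical helper, not an AIX command.
missing_names() {
    file=$1
    shift
    for name in "$@"; do
        # Match the name exactly against field 1; print it if not found.
        awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file" ||
            echo "$name"
    done
}

# Example (run against the copied files on the target system):
# missing_names /etc/passwd root daemon bin sys adm uucp guest nobody lpd
# missing_names /etc/group system staff bin sys adm uucp mail security \
#     cron printq audit ecs nobody usr
```

No output means all the listed names are present.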

If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located as mkuser.default in the /usr/lib/security directory. If you copy over mkuser.defaults, changes must be made to the stanzas. Replace group with pgrp, and program with shell. A proper stanza should look like the following:
    user: 
            pgrp = staff 
            groups = staff 
            shell = /usr/bin/ksh 
            home = /home/$USER 
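The renaming of the two attributes can be done with sed. The following is a minimal sketch: convert_mkuser_defaults is a hypothetical helper name, and it assumes the attributes appear as "group = ..." and "program = ..." at the start of a line (after optional whitespace). The "groups" attribute is left alone.

```shell
# Rename the old attribute names in a copied mkuser.defaults file:
# "group" becomes "pgrp" and "program" becomes "shell".
# convert_mkuser_defaults is a hypothetical helper; it reads the file named
# in $1 and writes the converted stanzas to standard output.
convert_mkuser_defaults() {
    sed -e 's/^\([[:space:]]*\)group\([[:space:]]*=\)/\1pgrp\2/' \
        -e 's/^\([[:space:]]*\)program\([[:space:]]*=\)/\1shell\2/' "$1"
}

# Example: write the result to a new file and review it before using it.
# convert_mkuser_defaults mkuser.defaults > mkuser.defaults.new
```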
The following files may also be copied over, as long as the AIX version in the new machine is the same:
  • /etc/security/login.cfg
  • /etc/security/user
NOTE: If you decide to copy these two files, open the /etc/security/user file and make sure that variables such as tty, registry, auth1 and so forth are set properly for the new machine. Otherwise, do not copy these two files, and just add all the user stanzas to the newly created files on the new machine.

Once the files are moved over, execute the following:
# usrck -t ALL 
# pwdck -t ALL 
# grpck -t ALL 
This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally this should be run on the source system before copying over the files as well as after porting these files to the new system.

NOTE: It is possible to find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new system user IDs in different release cycles. These are reserved IDs and should not be deleted. If user IDs from your old system conflict with the system user IDs on the newer AIX system, it is advised that you assign new user IDs to the affected migrated users.
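To spot such conflicts before overwriting anything, you can compare the old passwd file with the original one on the new system. The sketch below uses a made-up helper name (uid_conflicts) and assumes both files are in the standard passwd format, with the user name in field 1 and the user ID in field 3.

```shell
# List UIDs that are used by different user names in two passwd-format files.
# uid_conflicts is a hypothetical helper: $1 is the passwd file copied from
# the old system, $2 is the original passwd file of the new system.
# Output: one line per conflict, "uid old_name new_name".
uid_conflicts() {
    awk -F: '
        NR == FNR { new_name[$3] = $1; next }      # first file read: new system
        ($3 in new_name) && new_name[$3] != $1 {   # second file read: old system
            print $3, $1, new_name[$3]
        }
    ' "$2" "$1"
}

# Example:
# uid_conflicts /tmp/passwd.old /etc/passwd
```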

From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231

Topics: AIX, SAN, System Admin

AIX fibre channel error - FCS_ERR6

This error can occur if the fibre channel adapter is extremely busy. The AIX FC adapter driver is trying to map an I/O buffer for DMA access, so the FC adapter can read or write into the buffer. The DMA mapping is done by making a request to the PCI bus device driver.

The PCI bus device driver is saying that it can't satisfy the request right now. There was simply too much I/O at that moment, and the adapter couldn't handle it all. When the FC adapter is configured, the adapter driver tells the PCI bus driver how much resource to set aside for it, and the I/O load may have gone over that limit. It is therefore recommended to increase the max_xfer_size on the fibre channel devices.

It depends on the type of fibre channel adapter, but usually the possible sizes are:

0x100000, 0x200000, 0x400000, 0x800000, 0x1000000
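These values are sizes in bytes, written in hexadecimal. A small sketch to translate them to megabytes with shell arithmetic (xfer_size_mb is just an illustrative function name):

```shell
# Convert a max_xfer_size value (hexadecimal number of bytes) to megabytes.
xfer_size_mb() {
    echo $(( $1 / 1048576 ))
}

for size in 0x100000 0x200000 0x400000 0x800000 0x1000000; do
    echo "$size = $(xfer_size_mb "$size") MB"
done
```

So the listed values correspond to 1, 2, 4, 8 and 16 MB.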

To view the current setting type the following command:

# lsattr -El fcsX -a max_xfer_size
Replace the X with the fibre channel adapter number.

You should get an output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows, after which the server needs to be rebooted:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P

Topics: Virtual I/O Server, Virtualization

Virtual I/O Server lifecycle dates

Product                           Version  End of Support
PowerVM VIOS Enterprise Edition   2.2.x    not announced
PowerVM VIOS Express Edition      2.2.x    not announced
PowerVM VIOS Standard Edition     2.2.x    not announced
PowerVM VIOS Enterprise Edition   2.1.x    Sep 30, 2012
PowerVM VIOS Express Edition      2.1.x    Sep 30, 2012
PowerVM VIOS Standard Edition     2.1.x    Oct 30, 2012
Virtual I/O Server                1.5.x    Sep 30, 2011
Virtual I/O Server                1.4.x    Sep 30, 2010
Virtual I/O Server                1.3.x    Sep 30, 2009
Virtual I/O Server                1.2.x    Sep 30, 2008
Virtual I/O Server                1.1.x    Sep 30, 2008

Source: http://www-01.ibm.com/software/support/aix/lifecycle/index.html

Topics: AIX, System Admin, Virtualization

Set up private network between 2 VIO clients

The following is a description of how you can set up a private network between two VIO clients on one hardware frame.

Servers to set up connection: server1 and server2
Purpose: To be used for Oracle interconnect (for use by Oracle RAC/CRS)

IP Addresses assigned by network team:

192.168.254.141 (server1priv)
192.168.254.142 (server2priv)
Subnetmask: 255.255.255.0
VLAN to be set up: PVID 4. This number is basically randomly chosen; it could have been 23 or 67 or whatever, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks is therefore important.

Steps to set this up:
  • Log in to HMC GUI as hscroot.
  • Change the default profile of server1, and add a new virtual Ethernet adapter. Set the port virtual Ethernet to 4 (PVID 4). Select "This adapter is required for virtual server activation". Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter -> Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close.
  • Do the same for server2.
  • Now do the same for both VIO clients, but this time do "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (as we previously have only updated the default profiles of both servers), and still get the virtual adapter.
  • Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1.
  • Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC.
  • Now configure the IP address on this interface on both nodes.
  • Add the entries for server1priv and server2priv in /etc/hosts on both nodes.
  • Run a ping: ping server2priv (from server1) and vice versa.
  • Done!
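The /etc/hosts step from the list above can be scripted so that it is safe to run more than once. This is a sketch only: add_host_entry is a hypothetical helper, and it assumes the usual hosts format of an IP address followed by one or more names.

```shell
# Append "ip name" to a hosts file unless the name is already listed.
# add_host_entry is a hypothetical helper; pass /etc/hosts as the first
# argument on a real system.
add_host_entry() {
    hosts_file=$1 ip=$2 name=$3
    if awk -v n="$name" '
            /^[ \t]*#/ { next }                       # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        return 0                                      # already present
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example, on both nodes:
# add_host_entry /etc/hosts 192.168.254.141 server1priv
# add_host_entry /etc/hosts 192.168.254.142 server2priv
```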
Steps to throw it away:
  • On each node: deconfigure the en1 interface:
    # ifconfig en1 detach
  • Rmdev the devices on each node:
    # rmdev -dl en1
    # rmdev -dl ent1
    
  • Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2.
  • DLPAR the adapter with ID 5 out of server1 and server2.
  • Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
    # lsdev -Cc adapter
  • Done!

Topics: AIX, PowerHA / HACMP, System Admin

clstat: Failed retrieving cluster information

If clstat is not working, you may get the following error when running clstat:

# clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree which is used by clstat or cldump functionality.

There are two ways to enable this MIB sub tree (risc6000clsmuxpd). They are:

1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file:

VACM_VIEW defaultView internet - included -

But doing so is not recommended, as it unlocks the entire MIB tree.

2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file.

VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

Note: After enabling the MIB entry above snmp daemon must be restarted with the following commands as shown below:

# stopsrc -s snmpd
# startsrc -s snmpd

After snmp is restarted leave the daemon running for about two minutes before attempting to start clstat or cldump.
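If you have to do this on several cluster nodes, option 2 can be scripted. The sketch below appends the sub tree line only when an identical line is not there yet. The name enable_clsmuxpd_view is made up for this example, and a line that differs only in spacing will not be recognized as a duplicate, so check the file afterwards.

```shell
# Add the risc6000clsmuxpd VACM_VIEW line to an snmpdv3.conf-style file
# unless an identical line is already there.
# enable_clsmuxpd_view is a hypothetical helper; without an argument it
# uses the file named in the text.
enable_clsmuxpd_view() {
    conf=${1:-/etc/snmpdv3.conf}
    line='VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -'
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF "$line" "$conf" || echo "$line" >> "$conf"
}

# Example (make a backup copy of the file first):
# cp /etc/snmpdv3.conf /etc/snmpdv3.conf.bak
# enable_clsmuxpd_view
```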
Sometimes, even after doing this, clstat or cldump still don't work. Make sure that a COMMUNITY entry is present in /etc/snmpdv3.conf:
COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -
The next thing may sound silly, but edit the /etc/snmpdv3.conf file, and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password  # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # HACMP/ES for AIX ...
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
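Instead of editing by hand, the trailing comments can be stripped with sed. This is a sketch under the assumption that the entries start with the word smux at the beginning of the line; strip_smux_comments is a hypothetical helper that writes the result to standard output and leaves all other lines (including full comment lines) untouched.

```shell
# Remove trailing "# ..." comments from smux lines only.
# strip_smux_comments is a hypothetical helper: $1 is the configuration file,
# the cleaned-up content is written to standard output.
strip_smux_comments() {
    sed '/^smux[[:space:]]/s/[[:space:]]*#.*$//' "$1"
}

# Example: review the new file before putting it in place.
# strip_smux_comments /etc/snmpdv3.conf > /tmp/snmpdv3.conf.new
```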
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd  -a "-c public"
# chssys -s snmpmibd  -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work.

Topics: AIX, System Admin

Too many open files

To determine if the number of open files is growing over a period of time, issue lsof to report the open files against a PID on a periodic basis. For example:

# lsof -p (PID of process) -r (interval) > lsof.out
Note: The interval is in seconds, 1800 for 30 minutes.
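To see the trend in the collected lsof.out file, you can count the entries per sample. The sketch below assumes that lsof in repeat mode ends each cycle with a marker line of equals signs ("=======") and repeats the header line that starts with COMMAND; verify this against your own output first. The function name open_file_counts is made up.

```shell
# Print "sample_number open_file_count" for every cycle in an lsof -r output file.
# open_file_counts is a hypothetical helper: $1 is the file written by lsof.
# Assumption: each cycle ends with a marker line beginning with "=======".
open_file_counts() {
    awk '
        /^=======/ { print ++sample, count + 0; count = 0; next }
        NF > 0 && $1 != "COMMAND" { count++ }     # skip blank and header lines
    ' "$1"
}

# Example:
# open_file_counts lsof.out
```

A count that keeps rising from sample to sample points to a process that is not closing its files.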

This output does not give the actual file names to which the handles are open. It provides only the name of the file system (directory) in which they are contained. The lsof command indicates if the open file is associated with an open socket or a file. When it references a file, it identifies the file system and the inode, not the file name.

Run the following command to determine the file name:
# df -kP filesystem_from_lsof | awk '{print $6}' | tail -1
Now note the filesystem name. And then run:
# find filesystem_name -inum inode_from_lsof -print
This will show the actual file name.
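The find step can be wrapped in a small function. This is a sketch: file_by_inode is a hypothetical name, and the -xdev option is an addition to the command above, used here so that find stays within the one file system, because inode numbers are only unique per file system.

```shell
# Find the path of a file from its file system mount point and inode number.
# file_by_inode is a hypothetical helper: $1 is the mount point, $2 the inode.
# -xdev keeps find from descending into other mounted file systems,
# where the same inode number could belong to a different file.
file_by_inode() {
    find "$1" -xdev -inum "$2" -print
}

# Example:
# file_by_inode /home 77
```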

To increase the limit on the number of open files, change or add the nofiles=XXXXX parameter in the /etc/security/limits file, or run:
# chuser nofiles=XXXXX user_id
You can also use svmon:
# svmon -P java_pid -m | grep pers
This lists open files in the format: filesystem_device:inode. Use the same procedure as above for finding the actual file name.

Topics: AIX, Security, System Admin

DSH fails with host key verification failed

If you try to establish a dsh session with a remote node, you may sometimes get an error message like this:

# dsh -n server date
server.domain.com: Host key verification failed.
dsh:  2617-009 server.domain.com remote shell had exit code 255
Connecting with ssh works well with key authentication:
# ssh server
The difference between the two connections is that dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for SSH. Therefore you must first make an ssh connection to the host using the FQDN:
# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) 
to the list of known hosts.
Tue Sep  6 11:56:34 EDT 2011
Now try to use dsh again, and you'll see it will work:
# dsh -n server date
server.domain.com: Tue Sep  6 11:56:38 EDT 2011
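If you manage many nodes, it can help to check beforehand which FQDNs are still missing from known_hosts. The following is a rough sketch: missing_known_hosts is a made-up helper, and it only works for a known_hosts file with plain (unhashed) host names in the first field.

```shell
# Report which host names have no entry in an unhashed known_hosts file.
# missing_known_hosts is a hypothetical helper: $1 is the known_hosts file,
# the remaining arguments are the fully qualified names dsh will use.
missing_known_hosts() {
    kh_file=$1
    shift
    for host in "$@"; do
        awk -v h="$host" '
            { n = split($1, names, ",")           # field 1: name[,name...]
              for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
            END { exit !found }' "$kh_file" || echo "$host"
    done
}

# Example:
# missing_known_hosts ~/.ssh/known_hosts server.domain.com
```

Every name that is printed still needs a first ssh connection using the FQDN.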

Topics: AIX, Backup & restore, System Admin

Restoring individual files from a mksysb image

Sometimes, you just need that one single file from a mksysb image backup. It's really not that difficult to accomplish this.

First of all, go to the directory that contains the mksysb image file:

# cd /sysadm/iosbackup
In this example, we're using the mksysb image of a Virtual I/O server, created using backupios. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb.

First, try to locate the file you're looking for. For example, if you're looking for the file nimbck.ksh:
# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun  9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x- 10   staff  May 23  08:37  1801 ./home/padmin/nimbck.ksh
Here you can see the original file was located in /home/padmin.

Now recover that one single file:
# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh
x ./home/padmin/nimbck.ksh
Note that it is important to add the dot before the filename that needs to be recovered. Otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is a path relative to the directory you're currently in:
# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x    1 10  staff  1801 May 23 08:37 nimbck.ksh

Topics: AIX, Backup & restore, LVM, System Admin

Use dd to backup raw partition

The savevg command can be used to back up user volume groups. All logical volume information is archived, as well as JFS and JFS2 mounted filesystems. However, this command cannot be used to back up raw logical volumes.

Save the contents of a raw logical volume onto a file using:

# dd if=/dev/lvname of=/file/system/lvname.dd
This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system. Make sure that wherever you write your output file to (in the example above to /file/system) has enough disk space available to hold a full copy of the logical volume. If the logical volume is 100 GB, you'll need 100 GB of file system space for the copy.

If you want to test how this works, you can create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above.

Then, throw away the file system using "rmfs -r", and after that has been completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:
# dd if=/file/system/lvname.dd of=/dev/lvname
Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back.
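If you want some assurance that the copy is complete, you can compare checksums of the source and the backup file. The sketch below is not part of the procedure above: dd_backup_verified is a hypothetical helper, the block size of 1024k is an arbitrary choice, and the source is read twice (once by dd, once by cksum), which takes time on a large logical volume.

```shell
# Copy a source to a backup file with dd and confirm that both have the same
# checksum and size. dd_backup_verified is a hypothetical helper; on AIX the
# source would be a raw logical volume such as /dev/lvname.
dd_backup_verified() {
    src=$1 dest=$2
    dd if="$src" of="$dest" bs=1024k 2>/dev/null || return 1
    # cksum prints "checksum size"; reading from stdin omits the file name.
    [ "$(cksum < "$src")" = "$(cksum < "$dest")" ]
}

# Example:
# dd_backup_verified /dev/lvname /file/system/lvname.dd && echo "backup verified"
```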

Topics: AIX, Hardware, System Admin

Identifying devices with usysident

There is an LED that you can turn on to identify a device, which can be useful if you need to replace a device. The usysident command that controls it is the same binary that diag uses.

To show the syntax:

# /usr/lpp/diagnostics/bin/usysident ? 
usage: usysident [-s {normal | identify}] 
                 [-l location code | -d device name]
       usysident [-t] 
To check the LED status of the system:
# /usr/lpp/diagnostics/bin/usysident 
normal 
To check the LED status of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
normal
To activate the LED of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -s identify -d hdisk1 
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
identify
To turn off the LED of /dev/hdisk1 again:
# /usr/lpp/diagnostics/bin/usysident -s normal -d hdisk1 
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
normal
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. You can achieve that if you omit the device parameter.
