Topics: AIX, System Admin

Cron jobs running late or not at all

If any of your cron jobs are running late, or not at all, check the cron log (/var/adm/cron/log) to see if there are any errors or other messages around the time the jobs should run.

If you see messages like this:

! c queue max run limit reached Fri Sep 20 13:15:00 2013
! rescheduling a cron job Fri Sep 20 13:15:00 2013
The reason the jobs are not running is that there are too many simultaneous jobs at the time the daemon tries to run a new job.
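To get a quick sense of how often cron has hit its limit, the matching log lines can be counted. The following is a minimal sketch; the function name count_limit_hits is made up for this example, and the message text is taken from the log excerpt above.

```shell
#!/bin/sh
# count_limit_hits is a hypothetical helper, not part of AIX.
# It counts the "max run limit reached" messages in a cron log file.
count_limit_hits() {
  logfile=$1
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'queue max run limit reached' "$logfile"
}
```

On an AIX system this would typically be called as: count_limit_hits /var/adm/cron/log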

The cron daemon has a limit of how many jobs it will run simultaneously. By default it is 100 jobs. If a new job is scheduled to run and the limit has already been reached the job will be rescheduled at a later time (the default is 60 seconds later). Both the number of jobs and wait time are configured in the file /var/adm/cron/queuedefs.

If it is unusual for cron to be running so many jobs, you can check the process table to view the jobs cron has created. These jobs will have the process ID of the cron daemon as their parent process ID (PPID).
# ps -ef | grep cron | grep -v grep
  root  2097204   1   0   Dec 02  -  0:33 /usr/sbin/cron

# ps -T 2097204
      PID    TTY  TIME CMD
  2097204      -  0:33 cron
 17760598      -  0:00     \--ksh
 18153488      -  0:16         \--find
In the example above the cron daemon has 1 child job, which is a shell, and that shell (possibly running a script) is running the "find" command. This counts as 1 direct descendant job of cron.
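Counting the direct children of cron by hand gets tedious when there are many of them. The sketch below assumes "ps -ef" style input, where the third column is the PPID; count_children is a hypothetical helper name, not an AIX command.

```shell
#!/bin/sh
# count_children is a hypothetical helper.
# It reads "ps -ef" style output on stdin and counts the processes
# whose parent process ID (third column) equals the given PID.
count_children() {
  awk -v ppid="$1" '$3 == ppid { n++ } END { print n + 0 }'
}
```

With the cron PID from the example above, this could be used as: ps -ef | count_children 2097204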

If you find many of the same job stuck there may be a problem with the script or command being run. The command or script should be checked from a shell prompt to see if it completes successfully.

If the large number of jobs are naturally occurring as a result of increased workload on the system, you may need to change the values in the queuedefs file and increase them from their defaults.

To do this, add an entry to the bottom of the queuedefs file using an editor such as vi. The entry should have the form:
c.50j20n60w
Where:
c = The "c" or cron queue
Nj = The maximum number of jobs to be run simultaneously by cron.
Nn = The "nice" value of the jobs to be run (default is 2).
Nw = The time a job has to wait until the next attempt to run it.

For example:
c.200j2n60w
This example would set the cron queue to a maximum of 200 jobs, with a nice value of 2, and a wait time of 60 seconds.
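To double-check an entry before adding it to the queuedefs file, it can help to split it into its fields. The following is a rough sketch only: parse_queuedef is a hypothetical helper, and the fallback values (100 jobs, nice 2, 60 seconds) are the cron queue defaults mentioned in this article.

```shell
#!/bin/sh
# parse_queuedef is a hypothetical helper, not an AIX command.
# It splits an entry such as "c.200j2n60w" into queue, jobs, nice and wait.
parse_queuedef() {
  printf '%s\n' "$1" | awk '{
    dot   = index($0, ".")
    queue = substr($0, 1, dot - 1)
    rest  = substr($0, dot + 1)
    jobs = 100; nice = 2; wait = 60   # defaults as described in the text
    # Consume "<number><letter>" pairs from left to right.
    while (match(rest, /^[0-9]+[jnw]/)) {
      num = substr(rest, 1, RLENGTH - 1)
      key = substr(rest, RLENGTH, 1)
      if (key == "j") jobs = num
      else if (key == "n") nice = num
      else wait = num
      rest = substr(rest, RLENGTH + 1)
    }
    printf "queue=%s jobs=%s nice=%s wait=%s\n", queue, jobs, nice, wait
  }'
}
```

For example, parse_queuedef c.200j2n60w reports 200 jobs, a nice value of 2 and a wait time of 60 seconds for queue c.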

It is not necessary to restart cron after modifying the queuedefs file; it will be automatically checked by cron's event loop.

Source: IBM Technote http://www-01.ibm.com/support/docview.wss?uid=isg3T1020382

Topics: Fun

Hard Reset tool


A very useful hard reset tool!

Topics: AIX, Monitoring, System Admin

NMON recordings

One can set up NMON recordings from smit via:

# smitty topas -> Start New Recording -> Start local recording -> nmon
However, the smit panel doesn't list the option needed to get disk IO service times: the -d option, which collects disk IO service and wait times. Thus, it's better to run the nmon command from the command line to collect and report these statistics. Here's one set of options for collecting the data:
# nmon -AdfKLMNOPVY^ -w 4 -s 300 -c 288 -m /var/adm/nmon
The key options here include:
  • -d Collect and report IO service time and wait time statistics.
  • -f Specifies that the output is in spreadsheet format. By default, the command takes 288 snapshots of system data with an interval of 300 seconds between each snapshot. The name of the output file is in the format of hostname_YYMMDD_HHMM.nmon.
  • -O Includes the Shared Ethernet adapter (SEA) VIOS sections in the recording file.
  • -V Includes the disk volume group section.
  • -^ Includes the FC adapter section (which also measures NPIV traffic on VIOS FC adapters).
  • -s Specifies the interval in seconds between 2 consecutive recording snapshots.
  • -c Specifies the number of snapshots that must be taken by the command.
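The -s and -c values together determine how long a recording lasts: the interval multiplied by the number of snapshots. As a small worked example (recording_hours is a made-up helper name):

```shell
#!/bin/sh
# recording_hours is a hypothetical helper.
# It prints the recording length in whole hours for a given
# interval in seconds (-s) and number of snapshots (-c).
recording_hours() {
  interval=$1
  count=$2
  echo $(( interval * count / 3600 ))
}
```

Calling recording_hours 300 288 gives 24, since 300 x 288 = 86400 seconds.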
With these settings (288 snapshots, 300 seconds apart), nmon records for a full day. It is therefore useful to start nmon daily using a crontab entry in the root crontab file. For example, using the following script:
# cat /usr/local/collect_nmon.ksh
#!/bin/ksh

LOGDIR="/var/adm/nmon"
PARAMS="-fTNAdKLMOPVY^ -w 4 -s 300 -c 288 -m $LOGDIR"

# LOGRET determines the number of days to retain nmon logs.
LOGRET=365

# Create the nmon folder.
if [ ! -d "$LOGDIR" ] ; then
        mkdir -p "$LOGDIR"
fi

# Compress previous daily log.
# The name patterns are quoted so the shell does not expand them first.
find $LOGDIR -name '*.nmon' -type f -mtime +1 -exec gzip '{}' \;

# Clean up old logs.
find $LOGDIR -name '*nmon.gz' -type f -mtime +$LOGRET -exec rm '{}' \;

# Start nmon.
/usr/bin/nmon $PARAMS
Then add the following crontab entry to the root crontab file:
0 0 * * * /usr/local/collect_nmon.ksh >/tmp/collect_nmon.ksh.log 2>&1
To process the recordings with the NMON Analyser tool (a spreadsheet tool that runs on PCs and generates performance graphs and other output), it's recommended to keep the number of intervals below 300.

Topics: AIX, Security, System Admin

Avoid using env_reset in sudoers file

By default, when using sudo, the env_reset sudo option is enabled.

From the sudoers manual, about the env_reset sudo option:

This causes commands to be executed with a new, minimal environment. On AIX the environment is initialized with the contents of the /etc/environment file. The new environment contains the TERM, PATH, HOME, MAIL, SHELL, LOGNAME, USER, USERNAME and SUDO_* variables in addition to variables from the invoking process permitted by the env_check and env_keep options. This is effectively a whitelist for environment variables.

If, however, the env_reset option is disabled, any variables not explicitly denied by the env_check and env_delete options are inherited from the invoking process. In this case, env_check and env_delete behave like a blacklist. Since it is not possible to blacklist all potentially dangerous environment variables, use of the default env_reset behavior is encouraged.

In all cases, environment variables with a value beginning with () are removed as they could be interpreted as bash functions. The list of environment variables that sudo allows or denies is contained in the output of "sudo -V" when run as root.


So, what does this all mean? Well, it means that you should not use env_reset in the /etc/sudoers file.

First of all, if you would use:

Defaults env_reset
Then that would do you no good, because the default is already to reset the environment variables.

If you would use (notice the exclamation mark before env_reset):
Defaults !env_reset
Then it means you don't reset any environment variables from the invoking process, for ALL users. That is a security risk, as sudo will preserve variables such as PATH or LD_LIBRARY_PATH (or LIBPATH on AIX), and these variables can be set to values such as "." or "/home/username", or they can be exploited by malicious software.

With the default env_reset all sudo sessions will invoke a shell with minimum shell variables, including those set in /etc/profile and some others if specified in sudoers file (using the env_keep option). So this will make a more controlled sudo access without bypassing sudo security restrictions.

Okay, so what if you need to run a command through sudo that requires a certain environment variable? A good example is the tcpdump command. When running tcpdump via sudo, you may encounter the following error message:
$ sudo tcpdump -i en12
tcpdump: bpf_load: genmajor failed: A file or directory in the path name does not exist.
In this case, tcpdump is known to require the ODMDIR environment variable to be set. One way is to use "Defaults !env_reset" in /etc/sudoers, but the sudoers manual above explains that this is discouraged. Another method is to allow only specific users in /etc/sudoers, by disabling env_reset, such as:
User_Alias           UTCPDUMP = tim, john
Defaults:UTCPDUMP    !env_reset
But this still allows specific users to "play" with all environment variables. So unless you trust these users very much, an even better way is to use the env_keep sudo option, to specify the environment variables that need not be reset (that is, if you know the correct environment variables that are required). In the case of the tcpdump command, we will want to retain the ODMDIR environment variable:
Defaults env_keep += ODMDIR
With the above line in /etc/sudoers, you will notice that running the tcpdump command via sudo will now work properly.

So, the bottom line is: Don't use env_reset at all in /etc/sudoers. If really necessary, disable it (with !env_reset) for only specific users, or even better, specify the required environment variables using env_keep.
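A quick way to spot such entries yourself is to search the sudoers file for Defaults lines that disable env_reset. This is only a simple sketch: find_env_reset_overrides is a hypothetical helper name, and it does not follow include directives or handle every sudoers syntax variation.

```shell
#!/bin/sh
# find_env_reset_overrides is a hypothetical helper.
# It prints, with line numbers, every Defaults line in the given
# sudoers file that contains "!env_reset".
find_env_reset_overrides() {
  grep -n '^[[:space:]]*Defaults.*!env_reset' "$1"
}
```

For example: find_env_reset_overrides /etc/sudoers (run as root, since the file is normally not readable by other users).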

Of course, the UNIX Health Check software will check if env_reset is used in /etc/sudoers, and if so, warn about this potential security risk.

Topics: AIX, Security, System Admin

Difference between sticky bit and SUID/GUID

This is probably one of the things that people mess up all the time. They both have to do with permissions on a file, but the SUID/GUID (or SETUID short for set-user-id/SETGID short for set-group-id) bit and the sticky bit are 2 completely different things.

The SUID/GUID

The letters rwxXst select file mode bits for users:

  • read (r)
  • write (w)
  • execute (or search for directories) (x)
  • execute/search only if the file is a directory or already has execute permission for some user (X)
  • set user or group ID on execution (s)
  • restricted deletion flag or sticky bit (t)
The position that the x bit takes in rwxrwxrwx for the user triplet (1st group of rwx) and the group triplet (2nd group of rwx) can take an additional state where the x becomes an s. When such a file is executed (if it's a program and not just a shell script), it will run with the permissions of the owner or the group of the file. That is called the SUID, when set for the user triplet, and GUID, when set for the group triplet.

So if the file is owned by root and the SUID bit is turned on, the program will run as root, even if you execute it as a regular user. The same thing applies to the GUID bit. You can set or clear the bits with symbolic modes like u+s and g-s, and you can set (but not clear) the bits with a numeric mode.

SUID/GUID examples

No SUID/GUID: Just the bits rwxr-xr-x are set:
# ls -lt test.pl
-rwxr-xr-x 1 root root 179 Jan  9 01:01 test.pl
SUID and user's executable bit enabled (lowercase s): The bits rwsr-xr-x are set.
# chmod u+s test.pl
# ls -lt test.pl
-rwsr-xr-x 1 root root 179 Jan  9 01:01 test.pl
SUID enabled and executable bit disabled (uppercase S): The bits rwSr-xr-x are set.
# chmod u-x test.pl
# ls -lt test.pl 
-rwSr-xr-x 1 root root 179 Jan  9 01:01 test.pl
GUID and group's executable bit enabled (lowercase s): The bits rwxr-sr-x are set.
# chmod g+s test.pl
# ls -lt test.pl 
-rwxr-sr-x 1 root root 179 Jan  9 01:01 test.pl
GUID enabled and executable bit disabled (uppercase S): The bits rwxr-Sr-x are set.
# chmod g-x test.pl
# ls -lt test.pl 
-rwxr-Sr-x 1 root root 179 Jan  9 01:01 test.pl
The sticky bit

The sticky bit on the other hand is denoted as a t, such as with the /tmp or /var/tmp directories:
# ls -ald /tmp
drwxrwxrwt 36 bin bin 8192 Nov 27 08:40 /tmp
# ls -ald /var/tmp
drwxrwxrwt  3 bin bin  256 Nov 27 08:28 /var/tmp
This bit should always have been called the "restricted deletion bit", given that's what it really denotes. When this mode bit is enabled on a directory, users can only delete files and directories within it that they own. For regular files the bit was used to keep the program on the swap device so that it would load more quickly when run; this is where the name "sticky bit" comes from, but it's not used anymore in AIX.
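To tell the bits apart at a glance, the mode string printed by ls can be inspected position by position. The following is a minimal sketch; special_bits is a hypothetical helper name, and it simply looks for s/S in the user and group execute positions and t/T in the last position.

```shell
#!/bin/sh
# special_bits is a hypothetical helper.
# It prints which of suid, sgid and sticky are set on a file or directory,
# based on the 10-character mode string from "ls -ld".
special_bits() {
  mode=$(ls -ld "$1" | cut -c1-10)
  result=""
  case $mode in ???[sS]*)       result="$result suid"   ;; esac
  case $mode in ??????[sS]*)    result="$result sgid"   ;; esac
  case $mode in ?????????[tT]*) result="$result sticky" ;; esac
  # Strip the leading space; prints an empty line if no special bit is set.
  echo "${result# }"
}
```

For example, special_bits /tmp would be expected to print "sticky" on a system where /tmp has the permissions shown above.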

More information can be found in the manual page of the chmod command or on http://en.wikipedia.org/wiki/Sticky_bit.

Topics: AIX, Security, System Admin

Generating random passwords

When you set up a new user account, and assign a password to that account, you'll want to make sure that it is a password that cannot be easily guessed. Setting the initial password to something easy like "changeme" only gives hackers easy access to your system.

So the best way you can do this, is by generating a fully random password. That can easily be achieved by using the /dev/urandom device.

Here's an easy command to generate a random password:

# dd if=/dev/urandom bs=16 count=1 2>/dev/null | openssl base64 | sed "s/[=O/\]//g" | cut -b1-8
This will create passwords like:
ej9yTaaD
Ux9FYusx
QR0TSAZC
...
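If openssl is not available, a similar result can be obtained with tr alone. This is a variation on the command above, not the same command: it keeps only letters and digits, and gen_password is a hypothetical function name.

```shell
#!/bin/sh
# gen_password is a hypothetical helper.
# It reads 256 random bytes, keeps only letters and digits,
# and prints the first N characters (8 by default).
gen_password() {
  len=${1:-8}
  dd if=/dev/urandom bs=256 count=1 2>/dev/null |
    LC_ALL=C tr -dc 'A-Za-z0-9' |
    cut -c1-"$len"
}
```

Calling gen_password 12 prints a 12-character password. The 256-byte read leaves ample margin for short lengths like this; for much longer passwords the bs value would need to be increased.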

Topics: AIX, Security, System Admin

Fix user accounts

Security guidelines nowadays can be annoying. Within many companies people have to comply with strict security in regards to password expiration settings, password complexity and system security settings. All these settings and regulations more often than not result in people getting locked out of their accounts on AIX systems, and also getting frustrated at the same time.

To help your users, you can't go change default security settings on the AIX systems. Your auditor will make sure you won't do that. But instead, there are some "tricks" you can use, to ensure that a user account is (and stays) available to your end user. We've put all those tricks together in one simple script, that can fix a user account, and we called it fixuser.ksh. It will fix 99% of all user related login issues.

You can run this script as often as you like and for any user that you like. It will help you to ensure that a user account is not locked, that AIX won't bug the user to change their password, that the user doesn't have a failed login count (from typing a wrong password too many times), and a bunch of other stuff that usually will keep your users from logging in and getting pesky "Access Denied" messages.

The script will not alter any default security settings, and it can easily be adjusted to run for several user accounts, or can be run from a crontab so user accounts stay enabled for your users. The script is a win-win situation for everyone: Your auditor is happy, because security settings are strict on your system; Your users are happy for being able to just login without any hassle; And the sys admin will be happy for not having to resolve login issues manually anymore.

The script can be run by entering a specific user account:

# fixuser.ksh username
The script:
#!/usr/bin/ksh

fixit()
{

  user=${1}

  # Unlock account
  printf "Unlocking account for ${user}..."
  chuser account_locked=false ${user}
  echo " Done."

  # Reset failed login count
  printf "Reset failed login count for ${user}..."
  chuser unsuccessful_login_count=0 ${user}
  echo " Done."

  # Reset expiration date
  printf "Reset expiration date for ${user}..."
  chuser expires=0 ${user}
  echo " Done."

  # Allow the user to login
  printf "Enable login for ${user}..."
  chuser login=true ${user}
  echo " Done."

  # Allow the user to login remotely
  printf "Enable remote login for ${user}..."
  chuser rlogin=true ${user}
  echo " Done."

  # Reset maxage
  printf "Reset the maxage for ${user}..."
  m=`lssec -f /etc/security/user -s default -a maxage | cut -f2 -d=`
  chuser maxage=${m} ${user}
  echo " Done."

  # Clear password change requirement
  printf "Clear password change requirement for ${user}..."
  pwdadm -c ${user}
  echo " Done."

  # Reset password last update
  printf "Reset the password last update for ${user}..."
  let sinceepoch=`perl -e 'printf(time)' | awk '{print $1}'`
  n=`lssec -f /etc/security/user -s default -a minage | cut -f2 -d=`
  let myminsecs="${n}*7*24*60*60"
  let myminsecs="${myminsecs}+1000"
  let newdate="${sinceepoch}-${myminsecs}"
  chsec -f /etc/security/passwd -s ${user} -a lastupdate=${newdate}
  echo " Done."
}

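unset user

# A username is required; without it, "id" would report on the current user.
if [ -z "${1}" ] ; then
  echo "Usage: ${0} username"
  exit 1
fi
user=${1}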

# If a username is provided, fix that user account

unset myid
myid=`id ${user} 2>/dev/null`
if [ ! -z "${myid}" ] ; then
  echo "Fixing account ${user}..."
  fixit ${user}
  # Note: this clears the password history for ALL users on the system.
  printf "Remove password history..."
  cp /dev/null /etc/security/pwdhist.pag 2>/dev/null
  cp /dev/null /etc/security/pwdhist.dir 2>/dev/null
  echo " Done."
else
  echo "User ${user} does not exist."
fi
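The least obvious part of the script is the lastupdate calculation: it backdates the password's last update to just over minage weeks ago, so the minimum-age rule no longer blocks a password change. The arithmetic can be sketched on its own as follows (backdated_lastupdate is a made-up name; the 1000-second margin is the one used in the script):

```shell
#!/bin/sh
# backdated_lastupdate is a hypothetical helper that mirrors the
# arithmetic in fixuser.ksh: current time minus (minage in weeks,
# converted to seconds, plus a 1000-second margin).
backdated_lastupdate() {
  now=$1
  minage_weeks=$2
  echo $(( now - (minage_weeks * 7 * 24 * 60 * 60 + 1000) ))
}
```

For example, with a current time of 1700000000 and a minage of 1 week, the result is 1699394200, which is 604800 + 1000 seconds earlier.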

Topics: AIX, Security, System Admin

Clearing password history

Sometimes when password rules are very strict, a user may have problems creating a new password that is both easy to remember, and still adheres to the password rules. To aid the user, it could be useful to clear the password history for his or her account, so he or she can re-use a certain password that has been used in the past. The password history is stored in /etc/security/pwdhist.pag and /etc/security/pwdhist.dir. The command you can use to disable the password history for a user is:

# chuser histsize=0 username
Actually, this command does not clear the password history in /etc/security/pwdhist.dir and /etc/security/pwdhist.pag; it only sets histsize for the account to zero, meaning that the user's new password is no longer checked against previously used passwords. After the user has changed his or her password, you may want to set it back again to the default value:
# grep -p ^default /etc/security/user | grep histsize
        histsize = 20
# chuser histsize=20 username
In older AIX levels, this functionality (to use chuser histsize=0) would actually have cleared out the password history of the user. In later AIX levels, this functionality has vanished.

So, if you truly wish to delete the password history for a user, here's another way to clear the password history on a system: It is accomplished by zeroing out the pwdhist.pag and pwdhist.dir files. However, this results in the deletion of all password history for all users on the system:
# cp /dev/null /etc/security/pwdhist.pag
# cp /dev/null /etc/security/pwdhist.dir
Please note that this is a temporary measure. Once these files are zeroed out, as soon as a user changes his or her password again, the old password is stored again in these files and it can't be reused (unless the histsize attribute for a user is set to 0).

Topics: AIX, Monitoring, System Admin

Boxes and lines in NMON

Usually, with the default settings used with NMON, along with using PuTTY on a Windows system, you may notice that the boxes and lines in NMON are not displayed correctly. It may look something like this:



An easy fix for this issue is to change the character set translation within PuTTY. In the upper left corner of your PuTTY window, click the icon and select "Change Settings". Then navigate to Window -> Translation. In the "Remote character set" field, change "UTF-8" to "ISO-8859-1".



Once changed, restart PuTTY and it should look something like this:



Another option is to stop using boxes and lines altogether. You can do this by starting nmon with the -B option:

# nmon -B
Or you can set the NMON environment variable to the same:
# export NMON=B
# nmon

Topics: Red Hat / Linux, System Admin

Adding swap space to RHEL

Here's a procedure for adding swap space to a running RHEL system.

This procedure assumes you will want to add 8 Gigabytes of swap space, and we will be using LVM to do so. To get information from Red Hat on recommended swap space sizes, take a look here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/ch-swapspace.html.

First start by checking what the current swap space size is, by using the free command:

# free -m -t
             total       used       free     shared    buffers     cached
Mem:        129013     124325       4688          9        173      97460
-/+ buffers/cache:      26691     102322
Swap:        16383       8057       8326
Total:      145397     132382      13015
This particular system has 16 GB of swap space (look in the "total" column next to "Swap:"). Using the -m option with the free command displays the memory values in megabytes. Using the -t option will provide the totals.

You can also see that the system has used 8057 MB of its swap space, almost half of the swap space available.

Then, figure out how the current swap spaces are configured now:
# cat /proc/swaps
Filename                         Type            Size    Used    Priority
/dev/dm-1                        partition       8388604 8262740 -1
/dev/dm-8                        partition       8388604 0       -2
This shows that there are 2 paging spaces of 8 GB each. To increase the swap space on the system, we'll add another swap space of 8 GB, so the total swap space will go up to 24 GB.
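The sizes in /proc/swaps are in kilobytes, so they can be added up and compared with the total that free reports. A small sketch (swap_total_mb is a hypothetical helper name):

```shell
#!/bin/sh
# swap_total_mb is a hypothetical helper.
# It reads /proc/swaps style input on stdin, skips the header line,
# sums the Size column (in KB) and prints the total in whole MB.
swap_total_mb() {
  awk 'NR > 1 { kb += $3 } END { print int(kb / 1024) }'
}
```

Run as swap_total_mb < /proc/swaps. For the two 8388604 KB swap spaces shown above this prints 16383, matching the "Swap:" total in the free output.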

To get a view of what logical volumes exist on the system, use the dmsetup command:
# dmsetup ls
rootvg00-optlv00        (253:7)
rootvg00-tmplv00        (253:3)
rootvg00-varlv00        (253:2)
rootvg00-homelv00       (253:6)
rootvg00-rootlv00       (253:0)
rootvg00-usrlocallv00   (253:5)
rootvg00-swaplv01       (253:8)
rootvg00-usrlv00        (253:4)
rootvg00-swaplv00       (253:1)
This shows that there are 2 swap logical volumes, swaplv00 and swaplv01. We'll create swaplv02 as the third swap space on the system.

Another good way to see the same information, is by using the lvs command:
# lvs 2>/dev/null
  LV           VG       Attr       LSize
  homelv00     rootvg00 -wi-ao---- 10.00g
  optlv00      rootvg00 -wi-ao----  8.00g
  rootlv00     rootvg00 -wi-ao----  2.00g
  swaplv00     rootvg00 -wi-ao----  8.00g
  swaplv01     rootvg00 -wi-ao----  8.00g
  tmplv00      rootvg00 -wi-ao----  5.00g
  usrlocallv00 rootvg00 -wi-ao----  1.00g
  usrlv00      rootvg00 -wi-ao----  5.00g
  varlv00      rootvg00 -wi-ao----  4.00g
This gives you the information that the logical volumes have been created in the rootvg00 volume group. We'll create the new swap space in the same volume group, using the lvcreate command:
# lvcreate -n swaplv02 -L 8G rootvg00
  Logical volume "swaplv02" created
Using the -n option of the lvcreate command, you can specify the name of the logical volume. The -L option specifies the size (in this case 8G), and you end the command with the volume group name.

Next, you'll have to tell RHEL that the new logical volume is to be formatted for swap space usage:
# mkswap /dev/rootvg00/swaplv02
Setting up swapspace version 1, size = 8388604 KiB
no label, UUID=c9be43f7-c473-45ae-ba13-c1e09af2d95e
Then, you'll have to add an entry to /etc/fstab, so the system knows to re-use the swap space after a system reboot:
# grep swap /etc/fstab
/dev/mapper/rootvg00-swaplv00 swap     swap    defaults        0 0
/dev/mapper/rootvg00-swaplv01 swap     swap    defaults        0 0
/dev/mapper/rootvg00-swaplv02 swap     swap    defaults        0 0
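The grep output above shows the end result, with the third line added for the new logical volume. One possible way to add such a line without creating duplicates is sketched below; add_swap_entry is a hypothetical helper, and it takes the fstab path as an argument so it can be tried on a copy of the file first.

```shell
#!/bin/sh
# add_swap_entry is a hypothetical helper.
# It appends a swap entry for the given device to the given fstab file,
# unless a line for that device is already present.
add_swap_entry() {
  fstab=$1
  device=$2
  if grep -q "^${device}[[:space:]]" "$fstab"; then
    echo "entry for $device already present"
  else
    printf '%s swap     swap    defaults        0 0\n' "$device" >> "$fstab"
    echo "entry for $device added"
  fi
}
```

For example: add_swap_entry /etc/fstab /dev/mapper/rootvg00-swaplv02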
Finally, activate the new swap space using the swapon command:
# swapon -v /dev/rootvg00/swaplv02
swapon on /dev/rootvg00/swaplv02
swapon: /dev/mapper/rootvg00-swaplv02: found swap signature: version 1, page-size 4, same byte order
swapon: /dev/mapper/rootvg00-swaplv02: pagesize=4096, swapsize=8589934592, devsize=8589934592
To validate that the new swap space is available on the system, use the free command again, and you may also review /proc/swaps:
# free -m -t
             total       used       free     shared    buffers     cached
Mem:        129013     121344       7669          9        175      95575
-/+ buffers/cache:      25593     103420
Swap:        24575       8109      16466
Total:      153589     129453      24136
# cat /proc/swaps
Filename                         Type            Size    Used    Priority
/dev/dm-1                        partition       8388604 8303856 -1
/dev/dm-8                        partition       8388604 0       -2
/dev/dm-9                        partition       8388604 0       -3
That's it; you're done!
