Topics: AIX, System Admin

Cron jobs running late or not at all

If any of your cron jobs are running late, or not at all, check the cron log (/var/adm/cron/log) to see if there are any errors or other messages around the time the jobs should run.

If you see messages like this:

! c queue max run limit reached Fri Sep 20 13:15:00 2013
! rescheduling a cron job Fri Sep 20 13:15:00 2013
then the jobs are not running because cron is already running too many jobs at the moment the daemon tries to start a new one.
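To see how often and at what times cron hit this limit, the log can be summarised from the shell. The sketch below is a hypothetical helper (`cron_limit_times` is not an AIX command). It assumes the log lines look exactly like the example above, and it takes the default log path from this article.

```shell
#!/bin/sh
# cron_limit_times is a hypothetical helper, not an AIX command.
# It lists each timestamp at which cron logged "max run limit reached",
# with the number of occurrences, most frequent first.
# The log file defaults to /var/adm/cron/log; pass another path to override.
cron_limit_times() {
    log=${1:-/var/adm/cron/log}
    # Keep only the text after the message, i.e. the timestamp.
    sed -n 's/.*max run limit reached \(.*\)$/\1/p' "$log" |
        sort | uniq -c | sort -rn
}
```

Running `cron_limit_times | head` shows the busiest times first, which you can compare with the schedules in your crontabs.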

The cron daemon limits how many jobs it will run simultaneously; by default the limit is 100 jobs. If a new job is due to run and the limit has already been reached, the job is rescheduled for a later time (by default, 60 seconds later). Both the job limit and the wait time are configured in the file /var/adm/cron/queuedefs.
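Before changing anything, it helps to see whether queuedefs already contains an active entry for the cron queue. The following is a minimal sketch using a hypothetical helper name (`show_cron_queue`). It assumes, as the format described later in this article implies, that an active cron queue entry is an uncommented line starting with "c.".

```shell
#!/bin/sh
# show_cron_queue is a hypothetical helper, not an AIX command.
# It prints any active (uncommented) cron queue entry from queuedefs.
# The file defaults to /var/adm/cron/queuedefs; pass another path to override.
show_cron_queue() {
    file=${1:-/var/adm/cron/queuedefs}
    # Active cron-queue entries start with "c." at the beginning of the line.
    if ! grep '^c\.' "$file"; then
        echo "no active c. entry in $file; cron uses its defaults"
    fi
}
```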

If it is unusual for cron to be running so many jobs, check the process table to see which jobs cron has started. These jobs have the cron daemon's process ID as their parent process ID (PPID).
# ps -ef | grep cron | grep -v grep
  root  2097204   1   0   Dec 02  -  0:33 /usr/sbin/cron

# ps -T 2097204
      PID    TTY  TIME CMD
  2097204      -  0:33 cron
 17760598      -  0:00     \--ksh
 18153488      -  0:16         \--find
In the example above, the cron daemon has one child process, a shell, and that shell (possibly running a script) is running the "find" command. This counts as one job: the shell is cron's only direct descendant, and "find" is a child of that shell.
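On a busy system, counting cron's direct children by eye is impractical. The sketch below counts them from `ps -ef` output, relying on the standard column order (UID, PID, PPID, ...) seen in the output above. `count_children` is a hypothetical helper name, not an AIX command.

```shell
#!/bin/sh
# count_children is a hypothetical helper, not an AIX command.
# It reads "ps -ef" output on stdin and prints how many processes have
# the PID given as $1 as their parent (column 3 of "ps -ef" is the PPID).
count_children() {
    awk -v ppid="$1" '$3 == ppid { n++ } END { print n + 0 }'
}

# Example: find the cron daemon's PID, then count its direct children.
#   cronpid=$(ps -ef | awk '$NF == "/usr/sbin/cron" { print $2 }')
#   ps -ef | count_children "$cronpid"
```

For the process tree shown above, this would be expected to report 1, because only the shell is a direct child of cron. Only direct children are counted; grandchildren such as "find" are not.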

If you find many instances of the same job stuck, there may be a problem with the script or command it runs. Run that command or script from a shell prompt to check whether it completes successfully.
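To spot many copies of the same job, group cron's children by command line. This is a sketch only. `cron_children_by_cmd` is a hypothetical helper, and it reads the output of `ps -e -o ppid,args`, a POSIX form of `ps` that AIX also supports. It does not parse `ps -ef`, because the command column there is hard to isolate when STIME spans two fields (for example "Dec 02").

```shell
#!/bin/sh
# cron_children_by_cmd is a hypothetical helper, not an AIX command.
# It reads "ps -e -o ppid,args" output on stdin and, for processes whose
# parent is the PID given as $1, prints each distinct command line with
# a count, most common first. Many copies of one command suggest it is stuck.
cron_children_by_cmd() {
    awk -v ppid="$1" 'NR > 1 && $1 == ppid {
        $1 = ""              # drop the PPID field (runs of spaces collapse)
        sub(/^ +/, "")       # remove the space left in front of the command
        print
    }' | sort | uniq -c | sort -rn
}

# Example, using the cron PID from the output above:
#   ps -e -o ppid,args | cron_children_by_cmd 2097204
```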

If the large number of jobs is a natural result of increased workload on the system, you may need to raise the values in the queuedefs file above their defaults.

To do this, add an entry to the bottom of the queuedefs file using an editor such as vi. The entry should have the form:
c.50j20n60w
Where:
c = The "c" or cron queue
Nj = The maximum number of jobs cron will run simultaneously (default is 100).
Nn = The "nice" value of the jobs to be run (default is 2).
Nw = The time, in seconds, a job waits before cron tries to run it again (default is 60).
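The entry format above can be decoded mechanically, which is handy for checking an entry before adding it. The following is a minimal sketch. `parse_queuedef` is a hypothetical helper, and it fills in the defaults given in this article (100 jobs, nice 2, 60-second wait) for any field that is omitted.

```shell
#!/bin/sh
# parse_queuedef is a hypothetical helper, not an AIX command.
# It decodes a queuedefs entry of the form q.NjNnNw (e.g. c.50j20n60w)
# and uses 100 jobs, nice 2 and a 60-second wait for omitted fields.
parse_queuedef() {
    printf '%s\n' "$1" | awk '{
        # The queue letter and the settings are separated by a dot.
        if (split($0, part, "[.]") != 2) { print "malformed entry: " $0; exit 1 }
        spec = part[2]
        jobs = 100; nice = 2; wait = 60
        if (match(spec, /[0-9]+j/)) jobs = substr(spec, RSTART, RLENGTH - 1)
        if (match(spec, /[0-9]+n/)) nice = substr(spec, RSTART, RLENGTH - 1)
        if (match(spec, /[0-9]+w/)) wait = substr(spec, RSTART, RLENGTH - 1)
        printf "queue=%s jobs=%s nice=%s wait=%ss\n", part[1], jobs, nice, wait
    }'
}
```

For example, `parse_queuedef c.50j20n60w` prints `queue=c jobs=50 nice=20 wait=60s`, while an entry without the dot is reported as malformed.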

For example:
c.200j2n60w
This example would set the cron queue to a maximum of 200 jobs, with a nice value of 2, and a wait time of 60 seconds.
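If you prefer to script the change rather than edit the file in vi, the sketch below keeps a backup, warns if an active "c." entry already exists, and appends the new entry at the bottom of the file as described above. `set_cron_queue` and the `QUEUEDEFS` override variable are hypothetical names. Run it as root on the real file, and try it on a scratch copy first.

```shell
#!/bin/sh
# set_cron_queue is a hypothetical helper, not an AIX command.
# It backs up queuedefs, warns about any existing active "c." entry,
# and appends the new entry at the bottom of the file.
# QUEUEDEFS defaults to /var/adm/cron/queuedefs; set it to test on a copy.
set_cron_queue() {
    entry=$1
    file=${QUEUEDEFS:-/var/adm/cron/queuedefs}
    # Refuse entries that do not start with "c." (e.g. a missing dot).
    case $entry in
        c.*) ;;
        *) echo "expected an entry like c.200j2n60w, got: $entry" >&2; return 1 ;;
    esac
    # Keep a copy of the original so the change is easy to roll back.
    cp -p "$file" "$file.bak" || return 1
    if grep '^c\.' "$file" >/dev/null; then
        echo "warning: $file already contains an active c. entry:" >&2
        grep '^c\.' "$file" >&2
    fi
    printf '%s\n' "$entry" >> "$file"
}
```

Usage: `set_cron_queue c.200j2n60w`.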

It is not necessary to restart cron after modifying the queuedefs file; cron's event loop checks the file automatically.

Source: IBM Technote http://www-01.ibm.com/support/docview.wss?uid=isg3T1020382
