Insert the product media for the same version and level as the current installation into the appropriate drive.
Power on the server.
Boot the server into SMS mode and choose Start Maintenance Mode for System Recovery.
Select Access a Root Volume Group. A message displays explaining that you will not be able to return to the Installation menus without rebooting if you change the root volume group at this point.
Type the number of the appropriate volume group from the list.
Select Access this Volume Group and start a shell.
At the prompt, type the passwd command to reset the root password.
To write everything from the buffer to the hard disk and reboot the system, type the following:
The -I flag (capital "i") of the df command shows the actual space used within file systems, instead of only the percentages you get from the regular df command:
Starting with AIX 5.3, you can use the following command to get the number of seconds since the UNIX epoch (January 1, 1970, 00:00 UTC):
# date +"%s"
On older AIX versions, or other UNIX operating systems, you may want to use the following command to get the same answer:
# perl -MPOSIX -le 'print time'
Getting this UNIX timestamp can be very useful when doing calculations with time stamps. If you need to convert a UNIX timestamp back to something readable:
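One way to do the conversion is sketched below, using the same Perl POSIX module as above. The helper names epoch_to_utc and elapsed_seconds are made up for this example and are not standard commands:

```shell
# Convert a UNIX timestamp (seconds since the epoch) to a readable UTC date.
# epoch_to_utc is a hypothetical helper name.
epoch_to_utc() {
    perl -MPOSIX -le 'print strftime("%Y-%m-%d %H:%M:%S", gmtime($ARGV[0]))' "$1"
}

# Number of seconds between two timestamps (end minus start).
# elapsed_seconds is a hypothetical helper name.
elapsed_seconds() {
    echo $(( $2 - $1 ))
}

epoch_to_utc 86400                      # prints 1970-01-02 00:00:00
elapsed_seconds 1242623400 1242623506   # prints 106
```

Replace gmtime with localtime in the Perl one-liner to get the date in the local time zone instead of UTC.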
A very useful command to compare two files is sdiff. Let's say you want to compare the lslpp output from two different hosts; sdiff -s then shows only the differences between the two files, next to each other:
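A minimal sketch of such a comparison, using two small made-up package lists in place of real lslpp output. The file contents, the temporary directory name and the show_differences wrapper are assumptions for illustration only:

```shell
# Show only the lines that differ between two files, side by side.
# show_differences is a hypothetical wrapper name.
show_differences() {
    # sdiff exits non-zero when the files differ; do not treat that as an error
    sdiff -s "$1" "$2" || true
}

# Two sample files standing in for lslpp output collected from two hosts
workdir=/tmp/sdiff_demo.$$
mkdir -p "$workdir"
printf 'bos.rte 6.1.0.0\nbos.net 6.1.0.0\n' > "$workdir/host1.lslpp"
printf 'bos.rte 6.1.0.0\nbos.net 6.1.0.2\n' > "$workdir/host2.lslpp"

# Only the bos.net line is shown, because bos.rte is identical on both sides
show_differences "$workdir/host1.lslpp" "$workdir/host2.lslpp"

rm -rf "$workdir"
```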
Date/Time: Sun May 17 22:11:46 PDT 2009
Sequence Number: 8539
Machine Id: 00GB214D4C00
Node Id: blahblah
Class: O
Type: INFO
Resource Name: RMCdaemon
Description
The default log file has been changed.
Probable Causes
The current default log file has been renamed and a new log file created.
Failure Causes
The current log file has become too large.
Recommended Actions
No action is required.
Detail Data
DETECTING MODULE
RSCT,rmcd_err.c,1.17,512
ERROR ID
6e0tBL/GsC28/gQH/ne1K//...................
REFERENCE CODE
File name
/var/ct/IW/log/mc/default
This error report entry refers to a file called /var/ct/IW/log/mc/default. When this file reaches 256 KB, it is renamed to default.last and a new default file is created.
The following messages can be found in this file:
2610-217 Received 193 unrecognized messages in the last 10.183333 minutes.
Service is rmc.
This message more or less means:
"2610-217
Received count of unrecognized messages unrecognized messages in the last time minutes. Service is service_name.
Explanation:
The RMC daemon has received the specified number of unrecognized messages within the specified time interval. These messages were received on the UDP port, indicated by the specified service name, used for communication among RMC daemons. The most likely cause of this error is that this port number is being used by another application.
User Response:
Validate that the port number configured for use by the Resource Monitoring and Control daemon is only being used by the RMC daemon."
Check if something else is using the port of the RMC daemon:
# grep RMC /etc/services
rmc 657/tcp # RMC
rmc 657/udp # RMC
# lsof -i :657
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rmcd 1384574 root 3u IPv6 0xf35f20 0t0 UDP *:rmc
rmcd 1384574 root 14u IPv6 0xf2fd39 0t0 TCP *:rmc (LISTEN)
# netstat -Aan | grep 657
f1000600022fd398 tcp 0 0 *.657 *.* LISTEN
f10006000635f200 udp 0 0 *.657 *.*
The socket 0x22fd008 is being held by proccess 1384574 (rmcd).
No, it is actually the RMC daemon that is using this port, so this is fine.
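Rather than hard-coding 657, the port number can be looked up from a services-format file first. A small sketch, in which service_port is a hypothetical helper name and the file defaults to /etc/services:

```shell
# Print the port number registered for a service and protocol in a
# services-format file (default: /etc/services).
# service_port is a hypothetical helper name.
service_port() {
    name=$1
    proto=$2
    file=${3:-/etc/services}
    awk -v n="$name" -v p="$proto" '
        $1 == n {
            split($2, a, "/")            # $2 looks like "657/udp"
            if (a[2] == p) { print a[1]; exit }
        }' "$file"
}
```

The result could then be fed to the commands above, for example lsof -i :$(service_port rmc udp).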
Start an IP trace to find out who's transmitting to this port:
# startsrc -s iptrace -a "-b -p 657 /tmp/iptrace.bin"
To turn on PRM trace, on LPAR do:
# /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=100
Monitor the /var/ct/3410054220/log/mc/default file on the LPAR and make sure you see new 2610-217 entries after starting the trace. You may need to wait up to 10 minutes, since one 2610-217 entry is logged every 10 minutes.
To monitor default file, do:
# tail -f /var/ct/3410054220/log/mc/default
To stop iptrace, on LPAR do:
# stopsrc -s iptrace
To stop PRM trace, on LPAR do:
# /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=0
To format the iptraces, do:
# ipreport -rns /tmp/iptrace.bin > /tmp/ipreport.out
Collect ctsnap data, on LPAR do:
# ctsnap -x runrpttr
When analyzing the data, you may find several node IDs in the packets.
On the HMC side, you can run /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc to find out whether a node ID from the packets (22758085eb959fec in this case) is managed by the HMC. You need root access on the HMC to run this command; IBM can provide a temporary password that lets you use the pesh command as the hscpe user to get root access.
This command lists the managed systems known to the HMC and their node IDs.
Then, on the actual LPARs, run /usr/sbin/rsct/bin/lsnodeid to determine the node ID of each LPAR. Any discrepancy between the node IDs listed by the HMC and the node IDs found on the LPARs is what causes the errpt message about the changed log file to appear.
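Once the node IDs have been collected, the comparison itself is easy to script. The sketch below assumes two hand-made text files with one node ID per line (one built from the HMC listing, one from the lsnodeid output of each LPAR). The file names and the unknown_nodeids helper are hypothetical, and the IDs are only sample values:

```shell
# Print every node ID from the LPAR list that does not appear in the HMC list.
# $1 = file with node IDs reported by the LPARs (one per line)
# $2 = file with node IDs known to the HMC (one per line)
# unknown_nodeids is a hypothetical helper name.
unknown_nodeids() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -F -x -v -f "$2" "$1"
}

# Sample data for illustration
demo=/tmp/nodeid_demo.$$
mkdir -p "$demo"
printf '6bcaadbe9dc8904f\n22758085eb959fec\n' > "$demo/lpar_ids"
printf '6bcaadbe9dc8904f\n' > "$demo/hmc_ids"

# Prints 22758085eb959fec, the ID that the HMC list does not contain
unknown_nodeids "$demo/lpar_ids" "$demo/hmc_ids"

rm -rf "$demo"
```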
To solve this, you have to recreate the RMC daemon databases on both the HMC and the LPARs that have this issue:
On HMC side run:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
Then run /usr/sbin/rsct/install/bin/recfgct on the LPARs:
# /usr/sbin/rsct/install/bin/recfgct
0513-071 The ctcas Subsystem has been added.
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started.
Subsystem PID is 194568.
# /usr/sbin/rsct/bin/lsnodeid
6bcaadbe9dc8904f
Repeat this for every LPAR connected to the HMC.
After that, you can run on the HMC again:
By default, AIX avoids logging duplicate errpt entries. You can see the default settings using smitty errdemon: duplicates are checked within a time interval of 10000 milliseconds (10 seconds), and the default duplicate error maximum is 1000. An additional entry is made as soon as either limit is reached, whichever comes first: the 10-second duplicate time interval or the maximum of 1000 duplicates.
How do you install the Linux Web Based System Manager (websm) client from an HMC version 3.3.6, if your only access to the system is through ssh? The following procedure can be used:
First, copy the Linux websm software from the HMC to the Linux system:
# ssh -l hscroot hmc ls -als /usr/websm/pc_client/*
# cd /tmp
# scp hscroot@labhmc1:/usr/websm/pc_client/*linux* .
You can run invscout to do a microcode discovery on your system, which generates a hostname.mup file. You then upload this hostname.mup file on the IBM website to get a nice overview of the status of all firmware on your system.
So far, so good. What if you have plenty of systems and you want to automate this? Here's a script to do this. The script first uses wget to collect the latest catalog.mic file from the IBM website. Then it distributes this catalog file to all the hosts you want to check. Next, it runs invscout on all these hosts and collects the hostname.mup files. Finally, it concatenates all these files into one large file and does an HTTP POST through curl to upload the file to the IBM website and have a report generated from it.
So, what do you need?
You should have an AIX jump server that allows you to access the other hosts as user root through SSH, so you should have set up your SSH keys for user root.
This jump server must have access to the Internet.
You need to have wget and curl installed. Get them from the Linux Toolbox.
Your servers should be AIX 5 or higher. It doesn't really work with AIX 4.
Optional: a web server, like Apache 2, would be nice, so you can drop the resulting HTML file on your website every day.
An entry in the root crontab to run this script every day.
A list of servers you want to check.
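Before running the script, a quick check of the tools on the jump server may save some time. A minimal sketch, where missing_tools is a hypothetical helper name and the list of commands is taken from the requirements above:

```shell
# Print the names of any required commands that are not found in $PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@" ; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools wget curl ssh scp)
if [ -n "$missing" ] ; then
    echo "Missing commands:" $missing
fi
```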
Here's the script:
#!/bin/ksh
# script: generate_survey.ksh
# purpose: To generate a microcode survey html file
# where is my list of servers located?
SERVERS=/usr/local/etc/servers
# what temporary folder will I use?
TEMP=/tmp/mup
# what is the invscout folder
INV=/var/adm/invscout
# what is the catalog.mic file location for invscout?
MIC=${INV}/microcode/catalog.mic
# if you have a webserver,
# where shall I put a copy of survey.html?
APA=/usr/local/apache2/htdocs
# who's the sender of the email?
FROM=microcode_survey@ibm.com
# who's the receiver of the email?
TO="your.email@address.com"
# what's the title of the email?
SUBJ="Microcode Survey"
# user check
USER=`whoami`
if [ "$USER" != "root" ] ; then
    echo "Only root can run this script."
    exit 1
fi
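# create a temporary directory
# (stop if it cannot be created or entered, so that the
# following commands never run in the wrong directory)
rm -rf "$TEMP" 2>/dev/null
mkdir "$TEMP" || exit 1
cd "$TEMP" || exit 1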
# get the latest catalog.mic file from IBM
# you need to have wget installed
# and accessible in $PATH
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
#curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO
# move the catalog.mic file to this servers invscout directory
mv $TEMP/catalog.mic $MIC
# remove any old mup files
echo Remove any old mup files from hosts.
for server in `cat $SERVERS` ; do
    echo "${server}"
    ssh $server "rm -f $INV/*.mup"
done
# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
    echo "${server}"
    scp -p $MIC $server:$MIC
done
# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
    echo "${server}"
    ssh $server invscout
done
# collect the hostname.mup files
for server in `cat $SERVERS` ; do
    echo "${server}"
    scp -p $server:$INV/*.mup $TEMP
done
# concatenate all hostname.mup files to one file
cat ${TEMP}/*mup > ${TEMP}/muppet.$$
# delete all the hostname.mup files
rm $TEMP/*mup
# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv
# curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on:
# http://curl.haxx.se/docs/httpscripting.html
# more info on uploading survey files can be found on:
# www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html
# Sometimes, the IBM website will respond with an
# "Expectation Failed" error message. Loop the curl command until
# we get valid output.
stop="false"
while [ "$stop" = "false" ] ; do
    curl -H Expect: -F mdsData=@${TEMP}/muppet.$$ -F sendfile="Upload file" \
        http://www14.software.ibm.com/webapp/set2/mds/mds \
        > ${TEMP}/survey.html
    # Test if we see Expectation Failed in the output
    if grep "Expectation Failed" ${TEMP}/survey.html >/dev/null ; then
        # wait a little before trying again
        sleep 10
    else
        stop="true"
    fi
done
# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA
# tip: put in the crontab daily like this:
# 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1
# mail the output
# need to make sure this is sent in html format
# (the empty line before HERE separates the mail headers from the HTML body)
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit

HERE
# clean up the mess
cd /tmp
rm -rf $TEMP
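The upload loop in the script keeps retrying for as long as the IBM website answers with "Expectation Failed". If that is a concern, a bounded retry is one option. A minimal sketch, in which the retry helper and its arguments are hypothetical and not part of the original script:

```shell
# Run a command until it succeeds, at most $1 times, sleeping $2 seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
# retry is a hypothetical helper name.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ] ; do
        "$@" && return 0
        attempt=$(( attempt + 1 ))
        # only sleep if another attempt will follow
        [ "$attempt" -le "$max" ] && sleep "$delay"
    done
    return 1
}
```

In the script, the curl call and the grep test could be wrapped in a function of your own (say, upload_ok) and invoked as retry 10 10 upload_ok, so the script gives up after ten attempts instead of looping forever.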
Sudo is an excellent way to give non-root users root access to specific commands, without giving them too much access to the system.
A very simple command to show you what a specific user is allowed to do:
# su - [username] -c "sudo -l"
User [username] may run the following commands on this host:
(root) NOPASSWD: /usr/local/sbin/reset.ksh
(root) NOPASSWD: /usr/local/bin/mkpasswd
(root) NOPASSWD: !/usr/local/bin/mkpasswd root