Date/Time: Sun May 17 22:11:46 PDT 2009
Sequence Number: 8539
Machine Id: 00GB214D4C00
Node Id: blahblah
Class: O
Type: INFO
Resource Name: RMCdaemon
Description
The default log file has been changed.
Probable Causes
The current default log file has been renamed and a new log file created.
Failure Causes
The current log file has become too large.
Recommended Actions
No action is required.
Detail Data
DETECTING MODULE
RSCT,rmcd_err.c,1.17,512
ERROR ID
6e0tBL/GsC28/gQH/ne1K//...................
REFERENCE CODE
File name
/var/ct/IW/log/mc/default
This error report entry refers to the log file /var/ct/IW/log/mc/default. When that file reaches 256 KB, it is renamed to default.last and a new default file is created.
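To see how close the current log is to its next rotation, a small helper like the one below could be used. It is only a sketch: the function name is made up for illustration, and it assumes that "256 KB" means 256 * 1024 bytes.

```shell
# Sketch: report how full an RMC log file is relative to the rotation limit.
# Assumption: the 256 KB limit mentioned above equals 256 * 1024 bytes.
ROTATE_LIMIT_BYTES=$((256 * 1024))

# log_fill_percent is a hypothetical helper name, not an RSCT command.
log_fill_percent() {
    # $1: path to the log file
    size=$(wc -c < "$1") || return 1
    echo $(( size * 100 / ROTATE_LIMIT_BYTES ))
}
```

For example, `log_fill_percent /var/ct/IW/log/mc/default` prints the fill level as a whole percentage.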
The following messages can be found in this file:
2610-217 Received 193 unrecognized messages in the last 10.183333 minutes.
Service is rmc.
This message more or less means:
"2610-217
Received <count of unrecognized messages> unrecognized messages in the last <time> minutes. Service is <service_name>.
Explanation:
The RMC daemon has received the specified number of unrecognized messages within the specified time interval. These messages were received on the UDP port, indicated by the specified service name, used for communication among RMC daemons. The most likely cause of this error is that this port number is being used by another application.
User Response:
Validate that the port number configured for use by the Resource Monitoring and Control daemon is only being used by the RMC daemon."
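To get a feel for how many of these messages arrive, the count and the interval can be pulled out of the log lines. The following is a minimal sketch: the function name is hypothetical, and it only assumes that the words "Received" and "minutes." appear in the line as shown in the sample message above.

```shell
# Sketch: read log lines on stdin and, for each 2610-217 message, print
# "count minutes rate_per_minute".
# parse_2610_217 is a hypothetical helper name, not part of RSCT.
parse_2610_217() {
    awk '/2610-217/ {
        count = ""; minutes = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Received") count = $(i + 1)   # number after "Received"
            if ($i == "minutes.") minutes = $(i - 1) # number before "minutes."
        }
        if (count != "" && minutes > 0)
            printf "%d %s %.1f\n", count, minutes, count / minutes
    }'
}
```

It could be fed from the log file, for example `grep 2610-217 /var/ct/IW/log/mc/default | parse_2610_217`.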
Check if something else is using the port of the RMC daemon:
# grep RMC /etc/services
rmc 657/tcp # RMC
rmc 657/udp # RMC
# lsof -i :657
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rmcd 1384574 root 3u IPv6 0xf35f20 0t0 UDP *:rmc
rmcd 1384574 root 14u IPv6 0xf2fd39 0t0 TCP *:rmc (LISTEN)
# netstat -Aan | grep 657
f1000600022fd398 tcp 0 0 *.657 *.* LISTEN
f10006000635f200 udp 0 0 *.657 *.*
# rmsock f1000600022fd398 tcpcb
The socket 0x22fd008 is being held by proccess 1384574 (rmcd).
No: the RMC daemon itself is the only process using this port, so this is fine.
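The same check can be scripted by filtering the lsof output for anything that is not rmcd. This is a sketch only: the function name is hypothetical, and it assumes the lsof column layout shown above (header line first, command name in column 1, PID in column 2).

```shell
# Sketch: read `lsof -i :657` output on stdin and print "command pid" for
# every process other than rmcd. Returns 0 if rmcd is the only owner,
# 1 if something else holds the port.
# foreign_port_users is a hypothetical helper name.
foreign_port_users() {
    awk 'NR > 1 && $1 != "rmcd" { print $1, $2; found = 1 }
         END { exit (found ? 1 : 0) }'
}
```

Usage would be `lsof -i :657 | foreign_port_users`; no output means that only the RMC daemon is bound to the port.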
Start an IP trace to find out who's transmitting to this port:
# startsrc -s iptrace -a "-b -p 657 /tmp/iptrace.bin"
To turn on the PRM trace, run on the LPAR:
# /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=100
Monitor the /var/ct/3410054220/log/mc/default file on the LPAR and make sure that NEW 2610-217 entries are logged after the traces have been started. You may need to wait up to 10 minutes, since one 2610-217 entry is logged every 10 minutes.
To monitor the default file, run:
# tail -f /var/ct/3410054220/log/mc/default
To stop iptrace, run on the LPAR:
# stopsrc -s iptrace
To stop the PRM trace, run on the LPAR:
# /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=0
To format the iptrace output, run:
# ipreport -rns /tmp/iptrace.bin > /tmp/ipreport.out
Collect ctsnap data on the LPAR:
# ctsnap -x runrpttr
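The trace steps above can be strung together in one script. The sketch below is an illustration, not a tested procedure: run() and collect_rmc_trace() are hypothetical helper names, the wait time of 660 seconds is an assumption (just over the 10-minute logging interval), and one trace file name (/tmp/iptrace.bin, as in the startsrc step) is used for both capturing and formatting. By default it only prints the commands.

```shell
# Sketch: the trace-collection sequence, in order.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 on the
# LPAR to really execute them. WAIT_SECONDS is an assumed value.
DRY_RUN=${DRY_RUN:-1}
WAIT_SECONDS=${WAIT_SECONDS:-660}
TRACE_FILE=/tmp/iptrace.bin

# run: hypothetical helper that prints or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_rmc_trace() {
    run startsrc -s iptrace -a "-b -p 657 $TRACE_FILE"
    run /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=100
    # Wait long enough for at least one new 2610-217 entry to be logged.
    run sleep "$WAIT_SECONDS"
    run stopsrc -s iptrace
    run /usr/sbin/rsct/bin/rmctrace -s ctrmc -a PRM=0
    # ipreport writes to stdout, so the redirect is handled outside run().
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ ipreport -rns $TRACE_FILE > /tmp/ipreport.out"
    else
        ipreport -rns "$TRACE_FILE" > /tmp/ipreport.out
    fi
    run ctsnap -x runrpttr
}
```

Calling `collect_rmc_trace` with the default settings prints the seven steps so they can be reviewed before running them for real.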
When analyzing the data, you may find several node IDs in the packets.
On the HMC, you can run /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc to find out whether a node ID seen in the packets (22758085eb959fec in this example) is managed by the HMC. You need root access on the HMC to run this command. To get it, request a temporary password from IBM and use it with the pesh command as the hscpe user.
This command lists the managed systems known to the HMC and their node IDs.
Then, on the LPARs themselves, run /usr/sbin/rsct/bin/lsnodeid to determine the node ID of each LPAR. If the node IDs listed on the HMC differ from the node IDs reported on the LPARs, that discrepancy is what causes the unrecognized messages, and with them the errpt entries about the changed log file.
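Comparing the node IDs by eye is error-prone, so a small check like the following may help. It is a sketch under assumptions: check_nodeid is a hypothetical helper name, and the rmcdomainstatus output is expected to have been saved to a file on the HMC first. The match is a case-insensitive substring search, so no particular column layout of that output is assumed.

```shell
# Sketch: check whether an LPAR node ID appears in saved rmcdomainstatus output.
# $1: node ID as printed by lsnodeid on the LPAR
# $2: file holding the rmcdomainstatus output captured on the HMC
# check_nodeid is a hypothetical helper name.
check_nodeid() {
    if grep -qiF -- "$1" "$2"; then
        echo "known to HMC: $1"
    else
        echo "NOT known to HMC: $1"
        return 1
    fi
}
```

For example, `check_nodeid "$(/usr/sbin/rsct/bin/lsnodeid)" /tmp/hmc_domainstatus.out`, where the file name is only a placeholder for wherever the HMC output was copied to.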
To solve this, recreate the RMC daemon databases on both the HMC and the LPARs that have this issue:
On the HMC, run:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
Then run /usr/sbin/rsct/install/bin/recfgct on the LPARs:
# /usr/sbin/rsct/install/bin/recfgct
0513-071 The ctcas Subsystem has been added.
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started.
Subsystem PID is 194568.
# /usr/sbin/rsct/bin/lsnodeid
6bcaadbe9dc8904f
Repeat this for every LPAR connected to the HMC.
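With many LPARs, it can help to write out the per-LPAR commands first and review them. The sketch below only prints the commands and does not run anything. The function name, the host names and the use of root ssh access to the LPARs are all assumptions for illustration.

```shell
# Sketch: print (not run) the reset and verification commands for each LPAR.
# Assumptions: LPAR host names are passed as arguments, and root ssh access
# to the LPARs is available. print_recfgct_plan is a hypothetical helper name.
print_recfgct_plan() {
    for host in "$@"; do
        echo "ssh root@$host /usr/sbin/rsct/install/bin/recfgct"
        echo "ssh root@$host /usr/sbin/rsct/bin/lsnodeid"
    done
}
```

For example, `print_recfgct_plan lpar1 lpar2` prints two lines per LPAR, which can then be run one LPAR at a time.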
After that, you can run on the HMC again: