Sometimes, for example after network-related changes, it may be necessary to reset the iDRAC. If the iDRAC is no longer reachable, or if it is not responding, resetting it through its own interface is difficult at that point.
As an alternative, you can reset the iDRAC from the OS using the following command:
# racadm racreset
If you wish to determine how much memory is installed in a Linux system, the maximum amount of memory the system supports, or the exact number and size of the memory DIMMs installed, use the dmidecode command.
The dmidecode command has a type option (-t) that indicates the type of device you wish to see detailed information for, such as bios, system, baseboard, chassis, processor, cache, connector, slot and, of course, memory.
To retrieve the memory information, run the following command:
# dmidecode -t memory
You should get output similar to the example below, although it will obviously differ per Linux system, depending on the hardware model and the installed memory.
# dmidecode -t memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
Handle 0x0047, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 16 GB
Error Information Handle: Not Provided
Number Of Devices: 2
Handle 0x0048, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0047
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: None
Locator: DIMM A
Bank Locator: Not Specified
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MT/s
Manufacturer: Hynix/Hyundai
Serial Number: 3248A01B
Asset Tag: 9876543210
Part Number: HMT351S6CFR8C-PB
Rank: 2
Configured Memory Speed: 1600 MT/s
Handle 0x004A, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0047
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: None
Locator: DIMM B
Bank Locator: Not Specified
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MT/s
Manufacturer: Kingston
Serial Number: BA33020F
Asset Tag: 9876543210
Part Number: KFYHV1-HYC
Rank: 2
Configured Memory Speed: 1600 MT/s
In the output above you can see that this particular system has a "Maximum Capacity" of 16 GB, and that up to two memory devices can be installed ("Number Of Devices: 2").
Below it, you see the currently installed memory devices: the first one with a size of 4096 MB in slot DIMM A, with the exact manufacturer and part number listed. In slot DIMM B, another 4096 MB DIMM is installed, from a different vendor.
This tells you that the system shown above can be configured with up to 16 GB of memory, but currently has two 4 GB memory DIMMs, thus 8 GB installed. In this case, upgrading the system to 16 GB of memory would mean replacing the two 4 GB memory DIMMs with two 8 GB memory DIMMs.
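To add up the installed DIMM sizes without reading through the full output, you can filter the type 17 (Memory Device) records. This is a sketch that assumes sizes are reported in MB, as in the output above; newer dmidecode versions may report larger modules in GB, in which case the filter needs adjusting:

```shell
# Sum the "Size:" lines of all populated DIMMs (DMI type 17 = Memory Device).
# Empty slots report "No Module Installed" and contain no number,
# so they are skipped automatically by the pattern.
dmidecode -t 17 | awk '/Size: [0-9]+ MB/ {total += $2} END {print total " MB installed"}'
```

On the example system above, this would report 8192 MB installed.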
There are a few packages that can be very useful to install on Red Hat when running on Dell hardware. These are the OpenManage System Administrator tools, which provide you more information on the Dell hardware; Dell System Update (or DSU), which can be used to update firmware and BIOS on Dell hardware; and the Dell iDRAC Service Module, which allows the iDRAC to exchange information with the operating system.
First, set up the Dell Linux repository:
# curl -s http://linux.dell.com/repo/hardware/dsu/bootstrap.cgi | bash
Next, install OpenManage System Administrator, or OMSA, and make sure to start it, and enable it at boot time:
# yum -y install srvadmin-all
# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
# /opt/dell/srvadmin/sbin/srvadmin-services.sh enable
With OMSA installed, you can, for example, now retrieve information about the physical disks in the system, and also the virtual disks (or RAID arrays) configured on these physical disks. Here's how you can use the command line interface tools to look up this information:
List the controllers available in the system:
# /opt/dell/srvadmin/bin/omreport storage controller
Now, list the physical disks (or pdisks) for each controller, for example for the controller with ID 0:
# /opt/dell/srvadmin/bin/omreport storage pdisk controller=0
And you can list the virtual disks (or vdisks) for each controller, for example for the controller with ID 0:
# /opt/dell/srvadmin/bin/omreport storage vdisk controller=0
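If a system has multiple controllers, you can loop over them. This is a sketch that assumes the omreport controller listing contains lines of the form "ID : 0"; verify the exact output format on your system before relying on it:

```shell
# Loop over all controller IDs found by omreport, and report the
# physical and virtual disks on each one.
OMREPORT=/opt/dell/srvadmin/bin/omreport
$OMREPORT storage controller | awk '/^ID/ {print $NF}' | while read id ; do
    echo "=== Controller $id: physical disks ==="
    $OMREPORT storage pdisk controller=$id
    echo "=== Controller $id: virtual disks ==="
    $OMREPORT storage vdisk controller=$id
done
```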
A lot more is possible with OMSA, but that's outside the scope of this article. Instead, let's move on with the items to install.
Install Dell System Update (or DSU):
# yum -y install dell-system-update
To update firmware, you can now run:
# dsu
Usually it's just fine to select all firmware items to update (by pressing "a") and have them updated (by pressing "c"). This may take a while, and may require a reboot of the system. Upon reboot, the system may also take a while to complete the firmware and/or BIOS updates.
Finally, the Dell iDRAC service module. The latest version (at time of writing this article) can be found here:
https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=GH8R3. Copy the download link of the gzipped tar file on that page.
The Dell Service Module requires the usbutils package to be installed:
# yum -y install usbutils
Now you can download and install the Dell iDRAC Service Module:
# mkdir /tmp/dsm
# cd /tmp/dsm
# wget https://downloads.dell.com/FOLDER05038177M/1/OM-iSM-Dell-Web-LX-320-1234_A00.tar.gz
# gzip -d *z
# tar xf *tar
# ./setup.sh
Here it's best to select all features and press "i" to install. Keep everything at default settings and answer "yes" to any other questions. After installation completes, you can log in to the iDRAC of the system and view operating system information there. This information has been communicated from the OS to the iDRAC by the Dell iDRAC Service Module.
Sometimes it may be necessary to create a dummy disk device, for example when you need a disk to be discovered by cfgmgr under a certain name on multiple hosts.
For example, if you need the disk to be called hdisk2, and only hdisk0 exists on the system, then running cfgmgr will discover the disk as hdisk1, not as hdisk2. To make sure cfgmgr indeed discovers the new disk as hdisk2, you can fool the system by temporarily creating a dummy disk device.
Here are the steps involved:
First: remove the newly discovered disk (in the example below known as hdisk1 - we will configure this disk as hdisk2):
# rmdev -dl hdisk1
Next, we create a dummy disk device with the name hdisk1:
# mkdev -l hdisk1 -p dummy -c disk -t hdisk -w 0000
Note that running the command above may result in an error. However, if you then run the following command, you will notice that the dummy disk device has indeed been created:
# lsdev -Cc disk | grep hdisk1
hdisk1 Defined SSA Logical Disk Drive
Also note that the dummy disk device will not show up if you run the lspv command. That is of no concern.
Now run the cfgmgr command to discover the new disk. You'll notice that the new disk will now be discovered as hdisk2, because hdisk0 and hdisk1 are already in use.
# cfgmgr
# lsdev -Cc disk | grep hdisk2
Finally, remove the dummy disk device:
# rmdev -dl hdisk1
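The whole procedure can be wrapped in a small sketch with a safety check before the dummy is removed; hdisk1 and hdisk2 are the example device names from above, so adjust them to your situation:

```shell
#!/bin/ksh
# Sketch of the dummy-disk procedure; hdisk1/hdisk2 are example names.
rmdev -dl hdisk1                                   # remove the newly discovered disk
mkdev -l hdisk1 -p dummy -c disk -t hdisk -w 0000  # create the placeholder (may print an error)
cfgmgr                                             # rediscover the real disk
# Verify that the real disk was indeed configured as hdisk2
# before cleaning up the dummy device:
if lsdev -Cc disk | grep -q '^hdisk2 ' ; then
    rmdev -dl hdisk1                               # remove the dummy
else
    echo "hdisk2 was not configured, leaving dummy in place" >&2
fi
```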
There is a LED which you can turn on to identify a device, which can be useful if you need to replace a device. It's the same binary as used by diag.
To show the syntax:
# /usr/lpp/diagnostics/bin/usysident ?
usage: usysident [-s {normal | identify}]
[-l location code | -d device name]
usysident [-t]
To check the LED status of the system:
# /usr/lpp/diagnostics/bin/usysident
normal
To check the LED status of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -d hdisk1
normal
To activate the LED of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -s identify -d hdisk1
# /usr/lpp/diagnostics/bin/usysident -d hdisk1
identify
To turn off the LED of /dev/hdisk1 again:
# /usr/lpp/diagnostics/bin/usysident -s normal -d hdisk1
# /usr/lpp/diagnostics/bin/usysident -d hdisk1
normal
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. To do that, omit the device parameter.
You can run invscout to do a microcode discovery on your system, which generates a hostname.mup file. You then upload this hostname.mup file on the IBM website and get a nice overview of the status of all firmware on your system.
So far, so good. But what if you have plenty of systems and want to automate this? Here's a script to do just that. It first uses wget to collect the latest catalog.mic file from the IBM website, then distributes this catalog file to all the hosts you want to check. Next, it runs invscout on all these hosts and collects the hostname.mup files. It concatenates all these files into one large file and does an HTTP POST through curl to upload the file to the IBM website and have a report generated from it.
So, what do you need?
- You need an AIX jump server that can access the other hosts as user root through SSH, so you should have set up SSH keys for user root.
- This jump server must have access to the Internet.
- You need wget and curl installed. Get them from the AIX Toolbox for Linux Applications.
- Your servers should be AIX 5 or higher. It doesn't really work with AIX 4.
- Optional: a web server, like Apache 2, would be nice, so you can drop the resulting HTML file on your website every day.
- An entry in the root crontab to run this script every day.
- A list of servers you want to check.
Here's the script:
#!/bin/ksh
# script: generate_survey.ksh
# purpose: To generate a microcode survey html file
# where is my list of servers located?
SERVERS=/usr/local/etc/servers
# what temporary folder will I use?
TEMP=/tmp/mup
# what is the invscout folder
INV=/var/adm/invscout
# what is the catalog.mic file location for invscout?
MIC=${INV}/microcode/catalog.mic
# if you have a webserver,
# where shall I put a copy of survey.html?
APA=/usr/local/apache2/htdocs
# who's the sender of the email?
FROM=microcode_survey@ibm.com
# who's the receiver of the email?
TO="your.email@address.com"
# what's the title of the email?
SUBJ="Microcode Survey"
# user check
USER=`whoami`
if [ "$USER" != "root" ];
then
echo "Only root can run this script."
exit 1;
fi
# create a temporary directory
rm -rf $TEMP 2>/dev/null
mkdir $TEMP 2>/dev/null
cd $TEMP
# get the latest catalog.mic file from IBM
# you need to have wget installed
# and accessible in $PATH
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
#curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO
# move the catalog.mic file to this servers invscout directory
mv $TEMP/catalog.mic $MIC
# remove any old mup files
echo Remove any old mup files from hosts.
for server in `cat $SERVERS` ; do
echo "${server}"
ssh $server "rm -f $INV/*.mup"
done
# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
echo "${server}"
scp -p $MIC $server:$MIC
done
# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
echo "${server}"
ssh $server invscout
done
# collect the hostname.mup files
for server in `cat $SERVERS` ; do
echo "${server}"
scp -p $server:$INV/*.mup $TEMP
done
# concatenate all hostname.mup files to one file
cat ${TEMP}/*mup > ${TEMP}/muppet.$$
# delete all the hostname.mup files
rm $TEMP/*mup
# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv
# curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on:
# http://curl.haxx.se/docs/httpscripting.html
# more info on uploading survey files can be found on:
# www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html
# Sometimes, the IBM website will respond with an
# "Expectation Failed" error message. Loop the curl command until
# we get valid output.
stop="false"
while [ "$stop" = "false" ] ; do
curl -H Expect: -F mdsData=@${TEMP}/muppet.$$ -F sendfile="Upload file" \
http://www14.software.ibm.com/webapp/set2/mds/mds \
> ${TEMP}/survey.html
#
# Test if we see Expectation Failed in the output
#
unset mytest
mytest=`grep "Expectation Failed" ${TEMP}/survey.html`
if [ -z "${mytest}" ] ; then
stop="true"
fi
sleep 10
done
# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA
# tip: put in the crontab daily like this:
# 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1
# mail the output
# need to make sure this is sent in html format
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit
HERE
# clean up the mess
cd /tmp
rm -rf $TEMP
The "Integrated Virtual Ethernet" or IVE adapter is an adapter directly on the GX+ bus, and thus up to 3 times faster than a regular PCI card. You can order Power6 frames with different kinds of IVE adapters, with ports up to 10 Gb.
The IVE adapter acts as a layer-2 switch. You can create port groups. In each port group up to 16 logical ports can be defined. Every port group requires at least 1 physical port (but 2 is also possible). Each logical port can have a MAC address assigned. These MAC addresses are located in the VPD chip of the IVE. When you replace an IVE adapter, LPARs will get new MAC addresses.
Each LPAR can only use 1 logical port per physical port. Different LPARs that use logical ports from the same port group can communicate without any external hardware needed, and thus communicate very fast.
The IVE is not hot-swappable. It can and may only be replaced by certified IBM service personnel.
First you need to configure an HEA (Host Ethernet Adapter); not in promiscuous mode, because that mode is meant to be used when you wish to dedicate a physical port to a single LPAR. After that, you assign an LHEA (Logical Host Ethernet Adapter) to an LPAR. The HEA needs to be configured, and the frame needs to be restarted, in order to function correctly (because of the multi-core scaling setting on the HEA itself).
So, to conclude: you can assign physical ports of the IVE adapter to separate LPARs (promiscuous mode). If you have an IVE with two ports, up to two LPARs can use these ports. But you can also configure it as an HEA and have up to 16 LPARs per physical port in a port group using the same interface (10 Gb ports are recommended). There are different kinds of IVE adapters; some allow you to create more port groups and thus more network connectivity. The IVE is a way of virtualizing Ethernet without the need for VIOS.
An adapter that has previously been added to an LPAR and now needs to be removed often can't be removed from the LPAR, because it is in use by the LPAR. Here's how you find and remove the involved devices on the LPAR:
First, run:
# lsslot -c pci
This will find the adapter involved.
Then, find the parent device of a slot, by running:
# lsdev -Cl [adapter] -F parent
(Fill in the correct adapter, e.g. fcs0).
Now, remove the parent device and all its children:
# rmdev -Rl [parentdevice] -d
For example:
# rmdev -Rl pci8 -d
Now you should be able to remove the adapter via the HMC from the LPAR.
If you need to replace the adapter because it is broken, then you also need to power down the PCI slot in which the adapter is placed:
After issuing the "rmdev" command, run diag and go into "Task Selection", "Hot Plug Task", "PCI Hot Plug Manager", "Replace/Remove a PCI Hot Plug Adapter". Select the adapter and choose "remove".
After the adapter has been replaced (usually by an IBM technician), run cfgmgr again to make the adapter known to the LPAR.
Ever noticed the different colors on parts of Power5 systems? Some parts are orange, others are blue. Orange means: you can touch it, open it, remove it, even if the system is running. Blue means: don't touch it if the system is active.
This is a procedure how to replace a failing HBA or fibre channel adapter, when used in combination with SDD storage:
- Determine which adapter is failing (0, 1, 2, etcetera):
# datapath query adapter
- Check if there are dead paths for any vpaths:
# datapath query device
- Try to set a "degraded" adapter back to online using:
# datapath set adapter 1 offline
# datapath set adapter 1 online
(that is, if adapter "1" is the failing one; replace "1" with the correct adapter number).
- If the adapter is still in a "degraded" status, open a call with IBM. They will most likely require you to take a snap of the system and send the snap file to IBM for analysis, after which they will conclude whether the adapter needs to be replaced.
- Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter when it is replaced with a new one that has a new WWN.
- If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team.
- Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the correct adapter that is failing).
- Check if the vpaths now all have one less path:
# datapath query device | more
- De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
- Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
- Have the IBM CE replace the adapter.
- Close any events on the failing adapter on the HMC.
- Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
- Check the adapter firmware level using:
# lscfg -vl fcs1
(replace this with the actual adapter name).
And if required, update the adapter firmware microcode. Validate that the adapter is still functioning correctly by running:
# errpt
# lsdev -Cc adapter
- Have the SAN admin update the WWN.
- Run:
# cfgmgr -S
- Check the adapter and the child devices:
# lsdev -Cc adapter
# lsdev -p fcs1
# lsdev -p fscsi1
(replace this with the correct adapter name).
- Add the paths to the device:
# addpaths
- Check if the vpaths have all paths again:
# datapath query device | more
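Instead of paging through the full output, a quick final check is to count any remaining dead paths. This is a sketch that assumes your SDD version marks failed paths with the word DEAD in the "datapath query device" output; verify this against your actual output first:

```shell
# Count lines mentioning DEAD in the datapath output;
# "0 dead path(s)" means all paths are healthy again.
datapath query device | awk '/DEAD/ {n++} END {print n+0 " dead path(s)"}'
```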