Topics: Red Hat / Linux, Storage

RHEL: Recovering a corrupt file system in emergency mode

If you have a RHEL system that boots in emergency mode due to a corrupt file system, here are some steps to perform to resolve the issue.

Once booted in emergency mode, connect to the system on the console. Try running:

# journalctl -xe
To help determine why the system booted in emergency mode. For example, you may discover that the /var file system has an issue.

Once you do, shut down the system, and boot it from the boot ISO or DVD (depending if the system is a virtual or a physical system).

When using VMware, you'll have to edit the settings of the VM, and enable the "Force BIOS setup" setting, and mount/connect the boot ISO image. Then start it up, and open the console. Once in the BIOS, make sure CDROM is high in the boot order.

Once the system is booting RHEL, select "Troubleshooting", and "Rescue a Red Hat Enterprise Linux system". The system will boot up. Select "Continue" when prompted.

Run "df -h" to list the mounted directories. Since /var is the file system having the issue, unmount it:
# unmount /mnt/sysimage/var
Once /var is unmounted, run fsck to fix any issues:
# fsck /mnt/sysimage/var
Once completed, you can reboot the system. If you connected an ISO image earlier, make sure to disconnect the ISO image.

Topics: Red Hat / Linux, Storage

Using tmp.mount

If you've ever looked at the /tmp file system on a RHEL system, you may have noticed that it is, by default, simply a folder in the root directory.

For example:

# df -h /tmp
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  100G  4.6G   96G   5% /
The risk of having this is, that anyone can fill up the root file system, by writing temporary data to the /tmp folder, which is risky for system stability.

Red Hat Enterprise Linux 7 offers the ability to use /tmp as a mount point for a temporary file storage system (tmpfs), but unfortunately, it is not enabled by default.

When enabled, this temporary storage appears as a mounted file system, but stores its content in volatile memory instead of on a persistent storage device. And when using this, no files in /tmp are stored on the hard drive except when memory is low, in which case swap space is used. This also means that the contents of /tmp are not persisted across a reboot.

To enable this feature, execute the following commands:
# systemctl enable tmp.mount
# systemctl start tmp.mount
RHEL uses a default size of half the memory size for the in-memory /tmp file system. For example on a system with 16 GB of memory, an 8 GB /tmp file system is set up after enabling the tmp.mount feature:
# df -h /tmp
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  100G   53G   48G  53% /
# systemctl enable tmp.mount
# systemctl start tmp.mount
# df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           7.8G     0  7.8G   0% /tmp
By having this in place, it's no longer possible to fill up the root file system, when writing files and/or data to the /tmp file system. The downside, however, is that this uses memory, and when filling up the memory, may be using the swap space. As such, having a dedicated file system on disk for the /tmp folder is still the better solution.

Topics: Red Hat / Linux, Storage

Setting up a Volume Group and File Systems on RHEL

This procedure describes how to set a new volume group and file systems on a Red Hat Enterprise Linux system.

First, we'll need to make sure that there is storage available on the system that can be allocated to a new volume group. For this purpose, run the lsblk command:

# lsblk | grep disk
In the output, for example, you may see:
# lsblk | grep disk
fd0       2:0      1    4K  0 disk
sda       8:0      0   60G  0 disk
sdb       8:16     0    5T  0 disk
In the example above, the system has two SCSI devices (that start with "sd"), called sda and sdb. Device sda is 60 GB, and device sdb is 5 TB.

Next, run this command:
# lsblk -a
It will provide you with a tree-like output showing all the disks available on the system, and any partitions (listed as "part") and logical volumes (listed as "lvm") configured on those disks. For the sake of this example, we'll assume that on device sdb there are no partitions and or logical volumes configured, and thus is available.

Also, for the sake of this example, we'll assume that we'll want to set up a few file systems for an Oracle environment, called /u01, /u02, /u03, /u04 and /u05, and that we'll want to have these file systems configured within a volume group called "oracle".

List the volume groups already configured on the system:
# vgs
Make sure there isn't already a volume group present that is called oracle.

Now, let's create a new volume group called oracle, using device sdb:
# vgcreate oracle /dev/sdb
  Physical volume "/dev/sdb" successfully created.
  Volume group "oracle" successfully created
We can now use the "vgs" and "pvs" commands to list the volume groups and the physical volumes on the system. Note in the output that you now can see that a volume group called "oracle" is present, and that disk /dev/sdb is configured in volume group "oracle".

Now create the logical volumes. A logical volume is required for us to create the file systems in later on. We'll be creating the following logical volumes:
  • u01lv of 100 GB for the use of the /u01 file system
  • u02lv of 1.5 TB for the use of the /u02 file system
  • u03lv of 1.5 TB for the use of the /u03 file system
  • u04lv of 1.5 TB for the use of the /u04 file system
  • u05lv of 300 GB for the use of the /u05 file system
Run the following commands to create the logical volumes. You may run the "lvs" command before, in between and after each command to see your progress.
# lvcreate -n u01lv -L 100G oracle
# lvcreate -n u02lv -L 1.5T oracle
# lvcreate -n u03lv -L 1.5T oracle
# lvcreate -n u04lv -L 1.5T oracle
# lvcreate -n u05lv -L 300G oracle
# lvs | grep oracle
  u01lv oracle -wi-a----- 100.00g
  u02lv oracle -wi-a-----   1.50t
  u03lv oracle -wi-a-----   1.50t
  u04lv oracle -wi-a-----   1.50t
  u05lv oracle -wi-a----- 300.00g
Now it's time to create the file systems. We'll be using the standard XFS type of file system:
# mkfs.xfs /dev/oracle/u01lv
# mkfs.xfs /dev/oracle/u02lv
# mkfs.xfs /dev/oracle/u03lv
# mkfs.xfs /dev/oracle/u04lv
# mkfs.xfs /dev/oracle/u05lv
And now that the file systems have been created on top of the logical volumes, we can mount the file systems. To ensure that file systems are mounted at the time that the system boots up, it's best to add the new file systems to file /etc/fstab. Add the following lines to that file:
/dev/oracle/u01lv      /u01     xfs      defaults,noatime 0 0
/dev/oracle/u02lv      /u02     xfs      defaults,noatime 0 0
/dev/oracle/u03lv      /u03     xfs      defaults,noatime 0 0
/dev/oracle/u04lv      /u04     xfs      defaults,noatime 0 0
/dev/oracle/u05lv      /u05     xfs      defaults,noatime 0 0
Make sure the folders of the mount points exist by creating them:
# mkdir /u01
# mkdir /u02
# mkdir /u03
# mkdir /u04
# mkdir /u05
Now mount all the file systems at once:
# mount -a
And then verify that the file systems are indeed present:
# df -h | grep u0
/dev/mapper/oracle-u01lv  100G   33M  100G   1% /u01
/dev/mapper/oracle-u02lv  1.5T   33M  1.5T   1% /u01
/dev/mapper/oracle-u03lv  1.5T   33M  1.5T   1% /u01
/dev/mapper/oracle-u04lv  1.5T   33M  1.5T   1% /u01
/dev/mapper/oracle-u05lv  300G   33M  300G   1% /u01
And that's it. The file systems have been created, and these file systems will persist during a system reboot.

Topics: Networking, Red Hat / Linux, Storage

How to install and configure Samba on CentOS 7 for file sharing on Windows

Here's how to set up a secure Samba share from a CentOS 7 (or RHEL 7) system, and share it with a Windows client.

First, install Samba:

# yum install samba samba-client samba-common
Add an exception to the firewall, if the firewall is active:
# firewall-cmd --permanent --zone=public --add-service=samba
# firewall-cmd --reload
Next, you'll need to know the workgroup the Windows system is configured in. By far, the easiest way to do this, is to open a command prompt on the Windows system, and run:
net config workstation
For the sake of this tutorial, we'll assume the workgroup is called WORKGROUP.

Make a copy of the Samba config file:
# cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
Set up a secure file share. In the example below, the share will be located in /media/windows/share on the CentOS 7 system. Be sure to set the permissions in such a way that the user account used for the share (see below) indeed has access to this folder.
# mkdir -p /media/windows/share
# chmod -R 0755 /media/windows/share
# chown -R user:group /media/windows/share
Edit file /etc/samba/smb.conf and add:
        workgroup = WORKGROUP
        netbios name = centos

        comment = Shared Folder
        path = /media/windows/share
        valid users = user
        browsable = yes
        writable = yes
        guest ok = no
        read only = no
Set the SMB passwd for the user (this will be the username and password used to access the share from Windows):
# smbpasswd -a user
New SMB password:
Retype new SMB password:
Make sure everything is okay:
# testparm
Now enable and start Samba:
# systemctl enable smb.service
# systemctl enable nmb.service
# systemctl start smb.service
# systemctl start nmb.service
On the Windows host, io File explore type the IP address of the CentOS system, for example:
You will be asked for the username and password used when you ran the smbpasswd command.

And that should do it; You should now have a secured Samba share available on a Windows system.

Windows may cache any credentials that are used for the Samba share(s). When configuring the Samba share(s), it may be needed to have Windows "forget" these credentials. This can be easily achieved by running from a Command Prompt:
net use * /del

Topics: Red Hat / Linux, Storage

RHEL 7: Set up storage multi-pathing

This next piece describes how to configure the storage multi-pathing software on a Red Hat Enterprise Linux 7 system. This is a required to install if you're using SAN storage and multiple paths are available to the storage (which is usually the case).

First, check if all required software is installed. It generally is, but it's good to check:

# yum -y install device-mapper-multipath
Next, check if the multipath daemon is running:
# service multipathd status
If it is, stop it:
# service multipathd stop
Configure file /etc/multipath.conf, which is the configuration file for the multipath daemon:
# mpathconf --enable --with_multipathd y
This will create a default /etc/multipath.conf file, which will work quite well often, without any further configuration needed.

Then start the multipath daemon:
# service multipathd start
Redirecting to /bin/systemctl start multipathd.service
You can now use the lsblk command to view the disks that are configured on the system.
# lsblk
This command should show that there have been mpathX devices created, which are the multipath devices managed by the multipath daemon, and you can now start using these mpathX disk devices as storage on the Red Hat system. Another way to check the mpath disk devices available on the system, is by looking at the /dev/mapper directory:
# ls -als /dev/mapper/mpath*
If you have a clustered environment, where SAN storage devices are zoned and allocated to multiple systems, you may want to ensure that all the nodes in the cluster are using the same naming for the mpathX devices. That makes it easier to recognize which disk is which on each system.

To ensure that all the nodes in the cluster use the same naming, first run a "cat /etc/multipath/bindings" command on all nodes, and identify which disks are shared on all nodes, and what the current naming of the mpathX devices on each system looks like. It may well be that the naming of the mpathX devices is already consistent on all cluster nodes.

If it is not, however, then copy file /etc/multipath/bindings from one server to all other cluster nodes. Be careful when doing this, especially when one or more servers in a cluster have more SAN storage allocated than others. Be sure that only those entries in /etc/multipath/bindings are copied over to all cluster nodes, where the entries represent shared storage on all cluster nodes. Any SAN storage allocated to just one server will show up in the /etc/multipath/bindings file for that server only, and it should not be copied over to other servers.

Once the file is correct on all cluster nodes. Restart the multipath daemon on each cluster node:
# service multipathd stop
# multipath -F
# service multipathd start
If you now do a "ls" in /dev/mapper on each cluster node, you'll see the same mpath names on all cluster nodes.

Once this is complete, make sure that the multipath daemon is started at system boot time as well:
# systemctl enable multipathd

Topics: PowerHA / HACMP, Storage

Adding a new volume group to an active PowerHA resource group

This article describes how to add a new volume group to an existing resource group of an active PowerHA cluster.

The first step is to add the storage to both of the nodes of the PowerHA cluster. In the case of SAN storage, please ensure that your storage administrator adds the storage to both nodes of the cluster. Then discover the newly added storage by running the cfgmgr command on one of the nodes:

# cfgmgr
Set a PVID on all the new disks that have been discovered. For example, for disk hdisk77, run:
# chdev -l hdisk77 -a pv=yes
Repeat this command for any of the new disks.

Next, log in to the other node, and run the cfgmgr command on that node as well, so the disks will be discovered on the other node as well. And when you run "lspv" on the second node after running cfgmgr, you'll notice that the PVID is already set for all the discovered disks (it was set on the first node).

On both nodes, make sure that the disk attributes are set correctly. Now, this may differ for the type of storage used (so please make sure to check your storage vendor's recommendations on this topic), but a good starting point is (for example, for disk hdisk4):
# chdev -l hdisk4 -a max_transfer=0x100000 -a queue_depth=32 -a reserve_policy=no_reserve -a algorithm=round_robin
For clustered nodes, it's important that the reserve_policy is set to no_reserve.

A note about the max_transfer attribute: This value should be set to the same value as the max_xfer_size attribute of the fiber adapter. By default attribute max_transfer is set to 0x40000, which is usually lower than the max_xfer_size attribute on the fiber adapter, which results in a smaller buffer size memory being used. To check the max_xfer_size attribute on the fiber adapter, run (for example for adapter fcs0):
# lsattr -El fcs0 -a max_xfer_size
Also please make sure to set the disk attributes for all the new disks on both nodes. These attributes are stored in the ODM of the AIX system locally, and don't automatically transfer over to other nodes of the cluster, so you'll have to set these attributes for all new disks on all cluster nodes.

The next step is to create the new volume group(s). The first thing that we'll need to know is a common majar number that is available on both nodes of a cluster. For that purpose, run the following command on all cluster nodes, which lists the available major numbers:
# lvlstmajor
Choose a major number available on all cluster nodes. For the purpose of this article, let's assume major number 57 is available.

For PowerHA, a new volume group should be configured as concurrent-capable (configured by the -C option of the mkvg command). Also, The quorum should be disabled (by using -Qn), and the auto-varyon should be disabled as well (by using the -n option) as PowerHA will varyon the volume group for us. Finally, the major number should be set (in the example below: -V 57).

As such, run the following mkvg command to create the volume group (Note: please adjust the volume group name, the major number and disk names according to your situation and preference):
# mkvg -v 57 -S -n -Qn -y fs01vg hdisk38 hdisk42 hdisk45
Note: run this command on only one of the cluster nodes, and continue working on this cluster node for now for the next steps.

Next, create a logical volume (adjust per your situation and preference):
# mklv -y fs01lv -t jfs2 -u 1 -x 1278 hdisk38
This command will create logical volume fs01lv for the purpose of using it for a JFS2 type file system, with an upper bound of 1 (just use 1 disk), and allocate 1278 partitions on disk hdisk38.

Then create a file system on the previously defined logical volume:
# crfs -v jfs2 -d fs01lv -a logname=INLINE -a options=noatime -m /fs01 -A no
This command will create file system /fs01 on top of logical volume fs01lv, and will use an inline log (recommended for optimal performance), will not record access times (options=noatime) to avoid unneccessary writes to the file system, and will tell AIX not to automatically mount the file system at system start (-A no), as PowerHA will mount the file system instead.

At this point, the volume group, logical volume and file system have been created. You can create additional volume groups, logical volumes and file systems as it pertains to your situation. Once done, ensure that the volume group is varied off, for example for volume group fs01vg:
# varyoffvg fs01vg
Should the varyoff command fail at this point, then please check if the file system(s) is/are still mounted, and un-mount them before running the varyoffvg command.

Now run "lspv" on both nodes of the cluster, and look at the PVIDs listed. Pick one of the disks on which you configured the volume group on the first node, and take that disk's PVID, for example 00fac78651b28b53. On the other node, run the importvg command to import the volume group (which also imports the information about any logical volumes and file systems configured in that volume group), using the PVID of one of the disks, and the major number previously defined, for example:
# importvg -n -V 57 -y fs01vg 00fac78651b28b53
Repeat this for any additional volume groups in your situation. Do make sure to select the correct PVID of one of the disks in the volume group on the first node when importing this volume group on the second node, as you'll want to make sure both nodes of a PowerHA cluster use the same volume group names.

At this time, the storage is properly configured on both nodes, and it is time to add the volume group(s) to the PowerHA resource group.

First, verify that the cluster is in a good state by running:
# smitty hacmp
  problem determination tools
    powerha verification
      verify powerha configuration
Once confirmed, allow PowerHA to discover the new disks in the cluster:
# smitty cm_discover_nw_interfaces_and_disks
Now add the volume group(s) to the resource group of the cluster:
# smitty hacmp
  cluster applications and resources
    resource groups
      change/show resources and attributes for a resource group
Select your resource group, and add the volume group(s) to the "Volume Groups" entry.

Next, sync the cluster, which should bring the volume group(s) online:
# smitty hacmp
  cluster applications and resources
    verify and synchronize cluster configuration
Once this is complete, you should be able to see the new volume group(s) online, by running:
# lsvg -o
And you should be able to see that any new file systems have been mounted by PowerHA:
# df
Not required, but good practice, is to now schedule a failover test for your PowerHA cluster, to ensure everything is still working as it should, in case of a fail-over scenario.

Topics: Networking, Red Hat / Linux, Storage, System Admin

Quick NFS configuration on Red Hat

This is a quick NFS configuration using RHEL without too much concerts about security or any fine tuning and access control. In our scenario, there are two hosts:

  • NFS Server, IP
  • NFS Client, IP
First, start with the NFS server:

On the NFS server, un the below commands to begin the NFS server installation:
[nfs-server] # yum install nfs-utils rpcbind
Next, for this procedure, we export an arbitrary directory called /opt/nfs. Create /opt/nfs directory:
[nfs-server] # mkdir -p /opt/nfs
Edit the /etc/exports file (which is the NFS exports file) to add the below line to export folder /opt/nfs to client
Next, make sure to open port 2049 on your firewall to allow client requests:
[nfs-server] # firewall-cmd --zone=public --add-port=2049/tcp --permanent
[nfs-server] # firewall-cmd --reload
Start the rpcbind and NFS server daemons in this order:
[nfs-server] # service rpcbind start; service nfs start
Check the NFS server status:
[nfs-server] # service nfs status 
Redirecting to /bin/systemctl status nfs.service
nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; 
 vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
   Active: active (exited) since Tue 2017-11-14 09:06:21 CST; 1h 14min ago
 Main PID: 2883 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service
Next, export all the file systems configured in /etc/exports:
[nfs-server] # exportfs -rav
And check the currently exported file systems:
[nfs-server] # exportfs -v
Next, continue with the NFS client:

Install the required packages:
[nfs-client] # yum install nfs-utils rpcbind
[nfs-client]# service rpcbind start
Create a mount point directory on the client, for example /mnt/nfs:
[nfs-client] # mkdir -p /mnt/nfs
Discover the NFS exported file systems:
[nfs-client] # showmount -e
Export list for
Mount the previously NFS exported /opt/nfs directory:
[nfs-client] # mount /mnt/nfs
Test the correctness of the setup between the NFS server and the NFS client by creating a file in the NFS mounted directory on the client side:
[nfs-client] # cd /mnt/nfs/
[nfs-client] # touch testfile
[nfs-client] # ls -l
total 0
-rw-r--r--. 1 root root 0 Dec 11 08:13 testfile
Move to the server side and check if the testfile file exists:
[nfs-server] # cd /opt/nfs/
[nfs-server] # ls -l
total 0
-rw-r--r--. 1 root root 0 Dec 11 08:13 testfile
At this point it is working, but it is not set up to remain there permanently (as in: it will be gone when either the NFS server or NFS client is rebooted. To ensure it remains working even after a reboot, perform the following steps:

On the NFS server side, to have the NFS server service enabled at system boot time, run:
[nfs-server] # systemctl enable nfs-server
On the NFS server client side, add an entry to the /etc/fstab file, that will ensure the NFS file system is mounted at boot time:  /mnt/nfs  nfs4  soft,intr,nosuid  0 0
The options for the NFS file systems are as follows:
  • soft = No hard mounting, avoids hanging file access commands on the NFS client, if the NFS servers is unavailable.
  • intr = Allow NFS requests to be interrupted if the NFS server goes down or can't be reached.
  • nosuid = This prevents remote users from gaining higher privileges by running a setuid program.
If you need to know on the NFS server side, which clients are using the NFS file system, you can use the netstat command, and search for both the NFS server IP address and port 2049:
[nfs-server] # netstat -an | grep
This will tell you the established connections for each of the clients, for example:
tcp  0  0  ESTABLISHED
In the example above you can see that IP address on port 757 (NFS client) is connected to port 2049 on IP address (NFS server).

Topics: LVM, Red Hat / Linux, Storage

Logical volume snapshot on Linux

Creating a snapshot of a logical volume, is an easy way to create a point-in-time backup of a file system, while still allowing changes to occur to the file system. Basically, by creating a snapshot, you will get a frozen (snapshot) file system that can be backed up without having to worry about any changes to the file system.

Many applications these days allow for options to "freeze" and "thaw" the application (as in, telling the application to not make any changes to the file system while frozen, and also telling it to continue normal operations when thawed). This functionality of an application can be really useful for creating snapshot backups. One can freeze the application, create a snapshot file system (literally in just seconds), and thaw the application again, allowing the application to continue. Then, the snapshot can be backed up, and once the backup has been completed, the snapshot can be removed.

Let's give this a try.

In the following process, we'll create a file system /original, using a logical volume called originallv, in volume group "extern". We'll keep it relatively small (just 1 Gigabyte - or 1G), as it is just a test:

# lvcreate -L 1G -n originallv extern
  Logical volume "originallv" created.
Next, we'll create a file system of type XFS on it, and we'll mount it.
# mkfs.xfs /dev/mapper/extern-originallv
# mkdir /original
# mount /dev/mapper/extern-originallv /original
# df -h | grep original
/dev/mapper/extern-originallv 1014M   33M  982M   4% /original
At this point, we have a file system /original available, and we can start creating a snapshot of it. For the purpose of testing, first, create a couple of files in the /original file system:
# touch /original/file1 /original/file2 /original/file3
# ls /original
file1  file2  file3
Creating a snapshot of a logical volume is done using the "-s" option of lvcreate:
# lvcreate -s -L 1G -n originalsnapshotlv /dev/mapper/extern-originallv
In the command example above, a size of 1 GB is specified (-L 1G). The snapshot logical volume doesn't have to be the same size as the original logical volume. The snapshot logical volume only needs to hold any changes to the original logical volume while the snapshot logical volume exists. So, if there are very little changes to the original logical volume, the snapshot logical volume can be quite small. It's not uncommon for the snapshot logical volume to be just 10% of the size of the original logical volume. If there are a lot of changes to the original logical volume, while the snapshot logical volume exists, you may need to specify a larger logical volume size. Please note that large databases, in which lots of changes are being made, are generally not good candidates for snapshot-style backups. You'll probably have to test in your environment if it will work for your application, and to determine what a good size will be of the snapshot logical volume.

The name of the snapshot logical volume in the command example above is set to originalsnapshotlv, using the -n option. And "/dev/mapper/extern-originallv" is specified to indicate what the device name is of the original logical volume.

We can now mount the snapshot:
# mkdir /snapshot
# mount -o nouuid /dev/mapper/extern-originalsnapshotlv /snapshot
# df -h | grep snapshot
/dev/mapper/extern-originalsnapshotlv 1014M   33M  982M   4% /snapshot
And at this point, we can see the same files in the /snapshot folder, as in the /original folder:
# ls /snapshot
file1  file2  file3
To prove that the /snapshot file system remains untouched, even when the /original file system is being changed, let's create a file in the /original file system:
# touch /original/file4
# ls /original
file1  file2  file3  file4
# ls /snapshot
file1  file2  file3
As you can see, the /original file system now holds 4 files, while the /snapshot file system only holds the original 3 files. The snapshot file system remains untouched.

To remove the snapshot, a simple umount and lvremove will do:
# umount /snapshot
# lvremove -y /dev/mapper/extern-originalsnapshotlv
So, if you want to run backups of your file systems, while ensuring no changes are being made, here's the logical order of steps that can be scripted:
  • Freeze the application
  • Create the snapshot (lvcreate -s ...)
  • Thaw the application
  • Mount the snapshot (mkdir ... ; mount ...)
  • Run the backup of the snapshot file system
  • Remove the snapshot (umount ... ; lvremove ... ; rmdir ...)

Topics: AIX, Storage, System Admin

Allocating shared storage to VIOS clients

The following is a procedure to add shared storage to a clustered, virtualized environment. This assumes the following: You have a PowerHA cluster on two nodes, nodeA and nodeB. Each node is on a separate physical system, and each node is a client of a VIOS. The storage from the VIOS is mapped as vSCSI to the client. Client nodeA is on viosA, and client nodeB is on viosB. Futhermore, this procedure assumes you're using SDDPCM for multi-pathing on the VIOS.

First of all, have your storage admin allocate and zone shared LUN(s) to the two VIOS. This needs to be one or more LUNs that is zoned to both of the VIOS. This procedure assumes you will be zoning 4 LUNs of 128 GB.

Once that is completed, then move to work on the VIOS:


First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.

# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
Change PVID for the disks (repeat for all the LUNs):
# chdev -l hdiskX -a pv=yes
Next, map the new LUN from viosA to the nodeA lpar. You'll need to know 2 things here: [a] What vhost adapter (or "vadapter) to use, and [b] what name to give the new device (or "virtual target device"). Have a look at the output of the "lsmap -all" command that you ran previously. That will provide you information on the current naming scheme for the virtual target devices. Also, it will show you what vhost adapters already exist, and are in use for the client. In this case, we'll assume the vhost adapter is vhost0, and there are already some virtual target devices, called: nodeA_vtd0001 through nodeA_vtd0019. The new four LUNs therefore will be named: nodeA_vtd0020 through nodeA_vtd0023. We'll also assume the new disks are numbered hdisk44 through hdisk47.
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeA_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeA_vtd0021
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeA_vtd0022
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeA_vtd0023
Now the mapping of the LUNs is complete on viosA. You'll have to repeat the same process on viosB:


First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.
# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
No need to set the PVID this time. It was already configured on viosA, and after running the cfgdev command, the PVID should be visible on viosB, and it should match the PIVIDs on viosA. Make sure this is correct:
# lspv
Map the new LUN from viosB to the nodeB lpar. Again, you'll need to know the vadapter and the virtual target device names to use, and you can derive that information by looking at the output of the "lsmap -all" command. If you've done your work correctly in the past, the naming of the vadapter and the virtual target devices will probably be the same on viosB as on viosA:
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeB_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeB_vtd0020
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeB_vtd0020
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeB_vtd0020
Now that the mapping on both the VIOS has been completed, it is time to move to the client side. First, gather some information about the PowerHA cluster on the clients, by running as root on the nodeA client:
# clstat -o
# clRGinfo
# lsvg |lsvg -pi
Run cfgmgr on nodeA to discover the mapped LUNs, and then on nodeB:
# cfgmgr
# lspv
Ensure that the disk attributes are correctly set on both servers. Repeat the following command for all 4 new disks:
# chdev -l hdiskX -a algorithm=fail_over -a hcheck_interval=60 -a queue_depth=20 -a reserve_policy=no_reserve
Now you can add the 4 new added physical volumes to a shared volume group. In our example, the shared volume group is called sharedvg, and the newly discovered disks are called hdisk55 through hdisk58. Finally, the concurrent resource group is called concurrent_rg.
# /usr/es/sbin/cluster/sbin/cl_extendvg -cspoc -g'concurrent_rg' -R'nodeA' sharedvg hdisk55 hdisk56 hdisk57 hdisk58
Next, you can move forward to creating logical volumes (and file systems if necessary), for example, when creating raw logical volumes for an Oracle database:
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw5 sharedvg 1023 hdisk55
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw6 sharedvg 1023 hdisk56
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw7 sharedvg 1023 hdisk57
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw8 sharedvg 1023 hdisk58
Finally, verify the volume group:
# lsvg -p sharedvg
# lsvg sharedvg
# ls -l /dev/asm_raw*
If necessary, these are the steps to complete, if the addition of LUNs has to be backed out:
  1. Remove the raw logical volumes (using the cl_rmlv command)
  2. Remove the added LUNs from the volume group (using the cl_reducevg command)
  3. Remove the disk devices on both client nodes: rmdev -dl hdiskX
  4. Remove LUN mappings from each VIOS (using the rmvdev command)
  5. Remove the LUNs frome each VIOS (using the rmdev command)

Topics: AIX, Storage, System Admin

Identifying a Disk Bottleneck Using filemon

This blog will display the steps required to identify an IO problem in the storage area network and/or disk arrays on AIX.

Note: Do not execute filemon with AIX 6.1 Technology Level 6 Service Pack 1 if WebSphere MQ is running. WebSphere MQ will abnormally terminate with this AIX release.

Running filemon: As a rule of thumb, a write to a cached fiber attached disk array should average less than 2.5 ms and a read from a cached fiber attached disk array should average less than 15 ms. To confirm the responsiveness of the storage area network and disk array, filemon can be utilized. The following example will collect statistics for a 90 second interval.

# filemon -PT 268435184 -O pv,detailed -o /tmp/filemon.rpt;sleep 90;trcstop

Run trcstop command to signal end of trace.
Tue Sep 15 13:42:12 2015
System: AIX 6.1 Node: hostname Machine: 0000868CF300
[filemon: Reporting started]
# [filemon: Reporting completed]

[filemon: 90.027 secs in measured interval]
Then, review the generated report (/tmp/filemon.rpt).
# more /tmp/filemon.rpt
Detailed Physical Volume Stats   (512 byte blocks)

VOLUME: /dev/hdisk11  description: XP MPIO Disk P9500   (Fibre)
reads:                  437296  (0 errs)
  read sizes (blks):    avg     8.0 min       8 max       8 sdev     0.0
  read times (msec):    avg   11.111 min   0.122 max  75.429 sdev   0.347
  read sequences:       1
  read seq. lengths:    avg 3498368.0 min 3498368 max 3498368 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067240
  seek dist (%tot blks):init 4.87525
time to next req(msec): avg   0.206 min   0.018 max 461.074 sdev   1.736
throughput:             19429.5 KB/sec
utilization:            0.77

VOLUME: /dev/hdisk12  description: XP MPIO Disk P9500   (Fibre)
writes:                 434036  (0 errs)
  write sizes (blks):   avg     8.1 min       8 max      56 sdev     1.4
  write times (msec):   avg   2.222 min   0.159 max  79.639 sdev   0.915
  write sequences:      1
  write seq. lengths:   avg 3498344.0 min 3498344 max 3498344 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067216
  seek dist (%tot blks):init 4.87521
time to next req(msec): avg   0.206 min   0.005 max 536.330 sdev   1.875
throughput:             19429.3 KB/sec
utilization:            0.72
In the above report, hdisk11 was the busiest disk on the system during the 90 second sample. The reads from hdisk11 averaged 11.111 ms. Since this is less than 15 ms, the storage area network and disk array were performing within scope for reads.

Also, hdisk12 was the second busiest disk on the system during the 90 second sample. The writes to hdisk12 averaged 2.222 ms. Since this is less than 2.5 ms, the storage area network and disk array were performing within scope for writes.

Other methods to measure similar information:

You can use the topas command using the -D option to get an overview of the most busiest disks on the system:
# topas -D
In the output, columns ART and AWT provide similar information. ART stands for the average time to receive a response from the hosting server for the read request sent. And AWT stands for the average time to receive a response from the hosting server for the write request sent.

You can also use the iostat command, using the -D (for drive utilization) and -l (for long listing mode) options:
# iostat -Dl 60
This will provide an overview over a 60 second period of your disks. The "avg serv" column under the read and write sections will provide you average service times for reads and writes for each disk.

An occasional peak value recorded on a system, doesn't immediately mean there is a disk bottleneck on the system. It requires longer periods of monitoring to determine if a certain disk is indeed a bottleneck for your system.

Number of results found for topic Storage: 51.
Displaying results: 1 - 10.