Configuring a 2-node GPFS Cluster
NOTE: If configuring a 2-node GPFS cluster, then back-to-back connections can be
used between two Ethernet adapters in each system, and each pair of adapters can be
configured as an EtherChannel. Enable jumbo frames and round-robin mode, and use
an alternate hardware address (see the sketch below).
If a third quorum node is to be used then the Ethernet connections for each node must
go through a switch.
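For a back-to-back two-node setup, a possible EtherChannel creation sketch is shown
below (hypothetical adapter names ent0 and ent1 and a hypothetical alternate address;
the attribute names are assumptions and can vary by AIX level, so verify them with
lsattr -El on an existing EtherChannel or use smitty etherchannel):
mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0,ent1 \
    -a mode=round_robin -a use_jumbo_frame=yes \
    -a use_alt_addr=yes -a alt_addr=0x02AABBCC0001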
NOTE: The following mm commands are located in /usr/lpp/mmfs/bin.
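To avoid typing the full path for each command, the directory can optionally be added
to the shell PATH, for example:
export PATH=$PATH:/usr/lpp/mmfs/bin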
1. If using a third node for quorum, then edit /tmp/gpfs/node_list and insert:
<node1_label>:manager-quorum
<node2_label>:manager-quorum
<node3_label>:quorum
2. If not using a third node for quorum, then edit /tmp/gpfs/node_list and insert:
<node1_label>:quorum
<node2_label>:quorum
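For example, for a two-node cluster with hypothetical node labels gpfsnode1 and
gpfsnode2, node_list would contain:
gpfsnode1:quorum
gpfsnode2:quorum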
3. cd /tmp/gpfs
4. Create the cluster:
mmcrcluster -n node_list -p <node1_label> -s <node2_label> -C <cluster_name>
The cluster name should contain a . (dot); if it does not, the domain of the primary
configuration server is appended to the name.
-p defines the primary node for managing the cluster configuration
information, and -s the secondary; they do NOT refer to primary and
secondary NSD servers.
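For example, with the hypothetical node labels gpfsnode1 and gpfsnode2 and a
hypothetical cluster name of gpfs1.mydomain.com:
mmcrcluster -n node_list -p gpfsnode1 -s gpfsnode2 -C gpfs1.mydomain.com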
5. View the cluster configuration:
mmlscluster
mmlsconfig
6. If you want to use ssh rather than rsh, enter the command:
mmchcluster -r /usr/bin/ssh -R /usr/bin/scp
Note that for this to work, passwordless root ssh must already be set up between the
nodes, i.e. the keys must already have been created and exchanged (a sketch follows).
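A minimal sketch of the key setup for root between two hypothetical nodes gpfsnode1
and gpfsnode2 (GPFS needs passwordless ssh between all nodes, including from a node
to itself); run on each node in turn, swapping the hostnames:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub gpfsnode2:/tmp/gpfsnode1_rsa.pub
ssh gpfsnode2 'cat /tmp/gpfsnode1_rsa.pub >> ~/.ssh/authorized_keys'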
7. Edit /tmp/gpfs/disk_list and define the NSDs:
hdiskx:::::<nsd_name1>
hdisky:::::<nsd_name2>
etc
If vpaths are configured, then the vpath device names must be used instead of the
hdisk names.
Use a meaningful naming convention, such as u01nsd for a /u01 filesystem.
These names will appear in the volume group name column when lspv is entered.
The colon-separated fields are actually:
DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
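For example, with hypothetical disks and NSD servers (the server fields can be left
empty when every node has direct SAN access to the disks):
hdisk4:::::u01nsd
hdisk5:gpfsnode1:gpfsnode2:dataAndMetadata:1:u02nsd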
8. NSD format the disks and create the NSDs with:
mmcrnsd -F disk_list
9. Display NSDs:
mmlsnsd
10. If tiebreaker disks are to be used, then edit /tmp/gpfs/disk_list_tie and define
either one or three tiebreaker disks:
hdiskz:::::<tiebreaker_nsd>
Use names like tie1nsd etc. These names will appear in the volume group
name column when lspv is entered.
Note that any disk can be used as a tiebreaker disk; it does not have to be dedicated
to that purpose.
11. NSD format the disks and create the tiebreaker NSDs with:
mmcrnsd -F disk_list_tie
12. Display all NSDs:
mmlsnsd
13. Identify the tiebreaker disks:
mmchconfig tiebreakerDisks=tie1nsd
If more than one tiebreaker disk is used, list them all in a single command,
separated by semicolons and enclosed in quotes (each invocation replaces the
previous setting):
mmchconfig tiebreakerDisks="tie1nsd;tie2nsd;tie3nsd"
14. Display the config:
mmlsconfig
15. Change cluster configuration parameters:
mmchconfig autoload=yes
mmchconfig leaseDuration=15
16. If Oracle is to be used, change tuning parameters for 64-bit kernel:
mmchconfig prefetchThreads=75
mmchconfig worker1Threads=475
17. Start GPFS on all cluster nodes:
mmstartup -a
18. Create the GPFS filesystems (i.e. create multiple GPFS filesystems, each
residing on a single NSD):
mmcrfs /<fs_name1> /dev/<gpfs_name1> <nsd_name1> -B 512K -A yes
mmcrfs /<fs_name2> /dev/<gpfs_name2> <nsd_name2> -B 512K -A yes
etc
For example:
mmcrfs /CRS/db1 /dev/CRS_db1gpfs CRS_db1nsd -B 512K -A yes
Alternatively, you can create a single GPFS filesystem residing on all NSDs:
mmcrfs /<fs_name1> /dev/<gpfs_name1> -F disk_list -B 512K -A yes
Possible block sizes:
a. 16K – Optimises use of disk storage at expense of large data transfers.
b. 64K – Best value if there are many files 64K or less. Faster than 16K,
and more efficient use of disk space than 256K.
c. 256K – The default. Best block size for filesystems that contain large
files accessed in large reads and writes.
d. 512K and 1024K – More efficient if the dominant I/O access is for
large files > 1MB. 1024K is especially suited to large sequential
accesses.
If a RAID installation uses a 64K stripe size, then the block size should be a
multiple of the stripe size to give the most efficient access. For example, in a
6+P+S configuration a full stripe is 6 x 64K = 384K, so choose a block size of at
least 512K.
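As a further illustration, a hypothetical 8+P array with a 64K stripe size has a full
stripe of 8 x 64K = 512K, so -B 512K matches a full stripe exactly, while a 4+P array
with a 256K stripe size (4 x 256K = 1024K) would suggest -B 1024K.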
NOTE: A maximum of 256 GPFS filesystems can be created with Version 3.2.
19. On each node, mount the filesystems:
mount -t mmfs
If the command hangs, then reboot the system.
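Depending on the GPFS level, the mmmount command may also be available to mount a
filesystem, or all filesystems, on every node in one step, for example:
mmmount all -a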
20. Change Oracle tuning parameters for filesystems:
mmchfs <gpfs_name1> -s balancedRandom
mmchfs <gpfs_name2> -s balancedRandom
etc
Use the balancedRandom option for non-sequential access filesystems.
21. If using a third node for quorum, to prevent the node from attempting to mount
the GPFS filesystems, run the following command on the third node:
mv /var/mmfs/etc/automount.src /var/mmfs/etc/automount.src.org
22. Show status of cluster with:
mmgetstate -a -L
All nodes should be active. If any are shown as arbitrating, then quorum can
be easily lost.
23. If a GPFS filesystem is to contain the Oracle binaries, then the number of
inodes for the filesystem should be increased (Oracle 9i needs about 64000
inodes). Run the command:
mmchfs <gpfs_name> -F 70000
to increase the maximum number of inodes to, say, 70000.
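For example, assuming a hypothetical device name of orabin_gpfs:
mmchfs orabin_gpfs -F 70000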
Deleting a GPFS Cluster
1. Unmount the GPFS filesystems on all cluster nodes:
umount -t mmfs
2. Delete each of the filesystems:
mmdelfs <gpfs_name>
The name is not the pathname of the filesystem but the device name created in
/dev.
3. Delete the NSDs:
mmdelnsd <nsd_name>
4. Shutdown GPFS:
mmshutdown -a
5. Remove the tiebreaker disks from the configuration:
mmchconfig tiebreakerDisks=no
6. Delete each tie breaker disk:
mmdelnsd <tiebreaker_name>
7. Delete all nodes in the cluster:
mmdelnode -a
This will delete the cluster definition also.
8. Remove the contents of the /var/mmfs directory and all files that start with
mm from the /var/adm/ras directory.
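A minimal sketch of the cleanup, assuming the default paths:
rm -rf /var/mmfs/*
rm -f /var/adm/ras/mm*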
AIX and Tuning Considerations
1. The ipqmaxlen parameter controls the number of incoming packets that can
exist on the IP interrupt queue. Ensure it is set to at least 512; setting it with
the -r flag makes the value persist across reboots:
no -r -o ipqmaxlen=512
2. For Oracle installations, create the logical volumes so that each maps one-to-one
with a volume group, and each volume group maps one-to-one with a LUN that is a
single RAID device (see the sketch below).
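An illustrative sketch with hypothetical names (the number of physical partitions to
give mklv depends on the LUN and physical partition sizes):
mkvg -y oradatavg01 hdisk4
mklv -y oradatalv01 oradatavg01 <num_PPs>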
GPFS Snapshots
A snapshot of an entire GPFS filesystem can be created to preserve the contents of the
filesystem at a single point in time. Snapshots are read-only, and can provide an
online backup feature that allows easy recovery from accidental deletion of a file.
They are NOT copies of the entire filesystem.
A snapshot is taken of the active filesystem; you cannot take a snapshot of an
existing snapshot.
Creating a Snapshot
To create a snapshot:
mmcrsnapshot fs1 snap1
The output is similar to this:
Writing dirty data to disk
Quiescing all file system operations
Writing dirty data to disk again
Creating snapshot.
Resuming operations.
Before issuing the command, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
After the command has been issued, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
You can create a second snapshot with:
mmcrsnapshot fs1 snap2
After the command has been issued, the directory structure is:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
/fs1/.snapshots/snap2/file1
/fs1/.snapshots/snap2/userA/file2
/fs1/.snapshots/snap2/userA/file3
snap1 and snap2 are arbitrary names; you can use any naming convention.
List the snapshots of a filesystem with:
mmlssnapshot fs1
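Because a snapshot preserves a read-only copy of the namespace, a single accidentally
deleted file can be recovered by simply copying it back out of the snapshot directory,
for example:
cp /fs1/.snapshots/snap1/userA/file2 /fs1/userA/file2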
Restoring From a Snapshot
Assume we have a directory structure:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
If userA is deleted, we have:
/fs1/file1
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
If the directory userB is then created using the inode originally assigned to userA,
and another snapshot is taken:
mmcrsnapshot fs1 snap2
the directory structure would appear similar to:
/fs1/file1
/fs1/userB/file2b
/fs1/userB/file3b
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
/fs1/.snapshots/snap2/file1
/fs1/.snapshots/snap2/userB/file2b
/fs1/.snapshots/snap2/userB/file3b
If the file system is then restored from snap1:
mmrestorefs fs1 snap1
After the command is issued, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
/fs1/.snapshots/snap2/file1
/fs1/.snapshots/snap2/userB/file2b
/fs1/.snapshots/snap2/userB/file3b
Linking to Snapshots
Snapshot root directories appear in a special .snapshots directory in the filesystem
root, which is visible to commands such as ls. If you prefer to link directly to the
snapshots rather than always having to traverse the root directory, you can use the
mmsnapdir command to add a .snapshots subdirectory to every directory in the file
system (see below).
Assuming we already have a single snapshot, the filesystem may look like:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
To create links to the snapshot from each directory, using the name .links instead of
the default .snapshots, enter:
mmsnapdir fs1 -a -s .links
The -s option changes the snapshot directory name to .links. The directory structure
now appears similar to:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/userA/.links/snap1/file2
/fs1/userA/.links/snap1/file3
/fs1/.links/snap1/userA/file2
/fs1/.links/snap1/userA/file3
The per-directory links added by the mmsnapdir command are invisible to the ls
command. For example, ls -a /fs1/userA does not show .links, but ls
/fs1/userA/.links lists the snapshot contents and cd /fs1/userA/.links works.
To delete the links use:
mmsnapdir fs1 -r
The directory structure is now similar to:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/.links/snap1/userA/file2
/fs1/.links/snap1/userA/file3
Deleting a Snapshot
Delete a snapshot with:
mmdelsnapshot fs1 snap1