- http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=119080484


Something isn't working, where do I start?

Contents

  • Common
    • A node failed and I had to rebuild it from scratch. How do I add it back into the cluster?
    • I just created a new file system with pools and when I try to write a file I receive a no space error?
    • I successfully created the NSD's but now GPFS does not see them
    • Something seems slow or appears to hang
  • AIX
  • Linux
    • GPFS fails to start and reports "no such file or directory" for libssl library in the mmfs.log file
    • The Kernel module will not build.
  • Windows
    • How can I verify the connection to the Active Directory server is working from a Windows node?
    • The GPFS installation fails (ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf')
    • When attempting to add a Windows node mmaddnode fails (A remote host refused an attempted connect operation.)

 Common

A node failed and I had to rebuild it from scratch. How do I add it back into the cluster?

How to recover a failed GPFS node.

It was not an NSD server:
If the node is not an NSD server, the easiest way to recover is to remove the node from the cluster and add it back in.

  1. Remove the node from the cluster (the node must not be reachable by ping for this to work)
    mmdelnode -N failednode
  2. Add it back in using mmaddnode
    mmaddnode -N failednode

It was an NSD Server:
If the node is an NSD server you cannot remove it from the cluster without reconfiguring the NSD server definitions for the disks. To recover the node without reconfiguring the NSD server definitions:

  1. Reinstall the operating system and GPFS
  2. Get a copy of the mmsdrfs file from the primary cluster configuration server. (You can get the file from any node, but the copy on the primary cluster configuration server is the most up to date.)
    scp PrimaryClusterConfigNode:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
  3. Make sure the cluster configuration information is up to date
    mmchcluster -p LATEST
  4. At this point you should be able to start the node
    mmstartup -N failednode
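Once GPFS is started, it is worth confirming that the node has rejoined the cluster and, if it is an NSD server, that its disks are available again. A minimal check (the file system name gpfs1 is just an example) might be:

mmgetstate -N failednode    # the node should report "active"
mmlsdisk gpfs1 -L           # the disks it serves should show as ready and up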

 I just created a new file system with pools and when I try to write a file I receive a no space error?

If you just created a new file system and cannot create a file, it may be that you have storage pools but no policy installed. If your system pool is metadata-only, which is fine, that means you have metadata space but no data space in that storage pool, and the default rule places everything in the system storage pool. You can check the policy configuration by running mmlspolicy.

[root@perf7-c4-int64]#  mmlspolicy gpfs1
No policy file was installed  for file system 'gpfs1'.

If it says "No policy file was installed", you need to install a placement policy. A simple policy can be a single rule like this:

RULE 'default' set POOL 'satapool'

This policy sends all file data to the storage pool named satapool. Place that text in a file (for example, policy.txt), then install the policy:

mmchpolicy gpfs1  policy.txt
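Putting it together, a minimal sequence (using the gpfs1 file system and satapool pool names from the example above) would be:

cat > policy.txt <<'EOF'
RULE 'default' SET POOL 'satapool'
EOF
mmchpolicy gpfs1 policy.txt
mmlspolicy gpfs1    # should no longer report "No policy file was installed"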

I successfully created the NSD's but now GPFS does not see them  

Sometimes you can create an NSD using the mmcrnsd command and it completes successfully, but then mmlsnsd -X, for example, reports that the devices are not found.

# mmlsnsd -X

Disk name   NSD volume ID      Device   Devtype   Node name   Remarks
---------------------------------------------------------------------------------------------------
nsd1        1E05D0374B7053C4   -        -         node1       (not found) server node
nsd2        1E05D0384B7053C4   -        -         node2       (not found) server node

Cause: Unknown device name

This can be caused by GPFS not scanning that device name by default. In this case the device name was /dev/fioa, and GPFS does not look for devices that start with /dev/fio* by default (it looks for /dev/sd*, for example). When you run the mmcrnsd command it reads the device name from the NSD descriptor you provided, but when the GPFS daemon later attempts to find that device it looks only through the list of devices it discovered at startup, or after you ran the mmnsddiscover command (if the devices were added since the GPFS daemon was started). In this case you need to tell GPFS about the new device name, which you can do using the nsddevices user exit. For information and an example on how to use the nsddevices user exit see Device Naming
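As an illustration, a user exit for the /dev/fio* case above might look like the sketch below. This is only a sketch modeled on the sample shipped as /usr/lpp/mmfs/samples/nsddevices.sample; check that sample on your own system for the exact conventions (in particular the meaning of the return code). Each line written to standard output names a device (without the /dev/ prefix) followed by a device type such as generic.

#!/bin/ksh
# /var/mmfs/etc/nsddevices - user exit used by GPFS device discovery (sketch only)
for dev in /dev/fio*
do
  [ -e "$dev" ] || continue
  echo "${dev#/dev/} generic"      # "deviceName deviceType"
done
# Per the shipped sample: return 0 to replace the built-in discovery entirely,
# return 1 to also run the built-in discovery (for /dev/sd*, etc.).
return 1

Make the script executable (chmod +x /var/mmfs/etc/nsddevices) and then run mmnsddiscover, or restart GPFS, so the daemon picks up the new device names.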

Something seems slow or appears to hang


If file system access seems slow or GPFS appears to hang, the place to start investigating is what GPFS calls "waiters." Waiters are operations that are taking longer than some threshold; the reporting threshold is different for each type of operation. Some waiters are normal and indicate a healthy system, while others can tell you where a problem lies. To see the waiters:
When running GPFS 3.4 you can use the mmdiag command

mmdiag --waiters

When running GPFS 3.3 or earlier

mmfsadm dump waiters
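If the problem is intermittent, it can help to capture waiters periodically while you reproduce it. A simple loop such as the following (GPFS 3.4 command shown, with the path assumed to be the standard /usr/lpp/mmfs/bin) writes snapshots to a file for later review:

i=0
while [ $i -lt 30 ]
do
  date >> /tmp/waiters.log
  /usr/lpp/mmfs/bin/mmdiag --waiters >> /tmp/waiters.log   # snapshot every 10 seconds for 5 minutes
  sleep 10
  i=$((i + 1))
done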


Linux 

GPFS fails to start and reports "no such file or directory" for libssl library in the mmfs.log file


 
This message may occur if the right library is not specified by the opensslLibName config parameter which defaults to a list of common libssl library names: libssl.so:libssl.so.0:libssl.so.4. If the installed libssl library is not in the default list, you need to specify it through the opensslLibName configuration parameter.

mmchconfig opensslLibName="libssl.so.0.9.8e"
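If you are not sure which libssl is actually installed, you can check before setting the parameter; library directories vary by distribution, so the paths below are only examples:

ls -l /usr/lib64/libssl.so* /usr/lib/libssl.so* /lib64/libssl.so* 2>/dev/null
ldconfig -p | grep libssl    # alternative: query the dynamic linker cache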

An alternative is to create a symbolic link that points a library name in the default list to the installed library.

ln -s  libssl.so.0.9.8e libssl.so

Another alternative is to install the OpenSSL development package, which also creates the libssl.so symlink.

On SLES11 or later:
zypper install libopenssl-devel
On RHEL5.4 or later:
yum install openssl-devel
 

The Kernel module will not build


If make Autoconfig or make World fails and you are running a Red Hat-derived Linux distribution with GPFS 3.4.0.4 or later, you can try telling Autoconfig to treat the distribution as Red Hat using the LINUX_DISTRIBUTION flag. This allows you to build the GPFS portability layer on CentOS, for example.

make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig
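For context, the flag is simply added to the normal portability layer build; assuming the standard source location /usr/lpp/mmfs/src, the full sequence looks roughly like this:

cd /usr/lpp/mmfs/src
make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig
make World
make InstallImages    # installs the newly built kernel modules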
  

Windows

How can I verify the connection to the Active Directory server is working from a Windows node?

To verify that the Active Directory connection is working from a Windows node, you can look up a user account using the mmfsadm command. For example, to verify that the root account is accessible:

mmfsadm test adlookup "cn=root"

The GPFS installation fails (ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf')


Symptom:

The symptom is that the GPFS 3.3 installer on Windows Server 2008 will fail and report that the install came to a premature end. When you look in the install logs in %SystemRoot%\SUA\var\adm\ras you see an error similar to the following:

DIFXAPP: ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf'
DIFXAPP: ERROR: InstallDriverPackages failed with error 0x5

Resolution:

This problem can occur when the user's directory (e.g. C:\Users\root) does not permit the SYSTEM user to create temporary files during installation. If you are installing GPFS as root and root has been configured to support passwordless-ssh, then root's home directory will probably not allow SYSTEM write access.

Some known ways to fix this include:

  1. Install GPFS as Administrator
  2. Temporarily give SYSTEM write access to the user's home directory:
    $ cd ~
    $ ls -l -d .
    drwxr-x---  1 root  +SYSTEM  8192 Jan 26 15:04 .
    $ chmod g+w .
    $ ls -l -d .
    drwxrwx---  1 root  +SYSTEM  8192 Jan 26 15:04 .
    $
    $ #  INSTALL GPFS
    $
    $ chmod g-w .
    
  3. Delete the profile of the user account attempting to do the install. 

When attempting to add a Windows node mmaddnode fails (A remote host refused an attempted connect operation.)

Symptom:

When you attempt to add a new Windows node to an existing AIX or Linux cluster, you receive the following error:

nodea.ibm.com: A remote host refused an attempted connect operation.
nodea.ibm.com: A remote host refused an attempted connect operation.

Resolution:

There are a few things that can cause this:
  1. SSH is not configured properly for passwordless access. To test this, try ssh from every node to every node (a test loop is sketched at the end of this section) using

  • The short name (nodea)
  • The fully qualified name (nodea.ibm.com)
  • The IP address (10.1.1.0)

  2. The existing cluster is using rsh instead of ssh. Windows does not support rsh; you must use ssh. To fix this, reconfigure the existing cluster for ssh using the mmchcluster command.

mmchcluster -r /usr/bin/ssh -R /usr/bin/scp
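To exercise item 1 above, a loop like the following can be run from each node in turn; the host names and IP addresses are placeholders for your own nodes, and BatchMode makes ssh fail rather than prompt for a password:

for target in nodea nodea.ibm.com 10.1.1.1 nodeb nodeb.ibm.com 10.1.1.2
do
  ssh -o BatchMode=yes "$target" /bin/true && echo "$target ok" || echo "$target FAILED"
done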




- http://www.ibm.com/developerworks/wikis/display/hpccentral/GPFS+Tuning+Parameters


GPFS Tuning Parameters

This section describes some of the configuration parameters available in GPFS. Included are some notes on how they may affect performance. 
These are GPFS configuration parameters that can be set cluster wide, on a specific node or sets of nodes.
To view the configuration parameters that have been changed from the default

mmlsconfig

To view the active value of any of these parameters you can run

mmfsadm dump config

To change any of these parameters use mmchconfig. For example, to change the pagepool setting on all nodes:

mmchconfig pagepool=256M

Some options take effect immediately using the -i or -I flag to mmchconfig, some take effect after the node is restarted. Use -i to make the change permanent and affect the running GPFS daemon immediately. Use -I to affect the GPFS daemon only (reverts to saved settings on restart). Refer to the current GPFS Documentation for details.
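For example, using pagepool (assuming your GPFS level allows pagepool to be changed on a running daemon; some parameters and older releases still require a restart):

mmchconfig pagepool=512M -i    # takes effect now and persists across restarts
mmchconfig pagepool=512M -I    # takes effect now only; reverts to the saved value on restart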

In addition some parameters have a section called Tuning Guidelines. These are general guidelines that can be used to determine a starting point for tuning a parameter. 


leaseRecoveryWait



The leaseRecoveryWait parameter defines how long the FS manager of a filesystem will wait after the last known lease expiration of any failed nodes before running recovery. A failed node cannot reconnect to the cluster before recovery is finished. The leaseRecoveryWait parameter value is in seconds and the default is 35.

Making this value smaller increases the risk that there may be IO in flight from the failing node to the disk/controller when recovery starts running. This may result in out of order IOs between the FS manager and the dying node.

In most cases where a node is expelled from the cluster there is either a problem with the network or the node is running out of resources, such as paging space. For example, if an application running on a node is paging the machine to death or overrunning network capacity, GPFS may not get a chance to contact the Cluster Manager node to renew its lease within the timeout period.

GPFSCmdPortRange



When GPFS administration commands are executed they may use one or more TCP/IP ports to complete the command. For example when using standard ssh an admin command opens a connection using port 22. In addition to the remote shell or file copy command ports there are additional ports that are opened to pass data to and from GPFS daemons. By default GPFS uses one of the ephemeral ports to complete these connections.

In some environments you may want to limit the range of ports used by GPFS administration commands. You can control the ports used by the remote shell and file copy commands by using different tools or by configuring these tools to use different ports. The ports used by the GPFS daemon for administrative command execution can be defined using the GPFS configuration parameter GPFSCmdPortRange.

mmchconfig GPFSCmdPortRange=lowport-highport

This allows you to limit the ports used for GPFS administration mm* command execution. You need enough ports to support all of the concurrent commands from a node so you should define 20 or more ports for this purpose. Example:

mmchconfig GPFSCmdPortRange=30000-30100

minMissedPingTimeout



The minMissedPingTimeout and maxMissedPingTimeout parameters set limits on the calculation of missedPingTimeout (MPT), which is the allowable time for pings to fail from the Cluster Manager (CM) to a node that has not renewed its lease. The default MPT is leaseRecoveryWait-5 seconds. The CM will wait MPT seconds after the lease has expired before declaring a node out of the cluster. The minMissedPingTimeout and maxMissedPingTimeout values are in seconds and the defaults are 3 and 60 respectively. If these values are changed, only GPFS on the quorum nodes (from which the CM is elected) needs to be recycled for the change to take effect.

This can be used to ride out something like a central network switch failure (or other network glitch) whose recovery time may be longer than leaseRecoveryWait. It may prevent false node-down conditions, but it will extend the time for node recovery to finish, which may block other nodes from making progress if the failing node held tokens for many shared files.

Just as in the case of leaseRecoveryWait, in most cases where a node is expelled from the cluster there is either a problem with the network or the node is running out of resources, such as paging space. For example, if an application running on a node is paging the machine to death or overrunning network capacity, GPFS may not get a chance to contact the Cluster Manager node to renew its lease within the timeout period.
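For example, to keep the Cluster Manager from declaring nodes dead during a network outage of up to about a minute, you might raise the lower bound as shown below; the value is only illustrative, and GPFS must be recycled on the quorum nodes (one at a time, to preserve quorum) for it to take effect:

mmchconfig minMissedPingTimeout=60
# then restart GPFS on each quorum node in turn:
#   mmshutdown -N <quorum-node> ; mmstartup -N <quorum-node>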

maxMissedPingTimeout



See minMissedPingTimeout.

maxReceiverThreads



The maxReceiverThreads parameter is the number of threads used to handle incoming TCP packets. These threads gather the packets until there are enough bytes for the incoming RPC (or RPC reply) to be handled. For some simple RPCs the receiver thread handles the message immediately; otherwise it hands the message off to handler threads.

maxReceiverThreads defaults to the number of CPUs in the node up to 16. It can be configured higher if necessary up to 128 for very large clusters.

pagepool



The pagepool parameter determines the size of the GPFS file data block cache. Unlike local file systems that use the operating system page cache to cache file data, GPFS allocates its own cache called the pagepool. The GPFS pagepool is used to cache user file data and file system metadata. The default pagepool size of 64MB is too small for many applications, so this is a good place to start looking for performance improvement. In release 3.5 the default is 1GB for new installs; when upgrading, the old setting is kept.

Along with file data the pagepool supplies memory for various types of buffers like prefetch and write behind.

For Sequential IO

The default pagepool size may be sufficient for sequential IO workloads; however, a value of 256MB is known to work well in many cases. To change the pagepool size, use the mmchconfig command. For example, to change the pagepool size to 256MB on all nodes in the cluster:

    mmchconfig pagepool=256M [-i]

If the file system blocksize is larger than the default (256K), the pagepool size should be scaled accordingly. For example, if 1M blocksize is used, the default 64M pagepool should be increased by 4 times to 256M. This allows the same number of buffers to be cached.

Random IO

The default pagepool size will likely not be sufficient for Random IO or workloads involving a large number of small files. In some cases allocating 4GB, 8GB or more memory can improve workload performance.

Random Direct IO

For database applications that use Direct IO, the pagepool is not used for any user data. Its main purpose in this case is for system metadata and caching the indirect blocks of the database files.

NSD servers

Assuming no applications or Filesystem Manager services are running on the NSD servers, the pagepool is only used transiently by the NSD worker threads to gather data from client nodes and write the data to disk. The NSD server does not cache any of the data. Each NSD worker needs just one pagepool buffer per operation, and the buffer can potentially be as large as the largest blocksize of any file system the disks belong to. With the default NSD configuration there will be 3 NSD worker threads per LUN (nsdThreadsPerDisk) that the node services, so the amount of memory needed in the pagepool will be 3 x #LUNs x maxBlockSize. The target amount of space in the pagepool for NSD workers is controlled by nsdBufSpace, which defaults to 30%, so the pagepool should be large enough that 30% of it provides enough buffers.
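A hypothetical sizing example using that formula (the LUN count and blocksize are made-up values):

# 12 LUNs served x 3 NSD worker threads per LUN x 1 MiB largest blocksize, nsdBufSpace at its 30% default
luns=12; threads_per_disk=3; blocksize_mib=1
buffers_mib=$((luns * threads_per_disk * blocksize_mib))   # 36 MiB of NSD buffer space needed
pagepool_mib=$((buffers_mib * 100 / 30))                   # ~120 MiB minimum pagepool on the NSD server
echo "pagepool should be at least ${pagepool_mib} MiB"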

32 Bit operating systems

On 32-bit operating systems the pagepool is limited by the GPFS daemon's address space. This means that it cannot exceed 4GB in size and is often much smaller due to other limitations.



opensslLibName



To initialize multi-cluster communications GPFS uses OpenSSL. When initializing OpenSSL, GPFS looks for these ssl libraries: libssl.so:libssl.so.0:libssl.so.4 (as of GPFS 3.4.0.4). If you are using a newer version of OpenSSL, the filename may not match one in the list (for example libssl.so.6). You can use the opensslLibName parameter to tell GPFS to look for the newer version instead.

mmchconfig opensslLibName="libssl.so.6"



readReplicaPolicy



Options: default, local

Default
By default, when data is replicated GPFS spreads the reads over all of the available failure groups. This configuration is typically best when the nodes running GPFS have equal access to both copies of the data.

Local
A value of local has two effects on reading data in a replicated storage pool. Data is read from:

  1. A local block device
  2. A "local" NSD Server

The local block device means that the path to the disk is through a block special device, for example /dev/sd* on Linux or a /dev/hdisk device on AIX. GPFS does not do any further determination, so if disks at two sites are connected with a long-distance fiber connection GPFS cannot distinguish which disks are local. To use this option, connect the sites using the NSD protocol over TCP/IP or InfiniBand Verbs (Linux only).

Further, GPFS uses the subnets configuration setting to determine which NSD servers are "local" to an NSD client. For NSD clients to benefit from "local" read access, the NSD servers supporting the local disk need to be on the same subnet as the NSD clients accessing the data, and that subnet needs to be defined using the "subnets" configuration parameter. This parameter is useful when GPFS replication is used to mirror data across sites and there are NSD clients in the cluster, because it keeps read requests from being sent over the WAN.
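Putting the pieces together for a two-site configuration might look roughly like the commands below; the subnet value is a placeholder and the exact subnets syntax should be checked against the documentation for your GPFS release:

mmchconfig readReplicaPolicy=local
mmchconfig subnets="192.168.10.0"    # subnet shared by the local NSD servers and clients (example value)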



seqDiscardThreshold



The seqDiscardThreshold parameter affects what happens when GPFS detects a sequential read (or write) access pattern and has to decide what to do with the pagepool buffer after it is consumed (or flushed by writebehind threads). The default behavior is the highest performing option for the case where a very large file is read (or written) sequentially: with the default of 1MB, if a file is sequentially read and is greater than 1MB, GPFS does not keep the data in cache after consumption. There are some instances where large files are reread often by multiple processes, data analytics for example. In some cases you can improve the performance of these applications by increasing seqDiscardThreshold to be larger than the set of files you would like to cache; increasing seqDiscardThreshold tells GPFS to attempt to keep as much data in cache as possible for files below that threshold. The value of seqDiscardThreshold is a file size in bytes; the default is 1MB (1048576 bytes).

Tuning Guidelines

  • Increase this value if you want to cache sequentially read or written files that are larger than 1MB.
  • Make sure there are enough buffer descriptors to cache the file data. (See maxBufferDescs )
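Applying the first guideline, to keep sequentially accessed files of up to 8 GiB in cache (assuming the pagepool and buffer descriptors are sized to hold them), the threshold could be raised like this; the value is in bytes and is purely illustrative:

mmchconfig seqDiscardThreshold=8589934592    # 8 GiB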

sharedMemLimit



The sharedMemLimit parameter allows you to increase the amount of memory available to store various GPFS structures including inode cache and tokens. When the value of sharedMemLimit is set to 0 GPFS automatically determines a value for sharedMemLimit. The default value varies on each platform. In GPFS 3.4 the default on Linux and Windows is 256MB. In GPFS 3.4 on Windows sharedMemLimit can only be used to decrease the size of the shared segment. To determine whether or not increasing sharedMemLimit may help you can use the mmfsadm dump fs command.  For example, if you run mmfsadm dump fs and see that you are not getting the desired levels of maxFilesToCache (aka fileCacheLimit) or maxStatCache (aka statCacheLimit) you can try increasing sharedMemLimit.

# mmfsadm dump fs | head -8

Filesystem dump:
  UMALLOC limits:
    bufferDescLimit       4096 desired     4096
    fileCacheLimit        5000 desired    75000
    statCacheLimit       20000 desired    80000
    diskAddrBuffLimit     4000 desired     4000

The sharedMemLimit parameter is set in bytes.

As of release 3.4 the largest sharedMemLimit on Windows is 256M. On Linux and AIX the largest setting is 256G on 64 bit architectures and 2047M on 32 bit architectures. Using larger values may not work on some platforms/GPFS code versions. The actual sharedMemLimit on Linux may be reduced to a percentage of the kernel vmalloc space limit.
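If the dump shows the desired fileCacheLimit or statCacheLimit being clipped, the limit can be raised; the value below is only an example and should stay within the platform maximums listed above:

mmchconfig sharedMemLimit=2G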

socketMaxListenConnections



The parameter socketMaxListenConnections sets the number of TCP/IP sockets that the daemon can listen on in parallel. This tunable was introduced in 3.4.0.7 specifically for large clusters, where an incast of messages to a manager node from a large number of client nodes can exceed the listen backlog and time out. To be effective, the Linux tunable /proc/sys/net/core/somaxconn must also be modified from its default of 128. The effective value is the smaller of the GPFS tunable and the kernel tunable.

Default
Versions prior to 3.4.0.7 are fixed at 128. The default remains 128. The Linux kernel tunable also defaults to 128.

Tuning Guidelines
For clusters under 1000 nodes tuning this value should not be required. For larger clusters it should be set to approximately the number of nodes in the GPFS cluster. 
Example
mmchconfig socketMaxListenConnections=1500
echo 1500 > /proc/sys/net/core/somaxconn
(or)
sysctl -w net.core.somaxconn=1500

socketRcvBufferSize



The parameter socketRcvBufferSize sets the size of the TCP/IP receive buffer used for NSD data communication. This parameter is in bytes.

socketSndBufferSize



The parameter socketSndBufferSize sets the size of the TCP/IP send buffer used for NSD data communication. This parameter is in bytes.
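If you do tune these (for example on high-latency links), the values are plain byte counts; the 256 KiB figures below are only illustrative:

mmchconfig socketRcvBufferSize=262144
mmchconfig socketSndBufferSize=262144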

maxMBpS



The maxMBpS option is an indicator of the maximum throughput in megabytes per second that can be submitted by GPFS into or out of a single node. It is not a hard limit; rather, the maxMBpS value is a hint GPFS uses to calculate how much I/O can effectively be done for sequential prefetch and write-behind operations. In GPFS 3.3 the default maxMBpS value is 150, and in GPFS 3.5 it defaults to 2048. The maximum value is 100,000.

The maxMBpS value should be adjusted for the nodes to match the IO throughput the node is expected to support. For example, you should adjust maxMBpS for nodes that are directly attached to storage. A good rule of thumb is to set maxMBpS to twice the IO throughput required of a system. For example, if a system has two 4Gbit HBA's (400MB/sec per HBA) maxMBpS should be set to 1600. If the maxMBpS value is set too low sequential IO performance may be reduced.

This setting is not used by NSD servers. It is only used for application nodes doing sequential access to files. 
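Continuing the two-HBA example above, the setting would be applied to the application nodes doing the sequential IO; the node list name is a placeholder:

mmchconfig maxMBpS=1600 -N appnodes    # "appnodes" stands for your application node list or node class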

maxFilesToCache



The maxFilesToCache parameter controls how many files each node can cache. Each cached file requires memory for the inode and a token (lock).

In addition to this parameter, the maxStatCache configuration parameter controls how many files are partially cached. The default value of maxStatCache is 4 * maxFilesToCache, so maxFilesToCache effectively controls five times its own value in tokens, multiplied by the number of nodes in the cluster. The token managers for a given file system have to keep token state for all nodes in the cluster and for nodes in remote clusters that mount the file systems. This should be considered when setting this value.

One thing to keep in mind is that on a large cluster, a change in the value of maxFilesToCache is greatly magnified. Increasing maxFilesToCache from the default of 1000 by a factor of 2 in a cluster with 200 nodes will increase the number of tokens a server needs to store by approximately 2,000,000.  Therefore on large clusters it is recommended that if there is a subset of nodes with the need to have many open files only those nodes should increase the maxFilesToCache parameter. Nodes that may need an increased value for maxFilesToCache would include: login nodes, NFS/CIFS exporters, email servers or other file servers. For systems where applications use a large number of files, of any size, increasing the value for maxFilesToCache may prove beneficial. This is particularly true for systems where a large number of small files are accessed.

The increased value should be large enough to handle the number of concurrently open files plus allow caching of recently used files. You can use mmpmon (see monitoring) to measure the number of files opened and closed on a GPFS file system. Changing the value of maxFilesToCache affects the amount of memory used on the node. The amount of memory required for inodes and control data structures can be calculated as maxFilesToCache × 2.5 KB, where 2.5 KB = 2 KB + 512 bytes for an inode. Valid values of maxFilesToCache range from 1 to 100,000,000.

The size of the GPFS shared segment can limit the maximum setting of maxFilesToCache.  See sharedMemLimit for details.

Note: prior to release 3.5 the default maxFilesToCache and maxStatCache were 1000 and 4000. As of release 3.5, the default values are 4000 and 1000. If you change the maxFilesToCache value but not the maxStatCache value, then maxStatCache will default to 4 * maxFilesToCache.

Tuning Guidelines:

  • The increased value should be large enough to handle the number of concurrently open files plus allow caching of recently used files.
  • Increasing maxFilesToCache can improve the performance of user interactive operations like running ls.  
  • As a rule the total of ((maxFilesToCache + maxStatCache) * nodes) should not exceed (600,000 * (tokenMemLimit/256M) * (The number of manager nodes - 1)).  This is assuming you account for the fact that different nodes may have different values of maxFilesToCache.
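A worked example of that rule with made-up numbers: a 200-node cluster at the 3.5 defaults (maxFilesToCache=4000, maxStatCache=1000), three manager nodes, and tokenMemLimit equal to 256M (so the ratio term is 1):

nodes=200; mftc=4000; msc=1000; mgr_nodes=3
demand=$(( (mftc + msc) * nodes ))              # 1,000,000 tokens needed cluster-wide
capacity=$(( 600000 * 1 * (mgr_nodes - 1) ))    # 1,200,000 tokens of capacity
echo "demand=${demand} capacity=${capacity}"    # demand must stay below capacity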

maxStatCache



The maxStatCache parameter sets aside additional pageable memory to cache attributes of files that are not currently in the regular file cache. This is useful to improve the performance of both the system and GPFS stat() calls for applications with a working set that does not fit in the regular file cache. The memory occupied by the stat cache can be calculated as: maxStatCache × 176 bytes
Valid values of maxStatCache range from 0 to 10,000,000.

For systems where applications test the existence of files, or the properties of files, without actually opening them (as backup applications do), increasing the value for maxStatCache may prove beneficial. The default value is: 4 × maxFilesToCache
On systems where maxFilesToCache is greatly increased it is recommended that this value be manually set to something less than 4 * maxFilesToCache. For example, if you set maxFilesToCache to 30,000 you may want to set maxStatCache to 30,000 as well. On compute nodes this can usually be set much lower, since they only have a few active files in use for any one job.

Note: prior to release 3.5 the default maxFilesToCache and maxStatCache were 1000 and 4000. As of release 3.5, the default values are 4000 and 1000. If you change the maxFilesToCache value but not the maxStatCache value, then maxStatCache will default to 4 * maxFilesToCache.

The size of the GPFS shared segment can limit the maximum setting of maxStatCache.  See sharedMemLimit for details. 

maxBufferDescs



The value of maxBufferDescs defaults to 10 * maxFilesToCache, up to pagepool size / 16K. When caching small files it does not need to be more than a small multiple of maxFilesToCache, since only OpenFile objects (not stat cache objects) can cache data blocks.

If an application needs to cache very large files you can tune maxBufferDescs to ensure there are enough to cache large files.  To see the current value use the mmfsadm command:

#mmfsadm dump fs

Filesystem dump:
  UMALLOC limits:
    bufferDescLimit      10000 desired    10000
    fileCacheLimit        1000 desired     1000
    statCacheLimit        4000 desired     4000
    diskAddrBuffLimit      800 desired      800

In this case there are 10,000 buffer descriptors configured. If you have a 1MiB file system blocksize and want to cache a 20GiB file, you will not have enough buffer descriptors. In this case to cache a 20GiB file increase maxBufferDescs to at least 20,480 (20GiB/1MiB=20,480). It is not exactly a one to one mapping so a value of 32k may be appropriate.

mmchconfig maxBufferDescs=32k

nfsPrefetchStrategy



The parameter nfsPrefetchStrategy tells GPFS to optimize prefetching for NFS-style file access patterns. It defines a window, in number of blocks around the current position, that is treated as "fuzzy sequential" access. This can improve performance when big files are read sequentially but, because of kernel scheduling, some of the read requests reach GPFS out of order and therefore do not look strictly sequential. If the file system blocksize is small relative to the read request sizes, making this bigger provides a bigger window of blocks. The default is 0.

Tuning Guidelines

  • Setting nfsPrefetchStrategy to 1 can improve sequential read performance when large files are accessed using NFS. 

nsdMaxWorkerThreads



  
The parameter nsdMaxWorkerThreads sets the maximum number of NSD threads on an NSD server that will be concurrently transferring data with NSD clients. The default is 32, with a minimum of 8; the maximum is constrained by the requirement that worker1Threads + prefetchThreads + nsdMaxWorkerThreads be less than 1500 on 64-bit architectures. The default works well in many clusters, but in some cases, large clusters for example, it may help to increase nsdMaxWorkerThreads. Scale this with the number of LUNs, not the number of clients; these threads are needed to manage flow control on the network between the clients and the servers.

numaMemoryInterleave



  
On Linux, setting numaMemoryInterleave to yes starts mmfsd with numactl --interleave=all. Enabling this parameter may improve the performance of GPFS running on NUMA-based systems, for example systems based on an Intel Nehalem processor.

prefetchPct



"prefetchPct" defaults to 20% of pagepool. GPFS uses this as a guideline which limits how much pagepool space will be used for prefetch or writebehind buffers in the case of active sequential streams. The default works well for many applications. On the other hand, if the workload is mostly sequential (video serving/ingest) with very little caching of small files or random IO, then this number should be increased up to its 60% maximum, so that each stream can have more buffers available for prefetch and write behind operations. 

prefetchThreads



Tuning Guidelines

  • You usually don't need prefetchThreads to be more than twice the number of LUNs available to the node. Any more than that typically do nothing but wait in queues. The maximum value depends on the sum of worker1Threads + prefetchThreads + nsdMaxWorkerThreads < 1500 on 64bit architectures

Logfile



"Logfile" size should be larger for high metadata rate systems to prevent more glitches when the log has to wrap. Can be as large as 16MB on large blocksize file systems. To set this parameter use the --L flag on mmcrfs. 

verbsLibName

To initialize InfiniBand RDMA, GPFS looks for a file called libverbs.so. If the file name on your system is different (libverbs.so.1.0, for example), you can change this parameter to match.

Example:
  mmchconfig verbsLibName=libverbs.so.1.0



verbsrdmasperconnection


This is the maximum number of RDMAs that can be outstanding on any single RDMA connection. The default value is 8.

Tuning Guidelines

  • In testing the default was more than enough on SDR. All performance testing of the parameters was done on OFED 1.1 IB SDR. 

verbsrdmaspernode


This is the maximum number of RDMAs that can be outstanding from the node. The default value is 0 (a value of 0 means use the built-in default, which is 32).

Tuning Guidelines

  • In testing the default was more than enough to keep adapters busy on SDR. All performance testing of the parameters was done on OFED 1.1 IB SDR. 

worker1Threads



The worker1Threads parameter represents the total number of concurrent application requests that can be processed at one time. This may include metadata operations like file stat() requests, open or close, as well as data operations. The worker1Threads parameter can be reduced without having to restart the GPFS daemon; increasing the value of worker1Threads requires a restart of the GPFS daemon.
To determine whether you have a sufficient number of worker1Threads configured you can use the mmfsadm dump mb command.

# mmfsadm dump mb | grep Worker1
  Worker1Threads: max 48 current limit 48 in use 0 waiting 0   PageDecl: max 131072 in use 0

Using the mmfsadm command you can see how many threads are "in use" and how many application requests are "waiting" for a worker1thread.

Tuning Guidelines

  • The default is good for most workloads.
  • You may want to increase worker1Threads if your application uses many threads and does Asynchronous IO (AIO) or Direct IO (DIO); in these cases the worker1 threads are doing the IO operations. A good place to start is to set worker1Threads to approximately 2 times the number of LUNs in the file system so GPFS can keep the disks busy with parallel requests. The maximum value depends on the sum of worker1Threads + prefetchThreads + nsdMaxWorkerThreads < 1500 on 64-bit architectures.
  • Do not use excessive values of worker1threads.


worker3Threads



The worker3Threads parameter specifies the number of threads to use for inode prefetch. A value of zero disables inode prefetch. The default is 8.

Tuning Guidelines

  • The default is good for most workloads.


writebehindThreshold



The writebehindThreshold parameter determines at what point GPFS starts flushing newly written data out of the pagepool for a file. Increasing this value can increase how many newly created files are kept in cache, which can be useful, for example, if your workload contains temporary files that are smaller than writebehindThreshold and are deleted before they would be flushed from cache. By default GPFS uses the pagepool to buffer IO for best performance, but once the data is written the buffers are cleaned; increasing this value tells GPFS to try to keep the data in the pagepool as long as practical instead of immediately cleaning the buffers. The value is the maximum file size to keep in cache and is specified in bytes; the default is 512k (524288 bytes). If the value is too large there may be too many dirty buffers that the sync thread has to flush at the next sync interval, causing a surge in disk IO. Keeping it small ensures a smooth flow of dirty data to disk.

Tuning Guidelines

  • The default is good for most workloads.
  • Increase this value if you have a workload where not flushing newly written files larger than 512k would be beneficial.





- http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=104533251&navigatingVersions=true



GPFS FAQ

GPFS Questions and Answers


Overview

General Parallel File System (TM) (GPFS (TM)) is a high performance shared-disk file management solution that provides fast, reliable access from nodes in a cluster environment. Parallel and serial applications can readily access shared files using standard UNIX(R) file system interfaces, and the same file can be accessed concurrently from multiple nodes. GPFS is designed to provide high availability through logging and replication, and can be configured for failover from both disk and server malfunctions. GPFS scalability and performance are designed to meet the needs of data intensive applications such as engineering design, digital media, data mining, relational databases, financial analytical, seismic data processing, scientific research and scalable file serving.

GPFS for POWER (TM) is supported on both AIX (R) and Linux (R). GPFS for AIX runs on the IBM (R) eServer (TM) Cluster 1600 as well as clusters of IBM Power, IBM System p (TM), IBM eServer p5, and IBM BladeCenter (R) servers. GPFS for Linux runs on select IBM Power, System p, eServer p5, BladeCenter and IBM eServer OpenPower (R) servers. The GPFS Multiplatform product runs on the IBM System Cluster 1350 (TM) as well as Linux clusters based on selected IBM x86 System x (TM) rack-optimized servers, select IBM BladeCenter servers, or select IBM AMD processor-based servers.

Additionally, GPFS Multiplatform V3.2.1 is supported on nodes running Windows (R) Server 2003 R2 on 64-bit architectures (AMD x64 / EM64T) in an existing GPFS V3.2.1 cluster of AIX and/or Linux (32-bit or 64-bit) where all nodes are at service level 3.2.1-5 or later.

For further information regarding the use of GPFS in your clusters, see the GPFS: Concepts, Planning, and Installation Guide.


Questions & Answers

1. General questions:

2. Software questions:

3. Machine questions:

4. Disk questions:

5. Scaling questions:

6. Configuration and tuning questions:

7. Service questions:

1. General questions


Q1.1: How do I order GPFS?
A1.1:
To order GPFS:


Q1.2: How is GPFS priced?
A1.2:
The price for GPFS for POWER is based on the number of processors active on the server where GPFS is installed.

The price for GPFS Multiplatform is based on a Processor Value Unit metric. A Value Unit is a pricing charge metric for program license entitlements which is based upon the quantity of a specifically designated measurement used for a given program, in this case processors or processor cores. Under the processor Value Unit licensing metric, each processor core is assigned a specific number of Value Units. You must acquire the total number of processor Value Units for each processor core on which the software program is deployed. IBM continues to define a processor to be each processor core on a chip. For example, a dual-core chip contains two processor cores.

A processor core is a functional unit within a computing device that interprets and executes instructions. A processor core consists of at least an instruction control unit and one or more arithmetic or logic unit. Not all processor cores require the same number of Value Unit entitlements. With multi-core technology, each core is considered a processor.

See http://www.ibm.com/software/lotus/passportadvantage/pvu_licensing_for_customers.html

Each software program has a unique price per Value Unit. The number of Value Unit entitlements required for a program depends on how the program is deployed in your environment and must be obtained from a Value Unit table. GPFS Multiplatform is grouped into packs of 10 processor Value Units as the minimum order quantity. For example, when you need 50 processor Value Units, you will order 5 of these 10 processor Value Unit part numbers to get the required 50 processor Value Units. To determine the total cost of deploying GPFS, multiply the program price per Value Unit by the total number of processor Value Units required. To calculate the number of Value Unit entitlements required, refer to the Value Unit Table at
http://www.ibm.com/software/lotus/passportadvantage/pvu_table_for_customers.html

and the Value Unit Calculator at
https://www-112.ibm.com/software/howtobuy/passportadvantage/valueunitcalculator/vucalc.wss

For further information:

  • In the United States, please call 1-888-SHOP-IBM
  • In all other locations, please contact your IBM Marketing Representative. For a directory of worldwide contacts, see http://www.ibm.com/planetwide/index.html


Q1.3: Where can I find the documentation for GPFS?
A1.3:
The GPFS documentation is available in both PDF and HTML format on the Cluster Information Center at http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfsbooks.html.


Q1.4: What resources beyond the standard documentation can help me learn about and use GPFS?
A1.4:
For additional information regarding GPFS see:


Q1.5: How can I ask a more specific question about GPFS?
A1.5:
Depending upon the nature of your question, you may ask it in one of several ways.

  • If you want to correspond with IBM regarding GPFS:
    • If your question concerns a potential software error in GPFS and you have an IBM software maintenance contract, please contact 1-800-IBM-SERV in the United States or your local IBM Service Center in other countries. IBM Scholars Program users should notify the GPFS development team of potential software bugs through gpfs@us.ibm.com.
    • If you have a question that can benefit other GPFS users, you may post it to the GPFS technical discussion forum at http://www.ibm.com/developerworks/forums/dw_forum.jsp?forum=479
    • This FAQ is continually being enhanced. To contribute possible questions or answers, please send them to gpfs@us.ibm.com
  • If you want to interact with other GPFS users, the San Diego Supercomputer Center  maintains a GPFS user mailing list. The list is gpfs-general@sdsc.edu and those interested can subscribe to the list at http://lists.sdsc.edu/mailman/listinfo/gpfs-general

If your question does not fall into the above categories, you can send a note directly to the GPFS development team at gpfs@us.ibm.com. However, this mailing list is informally monitored as time permits and should not be used for priority messages to the GPFS team.


Q1.6: Does GPFS participate in the IBM Academic Initiative Program?
A1.6:

GPFS no longer participates in the IBM Academic Initiative Program.

If you are currently using GPFS with an education license from the Academic Initiative, we will continue to support GPFS 3.2 on a best-can-do basis via email for the licenses you have. However, no additional or new licenses of GPFS will be available from the IBM Academic Initiative program. You should work with your IBM client representative on what educational discount may be available for GPFS. See http://www.ibm.com/planetwide/index.html


2. Software questions


Q2.1: What levels of the AIX O/S are supported by GPFS?
A2.1:
GPFS supports AIX V6.1, AIX V5.3 and V5.2 nodes in a homogenous or heterogeneous cluster running either the AIX or the Linux operating system.

Table 2. GPFS for AIX

              AIX V6.1    AIX V5.3    AIX V5.2
GPFS V3.2     X           X           X
GPFS V3.1                 X           X

Notes:

1. The following additional filesets are required by GPFS V3.2:
  • xlC.aix50.rte (C Set ++(R) Runtime for AIX 5.0), version 8.0.0.0 or later
  • xlC.rte (C Set ++ Runtime), version 8.0.0.0 or later
  These can be downloaded from Fix Central at http://www.ibm.com/eserver/support/fixes/fixcentral
2. Enhancements to the support of Network File System (NFS) V4 in GPFS V3 are only available on AIX V5.3 systems with the minimum technology level of 5300-04 applied or on AIX V6.1 with GPFS V3.2.
3. The version of OpenSSL shipped with AIX V6.1 will not work with GPFS due to a change in how the library is built. To obtain the level of OpenSSL which will work with GPFS, see the question How do I get OpenSSL to work on AIX and SLES8/ppc64?
4. For additional support information, please also see the question, What is the current service information for GPFS?
5. Customers should consider the support plans for AIX V5.2 in their operating system decision.
6. For the latest GPFS fix level, go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html


Q2.2: What Linux distributions are supported by GPFS?
A2.2:
GPFS supports the following distributions:
Note: For kernel level support, please see question What are the latest kernel levels that GPFS has been tested with?

Table 3. Linux distributions supported by GPFS

                           RHEL 5 (2)   RHEL 4   RHEL 3   SLES 10 (1,4)   SLES 9   SLES 8
GPFS Multiplatform V3.2    X            X                 X               X
GPFS for POWER V3.2 (3)    X            X                 X               X
GPFS Multiplatform V3.1                 X        X        X               X        X
GPFS for POWER V3.1                     X                 X               X

1. There is required service for GPFS V3.1 support of SLES 10. Please see question What is the current service information for GPFS?
2. RHEL 5.0 and later on POWER requires GPFS V3.2.0.2 or later
3. GPFS V3.2 for Linux on POWER does not support mounting of a file system with a 16KB block size when running on RHEL 5.
4. The GPFS GPL build requires imake. If imake was not installed on the SLES 10 system, install xorg-x11-devel-*.rpm.


Q2.3: What are the latest kernel levels that GPFS has been tested with?
A2.3:
While GPFS runs with many different AIX fixes and Linux kernel levels, it is highly suggested that customers apply the latest fix levels and kernel service updates for their operating system. To download the latest GPFS service updates, go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html

GPFS does not currently support the following kernels:

  • RHEL hugemem kernel
  • RHEL largesmp
  • RHEL uniprocessor (UP) kernel
  • SLES xen kernel

Table 4. GPFS for Linux V3.2

Linux Distribution            Kernel Level
POWER
Red Hat EL 5.3 (1,2,3)        2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
x86_64
Red Hat EL 5.3 (2,3)          2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
i386
Red Hat EL 5.3 (2,3)          2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
Itanium (R) 2 (4)
Red Hat EL 4.5                2.6.9-55.0.6
SUSE Linux ES 10 SP1          2.6.16.53-0.8
SUSE Linux ES 9 SP3           2.6.5-7.286

1. RHEL 5.0 and later on POWER requires GPFS V3.2.0.2 or later
2. With RHEL5.1, the automount option is slow. This issue should be addressed in the 2.6.18-53.1.4 kernel when it is available.
3. GPFS V3.2.1-3 or later supports the RHEL xen kernel.
4. GPFS for Linux on Itanium Servers is available only through a special Programming Request for Price Quotation (PRPQ). The install image is not generally available code. It must be requested by an IBM client representative through the RPQ system and approved before order fulfillment. If interested in obtaining this PRPQ, reference PRPQ # P91232 or Product ID 5799-GPS.

Table 5. GPFS for Linux V3.1

Linux Distribution            Kernel Level
POWER
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
x86_64
Red Hat EL 4.7                2.6.9-78.0.13
Red Hat EL 3.8                2.4.21-47.0.1
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
SUSE Linux ES 8 SP4           2.4.21-309
i386
Red Hat EL 4.7                2.6.9-78.0.13
Red Hat EL 3.8                2.4.21-47.0.1
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
SUSE Linux ES 8 SP4           2.4.21-309


Q2.4: What levels of the Windows O/S are supported by GPFS?
A2.4:
GPFS Multiplatform V3.2.1-5 and later, is supported on nodes running Windows Server 2003 R2 on 64-bit architectures (AMD x64 / EM64T) in an existing GPFS V3.2.1 cluster of AIX and/or Linux at V3.2.1-5 or later.


Q2.5: Can different GPFS maintenance levels coexist?
A2.5:
Certain levels of GPFS can coexist, that is, be active in the same cluster and simultaneously access the same file system. This allows upgrading GPFS within a cluster without shutting down GPFS on all nodes first, and also mounting GPFS file systems from other GPFS clusters that may be running a different maintenance level of GPFS. The current maintenance level coexistence rules are:

  • All GPFS V3.2 maintenance levels can coexist with each other and with GPFS V3.1 Maintenance Level 13 or later, unless otherwise stated in this FAQ. See the Migration, coexistence and compatibility information in the GPFS V3.2 Concepts, Planning, and Installation Guide.
    • The default file system version was incremented in GPFS 3.2.1-5. File systems created using GPFS 3.2.1-5 code without using the --version option of the mmcrfs command will not be mountable by earlier code.
    • GPFS V3.2 maintenance levels 3.2.1.2 and 3.2.1.3 have coexistence issues with other maintenance levels. Customers using a mixed maintenance level cluster that have some nodes running 3.2.1.2 or 3.2.1.3 and other nodes running other maintenance levels should uninstall the gpfs.msg.en_US rpm/fileset from the 3.2.1.2 and 3.2.1.3 nodes. This should prevent the wrong message format strings going across the mixed maintenance level nodes.
    • Attention: Do not use the mmrepquota command if there are nodes in the cluster running a mixture of 3.2.0.3 and other maintenance levels. A fix will be provided in APAR #IZ16367. A fix can be provided for 3.2.0.3 upon request prior to APAR availability in the March service level available at https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html
  • All GPFS V3.1 maintenance levels can coexist with each other, unless otherwise stated in this FAQ.
    Attention: GPFS V3.1 maintenance levels 10 (GPFS-3.1.0.10) thru 12 (GPFS-3.1.0.12) do not coexist with other maintenance levels. All nodes in the cluster must conform to one of these maintenance level compatibility restrictions:
    • All nodes must be at maintenance levels 1-9 or 13 and later (GPFS-3.1.0.1 thru GPFS-3.1.0.9 or GPFS-3.1.0.13 and later)
    • All nodes must be at maintenance levels 10-12 (GPFS-3.1.0.10 - GPFS-3.1.0.12)


Q2.6: Are there any requirements for Clustered NFS (CNFS) support in GPFS V3.2?
A2.6:
GPFS V3.2 Clustered NFS (CNFS) support requirements:

The required lockd patch is not supported on RHEL 4 ppc64.

Table 6. CNFS requirements

          lockd patch required             sm-notify required               rpc.statd required
SLES 10   X                                X                                not required
SLES 9    X                                X                                not required
RHEL 5    X (not available for ppc64)      included in base distribution    X
RHEL 4    X (not available for ppc64)      included in base distribution    X

See also What Linux kernel patches are provided for clustered file systems such as GPFS?


Q2.7: Are there any requirements for the use of the Persistent Reserve support in GPFS V3.2?
A2.7:
GPFS V3.2 supports Persistent Reserve on AIX and requires:


Q2.8: Are there any considerations when utilizing the Simple Network Management Protocol (SNMP)-based monitoring capability in GPFS V3.2?
A2.8:
Considerations for the use of the SNMP-based monitoring capability in GPFS include:

  • Currently, the SNMP collector node must be a Linux node in your GPFS cluster. GPFS utilizes Net-SNMP which is not supported by AIX.
  • Support for ppc64 requires the use of Net-SNMP 5.4.1. Binaries for Net-SNMP 5.4.1 on ppc64 are not available. You will need to download the source and build the binary. Go to http://net-snmp.sourceforge.net/download.html
  • If the monitored cluster is relatively large, you need to increase the communication time-out between the SNMP master agent and the GPFS SNMP subagent. In this context, a cluster is considered to be large if the number of nodes is greater than 25, or the number of file systems is greater than 15, or the total number of disks in all file systems is greater than 50. For more information see Configuring Net-SNMP in the GPFS: Advanced Administration Guide.


3. Machine questions


Q3.1: What are the minimum hardware requirements for a GPFS cluster?
A3.1:
The minimum hardware requirements are:

  • GPFS for POWER: IBM POWER3(TM) or newer processor, 1 GB of memory
  • GPFS Multiplatform for Linux:
    • Intel(R) Pentium(R) 3 or newer processor, with 512 MB of memory
    • AMD Opteron(TM) processors, with 1 GB of memory
    • Intel Itanium 2 processor with 1 GB of RAM
  • GPFS Multiplatform for Windows:
    • Intel EM64T processors, with 1GB of memory
    • AMD Opteron processors, with 1 GB of memory
      Note: Due to issues found during testing, GPFS for Windows is not supported on e325 servers

Additionally, it is highly suggested that a sufficiently large amount of swap space is configured. While the actual configuration decisions should be made taking into account the memory requirements of other applications, it is suggested to configure at least as much swap space as there is physical memory on a given node.

GPFS is supported on systems which are listed in, or compatible with, the IBM hardware specified in the Hardware requirements section of the Sales Manual for GPFS. If you are running GPFS on hardware that is not listed in the Hardware Requirements, should problems arise and after investigation it is found that the problem may be related to incompatibilities of the hardware, we may require reproduction of the problem on a configuration conforming to IBM hardware listed in the sales manual.

To access the Sales Manual for GPFS:

1. Go to http://www-306.ibm.com/common/ssi/OIX.wss
2. From A specific type menu, choose HW&SW Desc (Sales Manual,RPQ).
3. To view a GPFS sales manual, choose the corresponding product number to enter in the keyword field then click on Go

  • For General Parallel File System for POWER V3.2.1, enter 5765-G66
  • For General Parallel File System Multiplatform V3.2.1, enter 5724-N94
  • For General Parallel File System for AIX 5L V3.1, enter 5765-G66
  • For General Parallel File System for Linux on POWER V3.1, enter 5765-G67
  • For General Parallel File System Multiplatform V3.1 for Linux, enter 5724-N94


Q3.2: Is GPFS for POWER supported on IBM System i servers?
A3.2:
GPFS for POWER extends all features, function, and restrictions (such as operating system and scaling support) to IBM System i servers to match their IBM System p counterparts:

Table 7.

IBM System i    IBM System p
i-595           p5-595
i-570           p5-570, p6-570
i-550           p5-550
i-520           p5-520

No service updates are required for this additional support.


Q3.3: What machine models has GPFS for Linux been tested with?
A3.3:
GPFS has been tested with:

For both the p5-590 and the p5-595: See the question What is the current service information for GPFS?

For hardware and software certification, please see the IBM ServerProven site at http://www.ibm.com/servers/eserver/serverproven/compat/us/


Q3.4: Is GPFS for Linux supported on all IBM ServerProven servers?
A3.4:
GPFS for Linux is supported on all IBM ServerProven servers:

  1. With the distributions and kernel levels as listed in the question What are the latest distributions and kernel levels that GPFS has been tested with?
  2. That meet the minimum hardware model requirements as listed in the question What are the minimum hardware requirements for a GPFS cluster?
    Please see the IBM ServerProven site at http://www.ibm.com/servers/eserver/serverproven/compat/us/


Q3.5: What interconnects are supported for GPFS daemon-to-daemon communication in a GPFS cluster?
A3.5:
The interconnect for GPFS daemon-to-daemon communication depends upon the types of nodes in your cluster.

Table 8. GPFS daemon-to-daemon communication interconnects
(Supported interconnect / Supported environments, by the types of nodes in your cluster)

Linux
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    All supported GPFS environments
  Myrinet                IP only
  InfiniBand

Linux/AIX/Windows
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    All supported GPFS Linux environments; AIX V5.3; AIX V6.1

AIX
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    AIX V5.3; AIX V6.1
  Myrinet                AIX V5.2 and V5.3 64-bit kernel, BladeCenter JS20 and p5 POWER5 servers; IP only
  InfiniBand             AIX V5.3, GPFS V3.1 or V3.2; IP only
  eServer HPS            Homogeneous clusters of either AIX V5.2 or V5.3


Q3.6: Does GPFS support exploitation of the Virtual I/O Server (VIOS) features of POWER5 processors?
A3.6:
Yes, GPFS allows exploitation of POWER5 VIOS configurations. Both the virtual SCSI (VSCSI) and the shared Ethernet adapter (SEA) are supported in single and multiple central electronics complex (CEC) configurations. This support is limited to GPFS nodes that are using either the AIX V6.1 or V5.3 operating system.

All LPARs in a CEC that are GPFS cluster members must have all the VIO disks mapped to each LPAR using virtual SCSI. This presents a SAN-like environment to GPFS, where each node has access to the disks over a local path without requiring network access. None of the NSDs in these configurations should be defined with an NSD server associated with them (see the sketch below).
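For illustration only, here is a minimal sketch of defining such NSDs without servers, assuming the GPFS 3.x colon-separated disk descriptor format and hypothetical hdisk names; because the NSD server fields are left empty, each node accesses the disks over its own local (virtual SCSI) path:

# cat /tmp/vio_disks.desc
hdisk4:::dataAndMetadata:1
hdisk5:::dataAndMetadata:2
# mmcrnsd -F /tmp/vio_disks.desc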

Minimum required code levels:

  • VIOS Release 1.3.0.0 Fix Pack 8
  • AIX 5L V5.3 Service Pack 5300-05-01

There is no GPFS fix level requirement for this support, but it is recommended that you be at the latest GPFS level available. For information on the latest levels of GPFS go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html

For further information on POWER5 VIOS go to http://techsupport.services.ibm.com/server/vios/documentation/faq.html

For VIOS documentation, go to http://techsupport.services.ibm.com/server/vios/documentation/home.html

Back to the top of the page

4. Disk questions


Q4.1: What disk hardware has GPFS been tested with?
A4.1:
These tables list the disk hardware that has been tested by IBM and is known to work with GPFS. GPFS is not limited to this set of disk devices; other disk devices may work with GPFS, but they have not been tested by IBM. The GPFS support team will help customers using devices outside this list to solve problems directly related to GPFS, but not problems deemed to be issues with the underlying device's behavior, including any performance issues exhibited on untested hardware.

It is important to note that:

  • Each individual disk subsystem requires a specific set of device drivers for proper operation while attached to a host running GPFS or IBM Recoverable Virtual Shared Disk. The prerequisite levels of device drivers are not documented in this GPFS-specific FAQ. Refer to the disk subsystem's web page to determine the currency of the device driver stack for the host's operating system level and attachment configuration.

          For information on IBM disk storage subsystems and their related device drivers levels and Operating System support guidelines, go to http://www.ibm.com/servers/storage/support/disk/index.html

  • Microcode levels should be at the latest levels available for your specific disk drive.

          For the IBM System Storage (TM), go to http://www.ibm.com/servers/storage/support/allproducts/downloading.html

  • GPFS for Windows can only operate as an NSD client at this time, and as such does not support direct attached disks.

DS4000 customers: Please also see

Table 9. Disk hardware tested with GPFS for AIX on POWER

       GPFS for AIX on POWER:        
 IBM System Storage DS6000 (TM) using either Subsystem Device Driver (SDD) or Subsystem Device Driver Path Control Module (SDDPCM) 

Configuration considerations: GPFS clusters up to 32 nodes are supported and require a firmware level of R9a.5b050318a or greater. See further requirements below.
 IBM System Storage DS8000 (TM) using either SDD or SDDPCM 

Configuration considerations: GPFS clusters up to 32 nodes are supported and require a firmware level of R10k.9b050406 or greater. See further requirements below.
 DS6000 and DS8000 service requirements: 

  • AIX 5L V5.2 maintenance level 05 (5200-05) - APAR # IY68906, APAR # IY70905
  • AIX 5L V5.3 maintenance level 02 (5300-02) - APAR # IY68966, APAR # IY71085
  • GPFS for AIX 5L V2.3 - APAR # IY66584, APAR # IY70396, APAR # IY71901 

    For the Disk Leasing model install the latest supported version of the SDD fileset supported on your operating system. 

    For the Persistent Reserve model install the latest supported version of SDDPCM fileset supported for your operating system.
 IBM TotalStorage DS4100 (Formerly FAStT 100) with DS4000 EXP100 Storage Expansion Unit with Serial Advanced Technology Attachment (SATA) drives. 

IBM TotalStorage FAStT500 

IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) with DS4000 EXP710 Fibre Channel (FC) Storage Expansion Unit, DS4000 EXP700 FC Storage Expansion Unit, or EXP100 

IBM System Storage DS4300 Turbo with EXP710, EXP700, or EXP100 

IBM System Storage DS4400 (Formerly FAStT 700) with EXP710 or EXP700 

IBM System Storage DS4500 (Formerly FAStT 900) with EXP710, EXP700, or EXP100 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 with EXP710, EXP100 or EXP810 

IBM System Storage DS3400 (1726-HC4)
 IBM TotalStorage ESS (2105-F20 or 2105-800) with SDD

IBM TotalStorage ESS (2105-F20 or 2105-800) using AIX 5L Multi-Path I/O (MPIO) and SDDPCM
 IBM System Storage Storage Area Network (SAN) Volume Controller (SVC) V2.1 and V3.1 

The following APAR numbers are suggested: 

  • IY64709 - Applies to all GPFS clusters
  • IY64259 - Applies only when running GPFS in an AIX V5.2 or V5.3 environment with RVSD 4.1
  • IY42355 - Applies only when running GPFS in a PSSP V3.5 environment
  • SVC V2.1.0.1 is supported with AIX 5L V5.2 (Maintenance Level 05) and AIX 5L V5.3 (Maintenance Level 01). 

    See http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002471 for specific advice on SAN Volume Controller recommended software levels.
 IBM 7133 Serial Disk System (all disk sizes)
 Hitachi Lightning 9900 (TM) (9910, 9960, 9970 and 9980) 
Hitachi Universal Storage Platform 100/600/1100 
Notes: 
  1. In all cases Hitachi Dynamic Link Manager (TM) (HDLM) (multipath software) or MPIO (default PCM - failover only) is required
  2. AIX ODM objects supplied by Hitachi Data Systems (HDS) are required for all above devices.
  3. Customers should consult with HDS to verify that their proposed combination of the above components is supported by HDS.
 EMC Symmetrix DMX Storage Subsystems (FC attach only) 

Selected models of CX/CX-3 family including CX300, CX400, CX500 CX600, CX700 and CX3-20, CX3-40 and CX3-80 

Device driver support for Symmetrix includes both MPIO and PowerPath. 

Note:
CX/CX-3 requires PowerPath. 

Customers should consult with EMC to verify that their proposed combination of the above components is supported by EMC.
 HP XP 128/1024 XP10000/12000 

HP StorageWorks Enterprise Virtual Arrays (EVA) 4000/6000/8000 and 3000/5000 models that have been upgraded to active-active configurations 

Note: 
HDLM multipath software is required
 IBM DCS9550 (either FC or SATA drives) 
FC attach only 
Minimum firmware 3.08b 
Must use IBM supplied ODM objects at level 1.7 or greater 

For more information on the DCS9550 go to http://www.datadirectnet.com/dcs9550/

Table 10. Disk hardware tested with GPFS for Linux on x86 xSeries servers

       GPFS for Linux on xSeries servers:    
 IBM XIV 2810 
Minimum Firmware Level: 10.0.1 

This storage subsystem has been tested on
 IBM TotalStorage FAStT 200 Storage Server 

IBM TotalStorage FAStT 500 

IBM TotalStorage DS4100 (Formerly FAStT 100) with EXP100 

IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) with EXP710, EXP700, or EXP100 

IBM System Storage DS4300 Turbo with EXP710, EXP700, or EXP100 

IBM System Storage DS4400 (Formerly FAStT 700) with EXP710 or EXP700 

IBM System Storage DS4500 (Formerly FAStT 900) with EXP710, EXP700, or EXP100 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 with EXP710, EXP100 or EXP810 

IBM System Storage DS3400 (1726-HC4)
 IBM TotalStorage Enterprise Storage Server (R) (ESS) models 2105-F20 and 2105-800, with Subsystem Device Driver (SDD)
 EMC Symmetrix Direct Matrix Architecture (DMX) Storage Subsystems 1000 with PowerPath v 3.06 and v 3.07
 IBM System Storage Storage Area Network (SAN) Volume Controller (SVC) V2.1 and V3.1 

See http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002471 for specific advice on SAN Volume Controller recommended software levels.
 IBM DCS9550 (either FC or SATA drives) 
FC attach only 
minimum firmware 3.08b 
QLogic drivers at 8.01.07 or newer and IBM SAN Surfer V5.0.0 or newer 
http://support.qlogic.com/support/oem_detail_all.asp?oemid=376



For more information on the DCS9550 go to http://www.datadirectnet.com/dcs9550/

Restrictions: IBM ServeRAID(TM) adapters are not supported.

Table 11. Disk hardware tested with GPFS for Linux on POWER

GPFS for Linux on POWER: 
 IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) all supported drawer and disk types 

IBM System Storage DS4500 (Formerly FAStT 900) all supported expansion drawer and disk types 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 all supported expansion drawer and disk types
 IBM System Storage DS8000 using SDD

Table 12. Disk hardware tested with GPFS for Linux on AMD processor-based servers

GPFS for Linux on eServer AMD processor-based servers: No devices tested specifically in this environment.


Q4.2: What Fibre Channel Switches are qualified for GPFS usage and is there a FC Switch support chart available?
A4.2:
There are no special requirements for FC switches used by GPFS other than the switch must be supported by AIX or Linux. For further information see http://www.storage.ibm.com/ibmsan/index.html


Q4.3: Can I concurrently access SAN-attached disks from both AIX and Linux nodes in my GPFS cluster?
A4.3:
The architecture of GPFS allows both AIX and Linux hosts to concurrently access the same set of LUNs. However, before this is implemented in a GPFS cluster you must ensure that the disk subsystem being used supports both AIX and Linux concurrently accessing LUNs. While the GPFS architecture allows this, the underlying disk subsystem may not, and in that case, a configuration attempting it would not be supported.


Q4.4: What disk failover models does GPFS support for the IBM System Storage DS4000 family of storage controllers with the Linux operating system?
A4.4:
GPFS has been tested with both the Host Bus Adapter Failover and Redundant Dual Active Controller (RDAC) device drivers.

To download the current device drivers for your disk subsystem, please go to http://www.ibm.com/servers/storage/support/


Q4.5: What devices have been tested with SCSI-3 Persistent Reservations?
A4.5:
The following devices have been tested with SCSI-3 Persistent Reservations:

  • DS8000 (all 2105 and 2107 models) using SDDPCM or the default MPIO PCM on AIX.
  • DS4000 subsystems using the IBM RDAC driver on AIX. (devices.fcp.disk.array.rte)

The most recent versions of the device drivers are always recommended to avoid problems that have been addressed.

Note: For a device to properly offer SCSI-3 Persistent Reservation support for GPFS, it must support SCSI-3 PERSISTENT RESERVE IN with a service action of REPORT CAPABILITIES. The REPORT CAPABILITIES must indicate support for a reservation type of Write Exclusive All Registrants. Contact the disk vendor to determine these capabilities.
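As a minimal sketch only (assuming the usePersistentReserve configuration attribute at this level of GPFS; verify the exact attribute name and procedure in the documentation for your level), Persistent Reserve is typically enabled cluster-wide while GPFS is stopped on all nodes:

mmshutdown -a
mmchconfig usePersistentReserve=yes
mmstartup -a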


Q4.6: Are there any special considerations when my cluster consists of two nodes?
A4.6:
Customers who previously used single-node quorum and are migrating to a supported level of GPFS, must be aware that the single-node quorum function has been replaced with node quorum with tiebreaker disks. The new node quorum with tiebreaker disks support does not depend upon the availability of SCSI-3 persistent reserve. All disks tested with GPFS can now utilize node quorum with tiebreaker disks as opposed to GPFS node quorum (one plus half of the explicitly defined quorum nodes in the GPFS cluster). For further information, see the GPFS: Concepts, Planning, and Installation Guide for your level of GPFS.
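For example, a minimal sketch of enabling node quorum with tiebreaker disks (the NSD names are hypothetical, and the tiebreakerDisks attribute is changed while GPFS is down on all nodes):

mmshutdown -a
mmchconfig tiebreakerDisks="nsdTb1;nsdTb2;nsdTb3"
mmstartup -a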

Back to the top of the page

5. Scaling questions


Q5.1: What are the GPFS cluster size limits?
A5.1:
The current maximum tested GPFS cluster size limits are:

Table 13. GPFS maximum tested cluster sizes

GPFS Multiplatform for Linux      2441 nodes
GPFS on POWER for AIX             1530 nodes
GPFS Multiplatform for Windows    64 nodes

Note: Please contact gpfs@us.ibm.com if you intend to exceed: 

  1. Configurations with Linux larger than 512 nodes 
  2. Configurations with AIX larger than 128 nodes 
  3. Configurations with Windows larger than 32 nodes 

Although GPFS is typically targeted for a cluster with multiple nodes, it can also provide high performance benefit for a single node so there is no lower limit. However, there are two points to consider:

  • GPFS is a well-proven, scalable cluster file system. For a given I/O configuration, typically multiple nodes are required to saturate the aggregate file system performance capability. If the aggregate performance of the I/O subsystem is the bottleneck, then GPFS can help achieve the aggregate performance even on a single node.
  •  GPFS is a highly available file system. Therefore, customers who are interested in single-node GPFS often end up deploying a multi-node GPFS cluster to ensure availability.2


Q5.2: What are the current file system size limits?
A5.2:
The current file system size limits are:

Table 14. Current file system size limits

GPFS 2.3 or later, file system architectural limit    2^99 bytes
GPFS 2.2 file system architectural limit              2^51 bytes (2 Petabytes)
Current tested limit                                  Approximately 2 PB

Note: Contact gpfs@us.ibm.com if you intend to exceed 200 Terabytes.


Q5.3: What is the current limit on the number of mounted file systems in a GPFS cluster?
A5.3:
The total number of mounted file systems within a GPFS cluster depends upon your service level of GPFS:

Table 15. Total number of mounted file systems

GPFS Service Level             Number of mounted file systems
GPFS V3.2.0.1 or later         256
GPFS V3.1.0.5 or later         64
GPFS V3.1.0.1 thru V3.1.0.4    32


Q5.4: What is the architectural limit of the number of files in a file system?
A5.4:
The architectural limit of the number of files in a file system is determined by the file system format. For file systems created prior to GPFS V2.3, the limit is 268,435,456. For file systems created with GPFS V2.3 or later, the limit is 2,147,483,648. Please note that the effective limit on the number of files in a file system is usually lower than the architectural limit, and could be adjusted using the -F option of the mmchfs command.
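For example (a sketch with a hypothetical file system name and inode count; check the mmchfs man page for your level), the effective limit can be raised after file system creation:

mmchfs gpfs1 -F 4000000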


Q5.5: What is the current limit on the number of nodes that may concurrently join a cluster?
A5.5:
The total number of nodes that may concurrently join a cluster depends upon the level of GPFS which you are running:

  • GPFS V3.2 is limited to a maximum of 8192 nodes.
  • GPFS V3.1 is limited to a maximum of 4096 nodes.

A node joins a given cluster if it is:

  • A member of the local GPFS cluster (the mmlscluster command output displays the local cluster nodes).
  • A node in a different GPFS cluster that is mounting a file system from the local cluster.

For example:

  • GPFS clusterA has 2100 member nodes as listed in the mmlscluster command.
  • 500 nodes from clusterB are mounting a file system owned by clusterA.

clusterA therefore has 2600 concurrent nodes.


Q5.6: What are the limitations on GPFS disk size?
A5.6:
The maximum disk size supported by GPFS depends on the file system format and the underlying device support. For file systems created prior to GPFS version 2.3, the maximum disk size is 1 TB due to internal GPFS file system format limitations. For file systems created with GPFS 2.3 or later, these limitations have been removed, and the maximum disk size is only limited by the OS kernel and device driver support:

Table 16. Disk size limitations

OS kernel                              Maximum supported GPFS disk size
AIX, 64-bit kernel                     >2TB, up to the device driver limit
AIX, 32-bit kernel                     1TB
Linux 2.6 64-bit kernels               >2TB, up to the device driver limit
Linux 2.6 32-bit kernels, Linux 2.4    2TB

Notes:

  1. The above limits are only applicable to nodes that access disk devices through a local block device interface, as opposed to NSD protocol. For NSD clients, the maximum disk size is only limited by the NSD server large disk support capability, irrespective of the kernel running on an NSD client node.
  2. The basic reason for the significance of the 2TB disk size barrier is that this is the maximum disk size that can be addressed using 32-bit sector numbers and 512-byte sector size. A larger disk can be addressed either by using 64-bit sector numbers or by using larger sector size. GPFS uses 64-bit sector numbers to implement large disk support. Disk sector sizes other than 512 bytes are unsupported.
  3. GPFS for Windows can only operate as an NSD client at this time, and as such does not support direct attached disks.

Back to the top of the page

6. Configuration and tuning questions


Q6.1: What specific configuration and performance tuning suggestions are there?
A6.1:
In addition to the configuration and performance tuning suggestions in the GPFS: Concepts, Planning, and Installation Guide for your version of GPFS:

  • If your GPFS cluster is configured to use SSH/SCP, it is suggested that you increase the value of MaxStartups in sshd_config to at least 1024.
  • When designating nodes for use by GPFS, you must specify a non-aliased interface; using aliased interfaces may produce undesired results. When creating or adding nodes to your cluster, the specified hostname or IP address must refer to the communications adapter over which the GPFS daemons communicate. The output of the mmlscluster command lists the hostname and IP address combinations recognized by GPFS; when specifying servers for your NSDs, using an aliased hostname not listed in that output may produce undesired results.
  • If your system consists of the eServer pSeries High Performance Switch, it is suggested that you configure GPFS over the ml0 IP network interface.
  • On systems running the Linux 2.6 kernel, it is recommended that you adjust the vm.min_free_kbytes kernel tunable. This tunable controls the amount of free memory that the Linux kernel keeps available (that is, not used in any kernel caches). When vm.min_free_kbytes is set to its default value, some configurations may encounter memory exhaustion symptoms even though free memory should in fact be available. Setting vm.min_free_kbytes to a higher value (the Linux sysctl utility can be used for this purpose), on the order of 5-6% of the total amount of physical memory, should help to avoid such a situation (a short sketch follows below). 
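A minimal sketch of these two adjustments, assuming a node with 16 GB of memory (so roughly 5% is about 838860 KB) and the usual /etc/ssh/sshd_config location; adjust the numbers for your own configuration:

# in /etc/ssh/sshd_config, then restart sshd
MaxStartups 1024

# raise vm.min_free_kbytes now and persist it across reboots
sysctl -w vm.min_free_kbytes=838860
echo "vm.min_free_kbytes = 838860" >> /etc/sysctl.conf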

             Also, please see the GPFS Redpapers:


Q6.2: What configuration and performance tuning suggestions are there for GPFS when used primarily for Oracle databases?
A6.2:
Note: Only a subset of GPFS releases are certified for use in Oracle environments. For the latest status of GPFS certification:

  • For AIX go to, http://www.oracle.com/technology/products/database/clustering/certify/tech_generic_unix_new.html
  • For Linux go to, http://www.oracle.com/technology/products/database/clustering/certify/tech_generic_linux_new.html 

             In addition to the performance tuning suggestions in the GPFS: Concepts, Planning, and Installation Guide for your version of GPFS: 

  • When running Oracle RAC 10g, it is suggested you increase the value for OPROCD_DEFAULT_MARGIN to at least 500 to avoid possible random reboots of nodes. 

             In the control script for the Oracle CSS daemon, located in /etc/init.cssd, the value for OPROCD_DEFAULT_MARGIN is set to 500 (milliseconds) on all UNIX derivatives except for AIX, where it is set to 100 (a sketch of this setting follows the list below). From a GPFS perspective, even 500 milliseconds may be too low in situations where node failover may take up to a minute or two to resolve. However, if during node failure the surviving node is already doing direct IO to the oprocd control file, it should have the necessary tokens and indirect block cached and should therefore not have to wait during failover.
  • Using the IBM General Parallel File System is attractive for RAC environments because executables, trace files and archive log files are accessible on all nodes. However, care must be taken to properly configure the system in order to prevent false node evictions, and to maintain the ability to perform rolling upgrades of the Oracle software. Without proper configuration GPFS recovery from a node failure can interfere with cluster management operations resulting in additional node failures. 

             If you are running GPFS and Oracle RAC 10gR2 and encounter false node evictions:
    • Upgrade the CRS to 10.2.0.3 or newer. 

               The Oracle 10g Clusterware (CRS) executables or logs (the CRS_HOME) should be placed on a local JFS2 filesystem. Using GPFS for the CRS_HOME can inhibit CRS functionality on the surviving nodes while GPFS is recovering from a failed node for the following reasons: 

      • In Oracle 10gR2, up to and including 10.2.0.3, critical CRS daemon executables are not pinned in memory. Oracle and IBM are working to improve this in future releases of 10gR2.
      • Delays in updating the CRS log and authorization files while GPFS is recovering can interfere with CRS operations.
      • Due to an Oracle 10g limitation rolling upgrades of the CRS are not possible when the CRS_HOME is on a shared filesystem.
    • CSS voting disks and the Oracle Clusterware Registry (OCR) should not be placed on GPFS as the IO freeze during GPFS reconfiguration can lead to node eviction, and the inability of CRS to function. Place the OCR and Voting disk on shared raw devices (hdisks).
    • Oracle Database 10g (RDBMS) executables are supported on GPFS for Oracle RAC 10g. However, the system should be configured to support multiple ORACLE_HOME's so as to maintain the ability to perform rolling patch application. Rolling patch application is supported for the ORACLE_HOME starting in Oracle RAC 10.2.0.3. 

    • Oracle Database 10g data files, trace files, and archive log files are supported on GPFS.
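A sketch of the recommended margin in the CSS control script on an AIX RAC node (the surrounding script content varies by CRS release, so locate the existing assignment rather than appending a new one):

# /etc/init.cssd
OPROCD_DEFAULT_MARGIN=500    # milliseconds; the AIX default ships as 100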

See also:


Q6.3: Are there any considerations when utilizing the Remote Direct Memory Access (RDMA) offered by InfiniBand?
A6.3:
GPFS Multiplatform V3.2 for Linux supports InfiniBand RDMA in the following configurations (a configuration sketch follows the list below):

Notes:

  1. Ensure you are at the latest firmware level for both your switch and adapter.
  2. See the question What are the current GPFS advisories ? 

  • SLES 10 or RHEL 5, x86_64
  • OFED Infiniband Stack VERBS API - GEN 2
    • OFED 1.2, OFED 1.2.5, OFED 1.3
    • OFED 1.1 - Voltaire Gridstack only
  • Mellanox based adapters
    • RDMA over multiple HCAs/Ports/QPs
    • For multiple ports - GPFS balances load across ports
  • Single IB subnet
    • QPs connected via GPFS RPC
  • RDMA support for Mellanox memfree adapters requires GPFS V3.2.0.2, or later, to operate correctly
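A configuration sketch, assuming the verbsRdma and verbsPorts configuration attributes at this GPFS level and a hypothetical Mellanox HCA device name; confirm the attribute names and port syntax in the documentation for your level before applying:

mmchconfig verbsRdma=enable
mmchconfig verbsPorts="mlx4_0/1 mlx4_0/2"    # device/port pairs; with multiple ports GPFS balances load across them
mmlsconfig | grep verbs                      # verify the settings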


Q6.4: What Linux configuration settings are required when NFS exporting a GPFS filesystem?
A6.4:
If you are running at SLES 9 SP 1, the kernel defines the sysctl variable fs.nfs.use_underlying_lock_ops that determines if the NFS lockd is to consult the file system when granting advisory byte-range locks. For distributed file systems like GPFS, this must be set to true (the default is false).

You can query the current setting by issuing the command:

sysctl fs.nfs.use_underlying_lock_ops

Alternatively, the record fs.nfs.use_underlying_lock_ops = 1 may be added to /etc/sysctl.conf. In that case, the setting must be applied after initially booting the node, and after each reboot, by issuing the command:

sysctl -p

The fs.nfs.use_underlying_lock_ops variable is currently not available in SLES 9 SP2 or later. When NFS exporting a GPFS file system, ensure your NFS server nodes are at the SP1 level until the variable is made available in later service packs.

For additional considerations when NFS exporting your GPFS file system, see the:


Q6.5: Sometimes GPFS appears to be handling a heavy I/O load, for no apparent reason. What could be causing this?
A6.5:
On some Linux distributions the system is configured by default to run the file system indexing utility updatedb through the cron daemon on a periodic basis (usually daily). This utility traverses the file hierarchy and generates a rather extensive amount of I/O load. For this reason, it is configured by default to skip certain file system types and nonessential file systems. However, the default configuration does not prevent updatedb from traversing GPFS file systems.

In a cluster this results in multiple instances of updatedb traversing the same GPFS file system simultaneously. This causes general file system activity and lock contention in proportion to the number of nodes in the cluster. On smaller clusters, this may result in a relatively short-lived spike of activity, while on larger clusters, depending on the overall system throughput capability, the period of heavy load may last longer. Usually the file system manager node will be the busiest, and GPFS would appear sluggish on all nodes. Re-configuring the system to either make updatedb skip all GPFS file systems or only index GPFS files on one node in the cluster is necessary to avoid this problem.
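As a sketch (file name and variable as used by the common slocate/mlocate packages; your distribution may differ), GPFS can be excluded from indexing by adding gpfs to the PRUNEFS list on every node, or updatedb can be left enabled in cron on only one node:

# /etc/updatedb.conf
PRUNEFS="gpfs nfs proc"    # hypothetical example; append gpfs to the existing list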


Q6.6: What considerations are there when using IBM Tivoli Storage Manager with GPFS?
A6.6:
Considerations when using Tivoli Storage Manager (TSM) with GPFS include:


Q6.7: How do I get OpenSSL to work on AIX and SLES8/ppc64?
A6.7:
To help enhance the security of mounts using Secure Sockets Layer (SSL) a working version of OpenSSL must be installed. This version must be compiled with support for the Secure Hash Algorithm (SHA).

  1. GPFS APAR IZ21177 is required.
  2. GPFS configuration needs to be changed to point at the right set of libraries:
    • On 64-bit kernel:

mmchconfig openssllibname="/usr/lib/libssl.a(libssl64.so.0.9.8)" -N AffectedNodes

    • On 32-bit kernel:

mmchconfig openssllibname="/usr/lib/libssl.a(libssl.so.0.9.8)" -N AffectedNodes

  • On AIX V5.1, OpenSSL 0.9.7d-2, or later, as distributed by IBM in the AIX Toolbox for Linux Applications, is supported. To download OpenSSL from the AIX Toolbox for Linux Applications:
  1. Go to http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html
  2. Under Sorted download, click on AIX Toolbox Cryptographic Content.
  3. Either register for an IBM ID or sign-in.
  4. To view the license agreement, click on View license.
  5. By clicking I agree you agree that you have had the opportunity to review the terms and conditions and that such terms and conditions govern this transaction.
  6. Scroll down to OpenSSL – SSL Cryptographic Libraries
  7. Ensure you download 0.9.7d-2 or later
  • For the supported versions of Linux:
    • For the Red Hat EL 3, Red Hat EL 4, Red Hat EL 5, SUSE Linux ES 9 and SUSE Linux ES 10 distributions, GPFS supports the version that comes with your distribution.
    • For the SUSE Linux ES 8 distribution on x86, this is currently OpenSSL 0.9.6, as included with your distribution.
    • For SUSE Linux ES 8 for PowerPC64 you must compile and install OpenSSL version 0.9.7f, according to these directions, before mounting any GPFS file systems that belong to other GPFS clusters (If you are running GPFS V2.3, ensure you are at least at the minimum service level. See the question What is the current service information for GPFS?):
  1. Download the file openssl-0.9.7f.tar.gz, or later, from http://www.openssl.org.
  2. Unpack the file openssl-0.9.7f.tar.gz

    tar xfz openssl-0.9.7f.tar.gz
    cd openssl-0.9.7f 

  3. Edit the script Configure, changing gcc to /opt/cross/bin/powerpc64-linux-gcc:

    398c398
    < "linux-ppc64", "gcc:-bpowerpc64-linux -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer
    -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_RISC1
    DES_UNROLL:asm/linux_ppc64.o:::::::::dlfcn:linux-shared:-fPIC:-bpowerpc64-linux:.so.
    \$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
    ---
    > "linux-ppc64", "/opt/cross/bin/powerpc64-linux-gcc:-bpowerpc64-linux -DB_ENDIAN 
    -DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG 
    RC4_CHAR RC4_CHUNK DES_RISC1
    DES_UNROLL:asm/linux_ppc64.o:::::::::dlfcn:linux-shared:-fPIC:-bpowerpc64-linux:.so.
    \$(SHLIB_MAJOR).\$(SHLIB_MINOR)", 

  4. Run this script: 

    ./Configure --prefix=/usr/local/ linux-ppc64 

  5. Build and install the OpenSSL library: 

    make
    make install 

  6. Update the library cache: 

    ldconfig 

  7. Configure all of the PowerPC64 nodes in the GPFS cluster, listed in the file PPC64nodes, to use the edited library: 

    mmchconfig openssllibname=/usr/local/lib/libssl.so.0.9.7 -N PPC64nodes 


Q6.8: What ciphers are supported for use by GPFS?
A6.8:
You can specify any of the RSA based ciphers that are supported by the OpenSSL version installed on the node. Refer to the ciphers(1) man page for a list of the valid cipher strings and their meaning. Use the openssl ciphers command to display the list of available ciphers:

openssl ciphers RSA

In addition, GPFS supports the keyword AUTHONLY. When AUTHONLY is specified in place of a cipher list, GPFS checks network connection authorization. However, data sent over the connection is not protected.

Note: When different versions of OpenSSL are used within a cluster or in a multi-cluster setup, ensure that the ciphers are supported by all versions.


Q6.9: When I allow other clusters to mount my file systems, is there a way to restrict access permissions for the root user?
A6.9:
Yes. A root squash option is available when making a file system available for mounting by other clusters using the mmauth command. This option is similar in spirit to the NFS root squash option. When enabled, it causes GPFS to squash superuser authority on accesses to the affected file system on nodes in remote clusters.

This is accomplished by remapping the credentials: user id (UID) and group id (GID) of the root user, to a UID and GID specified by the system administrator on the home cluster, for example, the UID and GID of the user nobody. In effect, root squashing makes the root user on remote nodes access the file system as a non-privileged user.

Although enabling root squash is similar in spirit to setting up UID remapping (see http://www.ibm.com/servers/eserver/clusters/whitepapers/uid_gpfs.html), there are two important differences:

  1. While enabling UID remapping on remote nodes is an option available to the remote system administrator, root squashing need only be enabled on the local cluster, and it will be enforced on remote nodes.
  2. While UID remapping requires having an external infrastructure for mapping between local names and globally unique names, no such infrastructure is necessary for enabling root squashing.

When both UID remapping and root squashing are enabled, root squashing overrides the normal UID remapping mechanism for the root user. See the mmauth command man page for further details.
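A sketch of enabling root squash when granting a remote cluster access, assuming the -r option of the mmauth grant command at this GPFS level and hypothetical cluster, device, and UID/GID values:

mmauth grant clusterB.example.com -f /dev/gpfs1 -a rw -r 99:99    # remap remote root to UID 99, GID 99
mmauth show all                                                   # verify the granted access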

Back to the top of the page

7. Service questions


Q7.1: What support services are available for GPFS?
A7.1:
Support services for GPFS include:

         24x7 enterprise-level remote support for problem resolution and defect support for major distributions of the Linux operating system. Go to http://www.ibm.com/services/us/index.wss/so/its/a1000030.

  • IBM Systems and Technology Group Lab Services

         IBM Systems and Technology Group (STG) Lab Services can help you optimize the utilization of your data center and system solutions.

         STG Lab Services has the knowledge and deep skills to support you through the entire information technology race. Focused on the delivery of new technologies and niche offerings, STG Lab Services collaborates with IBM Global Services and IBM Business Partners to provide complementary services that will help lead through the turns and curves to keep your business running at top speed.

         Go to http://www.ibm.com/systems/services/labservices/.

  • Subscription service for pSeries, p5, and OpenPower

         This service provides technical information for IT professionals who maintain pSeries, p5 and OpenPower servers. Subscribe at http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd

  • GPFS software maintenance

         GPFS defect resolution for current holders of IBM software maintenance contracts:

    • In the United States contact us toll free at 1-800-IBM-SERV (1-800-426-7378)
    • In other countries, contact your local IBM Service Center

         Contact gpfs@us.ibm.com for all other services or consultation on what service is best for your situation.


Q7.2: What is the current service information for GPFS?
A7.2:
The current GPFS service information includes:

  • For GPFS v3.1, if there are foreign characters in file or directory names, the mmapplypolicy command may fail
    GPFS: 6027-902 Error parsing work file /tmp/tsmigrate.inodeslist.<pid>
             The workaround for this problem is to:
    • Upgrade to GPFS v3.2 where this problem no longer exists.
    • If you need to stay on GPFS v3.1:
  1. Install GNU sort contained in the GNU coreutils from the AIX Toolbox for Linux Applications at http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html
  2. Set the environment variables 

    MM_SORT_CMD="LC_ALL=C /local-or-opts-wherever-gnu-binaries-happen-to-be/sort -z"
    MM_SORT_EOR=""   # empty string 

  • For GPFS V3.2 use with AIX V6.1:
    • GPFS is supported in an Ethernet/10-Gigabit Ethernet environment, see the question What interconnects are supported for GPFS daemon-to-daemon communication in my GPFS cluster?
    • The versions of OpenSSL shipped as part of the AIX Expansion Pack, 0.9.8.4 and 0.9.8.41, ARE NOT compatible with GPFS due to the way the OpenSSL libraries are built. To obtain the level of OpenSSL which will work with GPFS, see the question How do I get OpenSSL to work on AIX and SLES8/ppc64?
    • Role Based Access Control (RBAC) is not supported by GPFS and is disabled by default.
    • Workload Partitions (WPARs) or storage protection keys are not exploited by GPFS.
  • If you get errors on RHEL5 when trying to run the GPFS self-extracting archive from the installation media, run export _POSIX2_VERSION=199209 first.
  • When installing or migrating GPFS, the minimum levels of service you must have applied are:
    • GPFS V3.2 you must apply APAR IY99639 (GPFS V3.2.0-1)
    • GPFS V3.1 you must apply APAR IY82778
    • GPFS V2.3 you must apply APAR IY63969 

               If you do not apply these levels of service and you attempt to start GPFS, you will receive an error message similar to: 

      mmstartup: Required service not applied. Install GPFS 3.2.0.1 or later
      mmstartup: Command failed Examine previous error messages to determine cause 

               Upgrading GPFS to a new major release on Linux: 

               When migrating to a new major release of GPFS (for example, GPFS 3.1 to GPFS 3.2), the supported migration path is to install the GPFS base images for the new release, then apply any required service updates. GPFS will not work correctly if you use rpm -U command to upgrade directly to a service level of a new major release without installing the base images first. If this should happen you must uninstall and then reinstall the gpfs.base package. 

               Note: Upgrading to the GPFS 3.2.1.0 level from a pre-3.2 level of GPFS does not work correctly, and the same workaround is required.
  • GPFS V3.1 maintenance levels 10 (GPFS-3.1.0.10) thru 12 (GPFS-3.1.0.12) do not coexist with other maintenance levels 

             All nodes in the cluster must conform to one of these maintenance level compatibility restrictions:
    • All nodes must be at maintenance levels 1-9 or 13 and later (GPFS-3.1.0.1 thru GPFS-3.1.0.9 or GPFS-3.1.0.13 and later)
    • All nodes must be at maintenance levels 10-12 (GPFS-3.1.0.10 - GPFS-3.1.0.12)
  • Required service for support of SLES 10 includes:
  1. If running GPFS V3.1, service update 3.1.0-8 available at
    https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/home.html
  2. The GPFS required level of Korn shell for SLES 10 support is version ksh-93r-12.16 and can be obtained using one of these architecture-specific links: 

    x86 at
    https://you.novell.com/update/i386/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

    POWER at
    https://you.novell.com/update/ppc/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

    x86_64 at
    https://you.novell.com/update/x86_64/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

  3. For SLES 10 on POWER:
    • The gpfs.base 3.1.0-0 rpm must be installed using the rpm --nopre flag BEFORE any updates can be applied.
    • /etc/init.d/running-kernel shipped prior to the availability of the SLES 10 SP1 kernel source rpm contains a bug that results in the wrong set of files being copied to the kernel source tree. Until SP1 is generally available, the following change should also address the problem: 

    --- running-kernel.orig 2006-10-06 14:54:36.000000000 -0500 
    +++ /etc/init.d/running-kernel 2006-10-06 14:59:58.000000000 -0500 
    @@ -53,6 +53,7 @@ 
    arm*|sa110) arch=arm ;; 
    s390x) arch=s390 ;; 

    parisc64) arch=parisc ;; 
    + ppc64) arch=powerpc ;; 
    esac 
    # FIXME: How to handle uml? 

  • When running GPFS on either a p5-590 or a p5-595:
    • The minimum GFW (system firmware) level required is SF222_081 (GA3 SP2), or later. 

               For the latest firmware versions, see the IBM Technical Support at http://www14.software.ibm.com/webapp/set2/firmware/gjsn
    • The supported Linux distribution is SUSE Linux ES 9.
    • Scaling is limited to 16 total processors.
  • IBM testing has revealed that some customers using the Gigabit Ethernet PCI-X adapters with the jumbo frames option enabled may be exposed to a potential data error. While receiving packet data, the Gigabit Ethernet PCI-X adapter may generate an erroneous DMA address when crossing a 64 KB boundary, causing a portion of the current packet and the previously received packet to be corrupted. 

             These Gigabit Ethernet PCI-X adapters and integrated Gigabit Ethernet PCI-X controllers could potentially experience this issue:
    • Type 5700, Gigabit Ethernet-SX PCI-X adapter (Feature Code 5700)
    • Type 5701, 10/100/1000 Base-TX Ethernet PCI-X Adapter (Feature code 5701)
    • Type 5706, Dual Port 10/100/1000 Base-TX Ethernet PCI-X Adapter (Feature code 5706)
    • Type 5707, Dual Port Gigabit Ethernet-SX PCI-X Adapter (Feature code 5707)
    • Integrated 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 7029-6C3 and 6E3 (p615)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9111-520 (p520)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9113-550 (p550)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9117-570 (p570) 

           This problem is fixed with:
    • For AIX 5L 5.2, APAR IY64531
    • For AIX 5L 5.3, APAR IY64393
  • IBM testing has revealed that some customers with the General Parallel File System who install AIX 5L Version 5.2 with the 5200-04 Recommended Maintenance package (bos.mp64 at the 5.2.0.40 or 5.2.0.41 levels) and execute programs which reside in GPFS storage may experience a system wide hang due to a change in the AIX 5L loader. This hang is characterized by an inability to login to the system and an inability to complete some GPFS operations on other nodes. This problem is fixed with the AIX 5L APAR IY60609. It is suggested that all customers installing the bos.mp64 fileset at the 5.2.0.40 or 5.2.0.41 level, who run GPFS, immediately install this APAR.
  • Service bulletins for pSeries, p5, and OpenPower servers at http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd 

  1. Sign in with your IBM ID.
  2. Under the Bulletins tab:
    • For the Select a heading option, choose Cluster on POWER.
    • For the Select a topic option, choose General Parallel File System.
    • For the Select a month option, select a particular month or choose All months.


Q7.3: How do I download fixes for GPFS?
A7.3:
To download fixes for GPFS, go to
https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html


Q7.4: What are the current GPFS advisories?
A7.4:
The current GPFS advisories are:

  • Currently with GPFS Multiplatform for Linux V3.2.1-4 and lower, with Infiniband RDMA enabled, an issue exists which under certain conditions may cause data corruption. This is fixed in GPFS 3.2.1-6. Please apply 3.2.1-6 or turn RDMA off.
  • GPFS 2.3.0.x not compatible with AIX 5.3 TL6 

             Currently GPFS 2.3.0.x on AIX TL6 has a known private heap memory leak. 

             USERS AFFECTED: All customers using GPFS 2.3 and AIX 5.3 

             DESCRIPTION: GPFS 2.3.0.0 through 2.3.0.23 do not work with AIX 5.3 TL6 due to the changes that AIX made in the threading library. GPFS 2.3 PTF 24 and up do have the necessary code changes to work with TL6 but they produce a private heap memory leak due to AIX APAR IZ04791. The AIX fix for this problem is scheduled for AIX TL6 SP4. A workaround that can be used until obtaining AIX TL6 SP4 is to change the GPFS configuration to not use the sigwait library call (mmchconfig asyncSocketNotify=no). Therefore, until the issue is resolved please be advised not to use GPFS 2.3.0.0 through 2.3.0.23 and AIX 5.3 TL6 in a production environment. AIX 5.3 TL1 through 5 are known to work with all GPFS 2.3 PTFs. 

             EFIX AVAILABLE: There are no fixes at this time. Once one is available, notice will be given. Please see https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/aix.html
  • In certain GPFS 2.3 and 3.1 PTF levels there is a subtle GPFS issue in truncate: if multiple nodes are accessing a file against which a truncate is issued on one of the nodes, there is a time window during which incorrect size information can be communicated to some nodes, which may cause GPFS to mishandle the last fragment of the file. This can lead to various failed internal consistency checks, manifested by the GPFS daemon shutting down abnormally. 

             The affected GPFS PTF levels are:
    • GPFS 3.1.0-6
    • GPFS 3.1.0-5
    • GPFS 2.3.0-17
    • GPFS 2.3.0-16
    • GPFS 2.3.0-15 

          Recommended action:
    • For customers running GPFS 3.1.0.x PTF 7 contains a fix and is available at www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/home.html
    • For all other versions, please contact support.
  • Customers running IBM Virtual Shared Disk V4.1 using a communications adapter other than the IBM eServer pSeries High Performance Switch, who have configured IBM Virtual Shared Disk with an IP packet size greater than the Maximum Transmission Unit (MTU) of the network, may experience packet corruption. 

             IP must fragment packets that are greater than the MTU size of the network. On faster interconnects such as Gigabit Ethernet, the IP fragmentation buffer can be overrun and end up incorrectly assembling the fragments. This is an inherent limitation of the IP protocol, which can occur when the number of packets transferred exceeds the counter size, which then rolls over, potentially resulting in a duplicate packet number. 

             If a duplicate packet number occurs, and the checksum matches that of the expected packet, corruption of the IBM Virtual Shared Disk packets can result in GPFS file system corruption. IBM Virtual Shared Disk will attempt to validate the incoming packets and discard malformed packets, but it cannot identify them every time (since checksums for different data patterns may be the same). 

             The level of IBM Virtual Shared Disk affected (shipped in AIX 5.2.x and later releases) has been available since October 2003, and the problem has only been confirmed as having occurred in an internal IBM test environment. 

             IP fragmentation can be prevented by configuring the IBM Virtual Shared Disk IP packet size less than or equal to the MTU size of the network. This will move the fragmentation into the IBM Virtual Shared Disk layer, which can correctly process the fragmentation. 

             The current IBM Virtual Shared Disk infrastructure allows for 160 packets per request which will limit the maximum buddy buffer size that can be used. For example:
    • for an MTU of 1500, you need to set the IBM Virtual Shared Disk IP packet size to 1024, effectively limiting the maximum buddy buffer size to 128 KB.
    • for an MTU of 9000, you need to set the IBM Virtual Shared Disk IP packet size to 8192, effectively limiting the maximum buddy buffer size to 1 MB.

             You can check the IBM Virtual Shared Disk IP packet size with these two commands: 

             vsdatalst -n
                 Shows the value that will take effect at the next reboot. 
             statvsd
                 Shows the current value that the IBM Virtual Shared Disk device driver is using. 

             Here is an example of how to set the IP packet size when using jumbo Ethernet frames (MTU = 9000): 

             updatevsdnode -n ALL -M 8192
             dsh -a ctlvsd -M 8192 

             For more information see the RSCT for AIX 5L Managing Shared Disks manual at http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsctbooks.html and search on the commands vsdnode, updatevsdnode, and ctlvsd. 

             APAR IY66940 will completely prevent IP fragmentation and will enforce the IBM Virtual Shared Disk IP packet size being less than the MTU size. This will also remove the restrictions relating to the maximum IBM Virtual Shared Disk buddy buffer size. 

             Anyone who cannot take the preventive action, for whatever reason, or is unsure whether their environment may be affected, should contact IBM service to discuss their situation:
    • In the United States contact us toll free at 1-800-IBM-SERV (1-800-426-7378)
    • In other countries, contact your local IBM Service Center


Q7.5: What Linux kernel patches are provided for clustered file systems such as GPFS?
A7.5:
The Linux kernel patches provided for clustered file systems are expected to correct problems that may be encountered when using GPFS with the Linux operating system. The supplied patches are currently being submitted to the Linux development community but may not be available in particular kernels. It is therefore suggested that they be appropriately applied based on your kernel version and distribution.

A listing of the latest patches, along with a more complete description of these patches, can be found at the General Parallel File System project on SourceForge.net (R) at http://sourceforge.net/tracker/?atid=719124&group_id=130828&func=browse:

  1. Click on the Summary description for the desired patch.
  2. Scroll down to the Summary section on the patch page for a description of and the status of the patch.
  3. To download a patch:
    1. Scroll down to the Attached Files section.
    2. Click on the Download link for your distribution and kernel level.

site.mcr consideration:
Patches listing a site.mcr define have additional steps to perform:

  1. Apply the patch to the Linux kernel, recompile, and install this kernel.
  2. In site.mcr either #define the option or uncomment the option if already present. Consult /usr/lpp/mmfs/src/README for more information.
  3. Recompile and reinstall the GPFS portability layer.
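A sketch of the rebuild steps for the GPFS 3.x open source portability layer, assuming the usual /usr/lpp/mmfs/src build tree and make targets described in its README; consult that README for the exact procedure at your level:

cd /usr/lpp/mmfs/src
export SHARKCLONEROOT=/usr/lpp/mmfs/src
# edit config/site.mcr here: #define the option named by the patch, or uncomment it if already present
make World
make InstallImages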


Q7.6: What Windows hotfix updates are required for GPFS?
A7.6:
The current Windows hotfix updates required for GPFS consist of :


Q7.7: Where can I find licensing and ordering information for GPFS?
A7.7:
The Cluster Software Ordering Guide provides the following information:

  • Licensing information

Licenses can also be viewed at http://www.ibm.com/software/sla/sladb.nsf

  • Ordering information
  • Software Maintenance Agreement information
  • Product End of Market/Service dates

Software support lifecycle information can also be viewed at http://www-306.ibm.com/software/support/lifecycle/index_a_z.html

  • Hardware and Software requirements

To view the Guide please go to http://www.ibm.com/systems/clusters/software/reports/order_guide.html

Back to the top of the page

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any of IBM's intellectual property rights may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10594-1785
USA

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: i) the exchange of information between independently created programs and other programs (including this one) and ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Intellectual Property Law
2455 South Road,P386
Poughkeepsie, NY 12601-5400
USA

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol ( (R) or (TM)), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at http://www.ibm.com/legal/copytrade.shtml

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Red Hat, the Red Hat "Shadow Man" logo, and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc., in the United States and other countries.

UNIX is a registered trademark of the Open Group in the United States and other countries.

Microsoft, Windows, Windows NT, and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product, and service names may be the trademarks or service marks of others.

February 2009
Copyright International Business Machines Corporation 2004,2009. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
1. 
GPFS for Linux on Itanium Servers is available only through a special Programming Request for Price Quotation (PRPQ). The install image is not generally available code. It must be requested by an IBM client representative through the RPQ system and approved before order fulfillment. If interested in obtaining this PRPQ, reference PRPQ # P91232 or Product ID 5799-GPS.
2. 
GPFS Sequential Input/Output Performance on IBM pSeries 690, Gautam Shah, James Wang, available at http://www.redbooks.ibm.com/redpapers/pdfs/redp3945.pdf

Back to the top of the page



+. ssh configuration

  >> when creating a cluster with ssh and scp as the remote command types, the ssh keys must first be registered as shown below

(lpar11g) # ssh lpar21g   >> then exit (this only creates the key entry -> the /.ssh directory is created)

(lpar11g) # ssh-keygen -t rsa   >> just press Enter at each prompt to complete

(lpar11g) # cat /.ssh/id_rsa.pub >> /.ssh/authorized_keys   (the node's own public key must also be added)

(lpar11g) # append the output of 'cat /.ssh/id_rsa.pub' to the /.ssh/authorized_keys file on the (lpar12g) server

 >>> repeat the same steps on lpar21g


+. rsh configuration


+. basic configuration scripts 

-------------------------------------------------------------

### fs re-size

chfs -a size=+512M /

chfs -a size=+1G /usr

chfs -a size=+1G /var

chfs -a size=+512M /tmp

chfs -a size=+1G /home

chfs -a size=+512M /opt



### IO state

chdev -l sys0 -a iostat=true


### Disk Attr

ins=1

while [ ${ins} -le 42 ]

do

        chdev -l hdisk${ins} -a pv=yes

        chdev -l hdisk${ins} -a reserve_policy=no_reserve

        ((ins=ins+1))

done


### Time sync

setclock lpar11 ; date ; rsh lpar11 date


### hushlogin (turn off the login msg)

touch /.hushlogin


-------------------------------------------------------------


+. .profile

-------------------------------------------------------------

lpar11:/# cat .profile

export GPFS_HOME=/usr/lpp/mmfs

export PS1=`hostname -s`':$PWD# '

export PATH=/usr/local/bin:${GPFS_HOME}/bin:${PATH}


set -o vi


banner `hostname`

-------------------------------------------------------------


+. /etc/hosts

-------------------------------------------------------------

#-- team 1 (multi-cluster #1)

10.10.10.151           lpar11

10.10.10.161           lpar21

10.10.11.151           lpar11g

10.10.11.161           lpar21g


#-- team 2 (multi-cluster #2)

10.10.10.152           lpar12

10.10.10.162           lpar22

10.10.11.152           lpar12g

10.10.11.162           lpar22g

-------------------------------------------------------------



+. lslpp -l gpfs*

  >> In addition to the base filesets, the matching patch/PTF level must also be installed (from Fix Central) or GPFS will not start

     ex. GPFS 3.5.0.0 would not start > it started only after updating to GPFS 3.5.0.6


+. cluster configuration file

# cat /home/gpfs/gpfs.allnodes 

lpar11g:quorum-manager

lpar21g:quorum-manager


# >> Syntax >> NodeName:NodeDesignations:AdminNodeName  (NodeDesignations and AdminNodeName are optional)

  >> NodeDesignations is given as 'manager|client'-'quorum|nonquorum'

  >> manager|client - whether the node joins the pool of candidate file system manager nodes (default is client)

  >> quorum|nonquorum - the default is nonquorum

# >> A maximum of 8 quorum nodes is supported, and every quorum node must be able to access the tiebreaker disks

# >> lpar11_gpfs, lpar21_gpfs > must be registered in /etc/hosts

  >> Either a private or a public network can be used, but a private network is of course recommended (ideally on its own subnet)

# >> In a typical Oracle RAC configuration every node is designated quorum-manager
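
  >> As an illustration of the NodeName:NodeDesignations:AdminNodeName syntax above, a hypothetical descriptor file that also adds the SAN-less client lpar12g and explicit admin (public-network) names - the third entry is an example only, not part of the original configuration - could look like:

lpar11g:quorum-manager:lpar11

lpar21g:quorum-manager:lpar21

lpar12g::lpar12            >> empty designation field -> defaults to client, nonquorum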



+. gpfs cluster creation - using a predefined nodelist file #1

# mmcrcluster -n /home/gpfs/gpfs.allnodes -p lpar11g -s lpar21g -C gpfs_cluster -r /usr/bin/ssh -R /usr/bin/scp

#  >> When installing with ssh and scp, use the private-network hostnames so that there are no password-related problems


# >> -n : node description file

  >> -p : primary node

  >> -s : secondary node

  >> -C : cluster name

  >> -r : remote shell (default is rsh)

  >> -R : remote copy (default is rcp)

  

+. mmlscluster  

  

+. license agreement (gpfs3.3+)

# mmchlicense server --accept -N lpar11g,lpar21g


+. gpfs startup and status check

# mmstartup -a

# mmgetstate -a

 Node number  Node name        GPFS state

------------------------------------------

       1      lpar11g          active

       2      lpar21g          active


# mmlscluster


GPFS cluster information

========================

  GPFS cluster name:         gpfs_cluster.lpar11g

  GPFS cluster id:           1399984813853589142

  GPFS UID domain:           gpfs_cluster.lpar11g

  Remote shell command:      /usr/bin/rsh

  Remote file copy command:  /usr/bin/rcp


GPFS cluster configuration servers:

-----------------------------------

  Primary server:    lpar11g

  Secondary server:  lpar21g


 Node  Daemon node name  IP address        Admin node name              Designation

------------------------------------------------------------------------------------

   1   lpar11g           170.24.46.151     lpar11g                      quorum-manager

   2   lpar21g           170.24.46.161     lpar21g                      quorum-manager


+. gpfs cluster creation - adding nodes one by one (man mmaddnode) #2

# mmcrcluster -N lpar11g:manager-quorum -p lpar11g -r /usr/bin/ssh -R /usr/bin/scp

# mmaddnode -N lpar21g

# mmchcluster -s lpar21g

# mmchnode -N lpar21g --client --nonquorum

# mmchnode -N lpar21g --manager --quorum

# mmlscluster

  

  >> to delete a node: mmdelnode -N lpar21g

  >> the primary and secondary cluster configuration servers cannot be deleted

  

+. starting and stopping nodes in the cluster

# mmstartup -a / mmshutdown -a

# mmstartup -N lpar21g / mmshutdown -N lpar21g

# mmgetstate -a / mmgetstate -N lpar21g

# mmgetstate -a


 Node number  Node name        GPFS state

------------------------------------------

       1      lpar11g          active

       2      lpar21g          active

  

+. gpfs cluster logs

# tail -f /var/adm/ras/mmfs.log.latest



+. NSD Configuration

# cat /home/gpfs/gpfs.clusterDisk 

hdisk1:::dataAndMetadata::nsd1:

hdisk2:::dataAndMetadata::nsd2:

hdisk3:::dataAndMetadata::nsd3:

hdisk4:::dataAndMetadata::nsd4:

hdisk5:::dataAndMetadata::nsd5:

hdisk6:::dataAndMetadata::nsd6:

hdisk7:::dataAndMetadata::nsd7:


  >> [Disk to use as the NSD]:[Primary Server]:[Backup Server]:[Disk Usage]:[Failure Group]:[Desired NSD Name]:[Storage Pool]

  >> [Disk to use as the NSD] - can also be given in the '/dev/hdisk3' form

     [Primary Server] && [Backup Server] 

        - the primary and backup NSD servers that perform I/O on behalf of the cluster

        - if the cluster nodes are all SAN-attached and share the same disks, leave these two fields blank

        -  case 1) (lpar11g and lpar21g are SAN-attached and act as GPFS servers) && (lpar12g has no SAN connection and acts as a client)

          -> because lpar12g cannot see the NSDs directly, define them as

     hdisk1:lpar11g:lpar21g:dataAndMetadata::nsd1:

 and register lpar12g as a client when adding it to the cluster

  case 2) lpar11g and lpar21g are SAN-attached and act as both GPFS servers and clients

          -> every server and client can access the NSDs directly, so a definition such as

                      hdisk1:::dataAndMetadata::nsd1:

             is sufficient.

     [Disk Usage] 

        - 'dataOnly|metadataOnly|dataAndMetadata|descOnly'

        - dataAndMetadata is the default for disks in the system pool && dataOnly is the default for other storage pools

     [Desired NSD Name] - must be unique within the cluster; if omitted, a name of the form 'gpfs1nsd' is generated
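
     [Storage Pool] - the last field assigns the NSD to a storage pool; for example, the descriptor used later in the storage-pool exercise places nsd3 in pool1:

                      hdisk3:::dataOnly::nsd3:pool1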

 

 

# mmcrnsd -F /home/gpfs/gpfs.clusterDisk

# mmlsnsd

 File system   Disk name    NSD servers

---------------------------------------------------------------------------

 (free disk)   nsd1         (directly attached)

 (free disk)   nsd2         (directly attached)

 (free disk)   nsd3         (directly attached)

 (free disk)   nsd4         (directly attached)

 (free disk)   nsd5         (directly attached)

 (free disk)   nsd6         (directly attached)

 (free disk)   nsd7         (directly attached)

 

# mmdelnsd nsd7 

  >> to add an individual NSD back after deleting it, create an additional descriptor file (gpfs.clusterDisk2) and run mmcrnsd -F gpfs.clusterDisk2, as in the sketch below
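
  >> a minimal sketch of such a file, assuming nsd7 is being re-created on the same hdisk7 with its original descriptor:

# cat /home/gpfs/gpfs.clusterDisk2

hdisk7:::dataAndMetadata::nsd7:

# mmcrnsd -F /home/gpfs/gpfs.clusterDisk2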

*. A disk that was ever configured for GPFS is shown as 'gpfs' by lspv and cannot simply be reused for GPFS > the old information must be wiped and the disk reconfigured
on lpar11 && lpar12 >>  
  dd if=/dev/zero of=/dev/rhdiskXX bs=1024 count=100
  rmdev -dl hdiskXX
  cfgmgr -v
  chdev -l hdiskXX -a reserve_policy=no_reserve
  chdev -l hdiskXX -a pv=yes

  


+. tiebreaker disk configuration

# mmshutdown -a

# mmchconfig tiebreakerDisks=nsd7

  >> used when configuring only a single tiebreaker disk...

# mmchconfig tiebreakerDisks=no

# mmchconfig tiebreakerDisks='nsd5;nsd6;nsd7'

  >> using three tiebreaker disks is recommended; in clusters with three or more quorum nodes, tiebreaker disks are unnecessary

# mmlsconfig | grep tiebreakerDisks

tiebreakerDisks nsd5;nsd6;nsd7
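
  >> a suggested way to see the quorum effect after restarting GPFS (not captured in the original note) is the long form of mmgetstate, which reports the quorum, nodes-up and total-node counts:

# mmstartup -a

# mmgetstate -a -L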



+. gpfs file system creation

# cp /home/gpfs/gpfs.clusterDisk /home/gpfs/gpfs.clusterDisk.fs

  >> in /home/gpfs/gpfs.clusterDisk.fs, delete every entry except nsd1, nsd2, nsd3 and nsd4

# mmcrfs /gpfs fs1 -F /home/gpfs/gpfs.clusterDisk.fs -A yes -B 512k -n 16

  >> '/gpfs' : mount point

  >> 'fs1' : device name (file system name) > it can also be written as '/dev/fs1'

  >> '-F /home/gpfs/gpfs.clusterDisk.fs' : the NSD descriptors to include in the file system (rewritten automatically by mmcrnsd)

  >> '-A yes' : mount automatically when GPFS starts

  >> '-B 512k' : block size, configurable from 16k to 1MB. Oracle generally recommends 256k (or 512k),

                 but groupware or e-mail systems with many small files should use a smaller block size

  >> '-n 16' : the number of nodes expected to mount the file system; it cannot be changed later, so allow plenty of headroom

# mmmount all -a

# mmlsfs fs1

flag                value                    description

------------------- ------------------------ -----------------------------------

 -f                 8192                     Minimum fragment size in bytes

 -i                 512                      Inode size in bytes

 -I                 16384                    Indirect block size in bytes

 -m                 1                        Default number of metadata replicas

 -M                 2                        Maximum number of metadata replicas

 -r                 1                        Default number of data replicas

 -R                 2                        Maximum number of data replicas

 -j                 cluster                  Block allocation type

 -D                 nfs4                     File locking semantics in effect

 -k                 all                      ACL semantics in effect

 -n                 32                       Estimated number of nodes that will mount file system

 -B                 262144                   Block size

 -Q                 none                     Quotas enforced

                    none                     Default quotas enabled

 --filesetdf        no                       Fileset df enabled?

 -V                 13.01 (3.5.0.0)          File system version

 --create-time      Mon Nov 26 14:08:51 2012 File system creation time

 -u                 yes                      Support for large LUNs?

 -z                 no                       Is DMAPI enabled?

 -L                 4194304                  Logfile size

 -E                 yes                      Exact mtime mount option

 -S                 no                       Suppress atime mount option

 -K                 whenpossible             Strict replica allocation option

 --fastea           yes                      Fast external attributes enabled?

 --inode-limit      67584                    Maximum number of inodes

 -P                 system                   Disk storage pools in file system

 -d                 nsd1;nsd2;nsd3;nsd4      Disks in file system

 --perfileset-quota no                       Per-fileset quota enforcement

 -A                 yes                      Automatic mount option

 -o                 none                     Additional mount options

 -T                 /gpfs                    Default mount point

 --mount-priority   0                        Mount priority

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 98 GB)

nsd1                 10485760       -1 yes      yes        10440704 (100%)           488 ( 0%)

nsd2                 10485760       -1 yes      yes        10440448 (100%)           248 ( 00%)

nsd3                 10485760       -1 yes      yes        10440960 (100%)           248 ( 00%)

nsd4                 10485760       -1 yes      yes        10440192 (100%)           472 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         41943040                              41762304 (100%)          1456 ( 00%)


                =============                         ==================== ===================

(total)              41943040                              41762304 (100%)          1456 ( 00%)


Inode Information

-----------------

Number of used inodes:            4038

Number of free inodes:           63546

Number of allocated inodes:      67584

Maximum number of inodes:        67584



+. gpfs file system / NSD disk management

# mmlsfs all

File system attributes for /dev/fs1:

====================================

flag                value                    description

------------------- ------------------------ -----------------------------------

 -f                 8192                     Minimum fragment size in bytes

 -i                 512                      Inode size in bytes

 -I                 16384                    Indirect block size in bytes

 -m                 1                        Default number of metadata replicas

 -M                 2                        Maximum number of metadata replicas

 -r                 1                        Default number of data replicas

 -R                 2                        Maximum number of data replicas

 -j                 cluster                  Block allocation type

 -D                 nfs4                     File locking semantics in effect

 -k                 all                      ACL semantics in effect

 -n                 16                       Estimated number of nodes that will mount file system

 -B                 262144                   Block size

 -Q                 none                     Quotas enforced

                    none                     Default quotas enabled

 --filesetdf        no                       Fileset df enabled?

 -V                 13.01 (3.5.0.0)          File system version

 --create-time      Mon Nov 26 14:08:51 2012 File system creation time

 -u                 yes                      Support for large LUNs?

 -z                 no                       Is DMAPI enabled?

 -L                 4194304                  Logfile size

 -E                 yes                      Exact mtime mount option

 -S                 no                       Suppress atime mount option

 -K                 whenpossible             Strict replica allocation option

 --fastea           yes                      Fast external attributes enabled?

 --inode-limit      67584                    Maximum number of inodes

 -P                 system                   Disk storage pools in file system

 -d                 nsd1;nsd2;nsd3;nsd4      Disks in file system

 --perfileset-quota no                       Per-fileset quota enforcement

 -A                 yes                      Automatic mount option

 -o                 none                     Additional mount options

 -T                 /gpfs                    Default mount point

 --mount-priority   0                        Mount priority


#  mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512      -1 yes      yes   ready         up           system

nsd2         nsd         512      -1 yes      yes   ready         up           system

nsd3         nsd         512      -1 yes      yes   ready         up           system

nsd4         nsd         512      -1 yes      yes   ready         up           system


# mmdeldisk fs1 nsd4

  >> remove the 'nsd4' disk from the 'fs1' file system

Deleting disks ...

GPFS: 6027-589 Scanning file system metadata, phase 1 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 2 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 3 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 4 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-565 Scanning user file metadata ...

 100.00 % complete on Mon Nov 26 17:05:54 2012

GPFS: 6027-552 Scan completed successfully.

Checking Allocation Map for storage pool 'system'

GPFS: 6027-370 tsdeldisk64 completed.

mmdeldisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmumount /gpfs -N lpar21g

# mmumount /gpfs -a

# mmdelfs fs1

  >> delete the 'fs1' file system itself

GPFS: 6027-573 All data on following disks of fs1 will be destroyed:

    nsd1

    nsd2

    nsd3

GPFS: 6027-574 Completed deletion of file system /dev/fs1.

mmdelfs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


  

+. GPFS operation and administration

# mmlsmgr fs1

# mmchmgr fs1 lpar21g

GPFS: 6027-628 Sending migrate request to current manager node 170.24.46.151 (lpar11g).

GPFS: 6027-629 Node 170.24.46.151 (lpar11g) resigned as manager for fs1.

GPFS: 6027-630 Node 170.24.46.161 (lpar21g) appointed as manager for fs1.

# mmlsmgr fs1

file system      manager node       [from 170.24.46.161 (lpar21g)]

---------------- ------------------

fs1              170.24.46.161 (lpar21g)

# mmchconfig autoload=yes

  >> start the gpfs daemon automatically at system boot

mmchconfig: Command successfully completed

mmchconfig: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmlsconfig | grep autoload

autoload yes


# mmfsadm dump config

  >> dump all GPFS configuration parameters

# mmfsadm dump config | grep pagepool

   pagepool 536870912

   pagepoolMaxPhysMemPct 75

   pagepoolPageSize 65536

   pagepoolPretranslate 0
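
  >> as a sketch of how such a parameter would be changed (the 1G value is purely illustrative - size the pagepool for the workload and available memory), mmchconfig can apply the new value immediately with -i:

# mmchconfig pagepool=1G -i

# mmlsconfig | grep pagepool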


----------------------------------------------------

+. Storage Pools, Filesets and Policies

----------------------------------------------------

+. clean-up test env

# mmumount all -a ; mmdelfs fs1 ; mmdelnsd "nsd1;nsd2;nsd3;nsd4"

  >> the tiebreaker disks 'nsd5;nsd6;nsd7' are kept as they are


+. create nsd

#  cat /home/gpfs/gpfs.clusterDisk.storagePool

hdisk1:::dataAndMetadata::nsd1:system

hdisk2:::dataAndMetadata::nsd2:system

hdisk3:::dataOnly::nsd3:pool1

hdisk4:::dataOnly::nsd4:pool1


# mmcrnsd -F /home/gpfs/gpfs.clusterDisk.storagePool

mmcrnsd: Processing disk hdisk1

mmcrnsd: Processing disk hdisk2

mmcrnsd: Processing disk hdisk3

mmcrnsd: Processing disk hdisk4

mmcrnsd: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmcrfs /gpfs fs1 -F /home/gpfs/gpfs.clusterDisk.storagePool -A yes -B 512k -n 16

GPFS: 6027-531 The following disks of fs1 will be formatted on node lpar11:

    nsd1: size 10485760 KB

    nsd2: size 10485760 KB

    nsd3: size 10485760 KB

    nsd4: size 10485760 KB

GPFS: 6027-540 Formatting file system ...

GPFS: 6027-535 Disks up to size 103 GB can be added to storage pool 'system'.

GPFS: 6027-535 Disks up to size 103 GB can be added to storage pool 'pool1'.

Creating Inode File

Creating Allocation Maps

Creating Log Files

Clearing Inode Allocation Map

Clearing Block Allocation Map

Formatting Allocation Map for storage pool 'system'

Formatting Allocation Map for storage pool 'pool1'

GPFS: 6027-572 Completed creation of file system /dev/fs1.

mmcrfs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlsfs fs1

# mmmount /gpfs -a


# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10427904 ( 99%)           976 ( 0%)

nsd2                 10485760       -1 yes      yes        10428416 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20856320 ( 99%)          1968 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41822720 (100%)          2960 ( 0%)

(metadata)           20971520                              20856320 ( 99%)          1968 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41822720 (100%)          2960 ( 0%)


Inode Information

-----------------

Number of used inodes:            4022

Number of free inodes:           63562

Number of allocated inodes:      67584

Maximum number of inodes:        67584


+. create fileset

#  mmcrfileset fs1 fileset1

Snapshot 'fileset1' created with id 1.

#  mmcrfileset fs1 fileset2

Snapshot 'fileset2' created with id 2.

#  mmcrfileset fs1 fileset3

Snapshot 'fileset3' created with id 3.

#  mmcrfileset fs1 fileset4

Snapshot 'fileset4' created with id 4.

#  mmcrfileset fs1 fileset5

Snapshot 'fileset5' created with id 5.

# mmlsfileset fs1

Filesets in file system 'fs1':

Name                     Status    Path

root                     Linked    /gpfs

fileset1                 Unlinked  --

fileset2                 Unlinked  --

fileset3                 Unlinked  --

fileset4                 Unlinked  --

fileset5                 Unlinked  --


#  mmlinkfileset fs1 fileset1 -J /gpfs/fileset1

Fileset 'fileset1' linked at '/gpfs/fileset1'.

#  mmlinkfileset fs1 fileset2 -J /gpfs/fileset2

Fileset 'fileset2' linked at '/gpfs/fileset2'.

#  mmlinkfileset fs1 fileset3 -J /gpfs/fileset3

Fileset 'fileset3' linked at '/gpfs/fileset3'.

#  mmlinkfileset fs1 fileset4 -J /gpfs/fileset4

Fileset 'fileset4' linked at '/gpfs/fileset4'.

#  mmlinkfileset fs1 fileset5 -J /gpfs/fileset5

Fileset 'fileset5' linked at '/gpfs/fileset5'.

# mmlsfileset fs1

Filesets in file system 'fs1':

Name                     Status    Path

root                     Linked    /gpfs

fileset1                 Linked    /gpfs/fileset1

fileset2                 Linked    /gpfs/fileset2

fileset3                 Linked    /gpfs/fileset3

fileset4                 Linked    /gpfs/fileset4

fileset5                 Linked    /gpfs/fileset5



+. file placement policy

# cat /home/gpfs/placementpolicy.txt

/* The fileset does not matter, we want all .dat and .DAT files to go to pool1 */

RULE 'datfiles' SET POOL 'pool1' WHERE UPPER(name) like '%.DAT'

/* All non *.dat files placed in fileset5 will go to pool1 */

RULE 'fs5' SET POOL 'pool1' FOR FILESET ('fileset5')

/* Set a default rule that sends all files not meeting the other criteria to the system pool */

RULE 'default' set POOL 'system'


# mmchpolicy fs1 /home/gpfs/placementpolicy.txt

Validated policy `placementpolicy.txt': parsed 3 Placement Rules, 0 Restore Rules, 0 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

GPFS: 6027-799 Policy `placementpolicy.txt' installed and broadcast to all nodes.


# mmlspolicy fs1 -L

/* The fileset does not matter, we want all .dat and .DAT files to go to pool1 */

RULE 'datfiles' SET POOL 'pool1' WHERE UPPER(name) like '%.DAT'

/* All non *.dat files placed in fileset5 will go to pool1 */

RULE 'fs5' SET POOL 'pool1' FOR FILESET ('fileset5')

/* Set a default rule that sends all files not meeting the other criteria to the system pool */

RULE 'default' set POOL 'system'



+. placement policy test

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000'

   >> the file is placed in the system pool (default rule)

   

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset1/bigfile1.dat bs=64k count=1000'

   >> the file is placed in pool1 (datfiles rule)

   

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset5/bigfile2 bs=64k count=1000'

   >> the file is placed in pool1 (fs5 rule)

   

   >> the placement can also be checked with mmlsattr, e.g. 'mmlsattr -L /gpfs/fileset5/bigfile2'

   

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10427904 ( 99%)           976 ( 0%)

nsd2                 10485760       -1 yes      yes        10427392 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20855296 ( 99%)          1968 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41821696 (100%)          2960 ( 0%)

(metadata)           20971520                              20855296 ( 99%)          1968 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41821696 (100%)          2960 ( 0%)


Inode Information

-----------------

Number of used inodes:            4027

Number of free inodes:           63557

Number of allocated inodes:      67584

Maximum number of inodes:        67584

# dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000

1000+0 records in.

1000+0 records out.

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1472 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2464 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41757184 (100%)          3456 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2464 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41757184 (100%)          3456 ( 0%)


Inode Information

-----------------

Number of used inodes:            4028

Number of free inodes:           63556

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# dd if=/dev/zero of=/gpfs/fileset1/bigfile1.dat bs=64k count=1000

1000+0 records in.

1000+0 records out.

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1472 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2448 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10451456 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10450944 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20902400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41693184 ( 99%)          3440 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2448 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41693184 ( 99%)          3440 ( 0%)


Inode Information

-----------------

Number of used inodes:            4029

Number of free inodes:           63555

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# dd if=/dev/zero of=/gpfs/fileset5/bigfile2 bs=64k count=1000

1000+0 records in.

1000+0 records out.

lpar11:/home/gpfs# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1456 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2432 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10419200 ( 99%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10419200 ( 99%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20838400 ( 99%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41629184 ( 99%)          3424 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2432 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41629184 ( 99%)          3424 ( 0%)


Inode Information

-----------------

Number of used inodes:            4030

Number of free inodes:           63554

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# mmlsattr -L /gpfs/fileset1/bigfile1.dat

file name:            /gpfs/fileset1/bigfile1.dat

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:33:36 2012

Windows attributes:   ARCHIVE


# mmlsattr -L /gpfs/fileset5/bigfile2

file name:            /gpfs/fileset5/bigfile2

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset5

snapshot name:

creation Time:        Tue Nov 27 11:35:55 2012

Windows attributes:   ARCHIVE


# dd if=/dev/zero of=/gpfs/fileset3/bigfile3 bs=64k count=1000

1000+0 records in.

1000+0 records out.

# dd if=/dev/zero of=/gpfs/fileset4/bigfile4 bs=64k count=1000

1000+0 records in.

1000+0 records out.



+. file management with policy

# cat /home/gpfs/managementpolicy.txt

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'


# mmapplypolicy fs1 -P /home/gpfs/managementpolicy.txt -I test

   >> dry-run the specified policy (-I test)

[I] GPFS Current Data Pool Utilization in KB and %

pool1   133120  20971520        0.634766%

system  308736  20971520        1.472168%

[I] 4032 of 67584 inodes used: 5.965909%.

[I] Loaded policy rules from /home/gpfs/managementpolicy.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@02:53:48 UTC

parsed 0 Placement Rules, 0 Restore Rules, 2 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'

[I]2012-11-27@02:53:49.218 Directory entries scanned: 11.

[I] Directories scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:53:49.231 Sorting 11 file list records.

[I] Inodes scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:53:49.303 Policy evaluation. 11 files scanned.

[I]2012-11-27@02:53:49.315 Sorting 5 candidate file list records.

[I]2012-11-27@02:53:49.323 Choosing candidate files. 5 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     1       64000   1       64000   0       RULE 'datfiles' DELETE WHERE(.)

  1     4       256000  3       192000  0       RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 192000KB: 3 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 64000KB: 1 of 1 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%


# mmapplypolicy fs1 -P /home/gpfs/managementpolicy.txt 

[I] GPFS Current Data Pool Utilization in KB and %

pool1   133120  20971520        0.634766%

system  308736  20971520        1.472168%

[I] 4032 of 67584 inodes used: 5.965909%.

[I] Loaded policy rules from /home/gpfs/managementpolicy.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@02:54:46 UTC

parsed 0 Placement Rules, 0 Restore Rules, 2 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'

[I]2012-11-27@02:54:47.697 Directory entries scanned: 11.

[I] Directories scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:54:47.708 Sorting 11 file list records.

[I] Inodes scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:54:47.727 Policy evaluation. 11 files scanned.

[I]2012-11-27@02:54:47.759 Sorting 5 candidate file list records.

[I]2012-11-27@02:54:47.761 Choosing candidate files. 5 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     1       64000   1       64000   0       RULE 'datfiles' DELETE WHERE(.)

  1     4       256000  3       192000  0       RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 192000KB: 3 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 64000KB: 1 of 1 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%

[I]2012-11-27@02:54:50.399 Policy execution. 4 files dispatched.

[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;

        0 'skipped' files and/or errors.


+. External Pool Management

# cat /home/gpfs/expool1.ksh

#!/usr/bin/ksh

dt=`date +%h%d%y-%H_%M_%S`

results=/tmp/FileReport_${dt}


echo one $1

if [[ $1 == 'MIGRATE' ]];then

echo Filelist

echo There are `cat $2 | wc -l ` files that match >> ${results}

cat $2 >> ${results}

echo ----

echo - The file list report has been placed in ${results}

echo ----

fi
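
  >> before wiring the script into a policy it can be smoke-tested by hand with a throw-away file list (a suggested check; mmapplypolicy passes the operation as $1 and the generated file list as $2, as the 'one TEST' / 'one MIGRATE' lines in the run below show):

# ls /gpfs/fileset1 > /tmp/testlist

# /home/gpfs/expool1.ksh MIGRATE /tmp/testlist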


# cat /home/gpfs/listrule1.txt

RULE EXTERNAL POOL 'externalpoolA' EXEC '/home/gpfs/expool1.ksh'

RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE FILE_SIZE > 2


# mmapplypolicy fs1 -P /home/gpfs/listrule1.txt

[I] GPFS Current Data Pool Utilization in KB and %

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%

[I] 4031 of 67584 inodes used: 5.964429%.

[I] Loaded policy rules from /home/gpfs/listrule1.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@04:09:22 UTC

parsed 0 Placement Rules, 0 Restore Rules, 1 Migrate/Delete/Exclude Rules,

        0 List Rules, 1 External Pool/List Rules

RULE EXTERNAL POOL 'externalpoolA' EXEC '/home/gpfs/expool1.ksh'

RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE FILE_SIZE > 2

one TEST

[I]2012-11-27@04:09:23.436 Directory entries scanned: 10.

[I] Directories scan: 4 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@04:09:23.447 Sorting 10 file list records.

[I] Inodes scan: 4 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@04:09:23.474 Policy evaluation. 10 files scanned.

[I]2012-11-27@04:09:23.501 Sorting 4 candidate file list records.

[I]2012-11-27@04:09:23.503 Choosing candidate files. 4 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     4       256000  4       256000  0       RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 256000KB: 4 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 0KB: 0 of 0 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   5120    20971520        0.024414%

system  116736  20971520        0.556641%

one MIGRATE27@04:09:23.505 Policy execution. 0 files dispatched.  \.......

Filelist

There are 4 files that match

----

- The file list report has been placed in /tmp/FileReport_Nov2712-04_09_23

----

[I]2012-11-27@04:09:23.531 Policy execution. 4 files dispatched.

[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;

        0 'skipped' files and/or errors.


# more /tmp/FileReport_Nov2712-04_09_23

47621 65538 0   -- /gpfs/fileset1/bigfile1

47623 65538 0   -- /gpfs/fileset5/bigfile2

47624 65538 0   -- /gpfs/fileset3/bigfile3

47625 65538 0   -- /gpfs/fileset4/bigfile4




----------------------------------------------------

+. Replication (per file / per file system)

----------------------------------------------------

# mmlsfs fs1 -mrMR

  >> check the replication settings; if no replication is in place, the file system may have been created like...

  >> mmcrfs /gpfs fs1 -F pooldesc.txt -B 64k

flag                value                    description

------------------- ------------------------ -----------------------------------

 -m                 1                        Default number of metadata replicas

 -r                 1                        Default number of data replicas

 -M                 2                        Maximum number of metadata replicas

 -R                 2                        Maximum number of data replicas


# mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512      -1 yes      yes   ready         up           system

nsd2         nsd         512      -1 yes      yes   ready         up           system

nsd3         nsd         512      -1 no       yes   ready         up           pool1

nsd4         nsd         512      -1 no       yes   ready         up           pool1

# mmchdisk fs1 change -d "nsd1::::1::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd2::::2::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd3::::3::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd4::::4::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512       1 yes      yes   ready         up           system

nsd2         nsd         512       2 yes      yes   ready         up           system

nsd3         nsd         512       3 no       yes   ready         up           pool1

nsd4         nsd         512       4 no       yes   ready         up           pool1

GPFS: 6027-740 Attention: Due to an earlier configuration change the file system

is no longer properly replicated.


# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760        1 yes      yes        10427392 ( 99%)          1440 ( 0%)

nsd2                 10485760        2 yes      yes        10427392 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20854784 ( 99%)          2416 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760        3 no       yes        10356224 ( 99%)           496 ( 0%)

nsd4                 10485760        4 no       yes        10354176 ( 99%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20710400 ( 99%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41565184 ( 99%)          3408 ( 0%)

(metadata)           20971520                              20854784 ( 99%)          2416 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41565184 ( 99%)          3408 ( 0%)


Inode Information

-----------------

Number of used inodes:            4031

Number of free inodes:           63553

Number of allocated inodes:      67584

Maximum number of inodes:        67584


+. replicating at the file level

# dd if=/dev/zero of=/gpfs/fileset1/bigfile0 bs=64k count=1000

# mmlsattr -L /gpfs/fileset1/bigfile0

file name:            /gpfs/fileset1/bigfile0

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:29:47 2012

Windows attributes:   ARCHIVE


# mmchattr -m 2 -r 2 /gpfs/fileset1/bigfile0


# mmlsattr -L /gpfs/fileset1/bigfile0

file name:            /gpfs/fileset1/bigfile0

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:                unbalanced

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:29:47 2012

Windows attributes:   ARCHIVE


+. replicating at the file system level

# dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000


# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# mmchfs fs1 -m 2 -r 2

   >> change the file system's default replication attributes to 2

   >> files created after the change are created with two replicas right away,

   >> but files created before the change are only re-replicated after running mmrestripefs


# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# dd if=/dev/zero of=/gpfs/fileset1/bigfile2 bs=64k count=1000

# mmlsattr -L /gpfs/fileset1/bigfile2

file name:            /gpfs/fileset1/bigfile2

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:38:29 2012

Windows attributes:   ARCHIVE


# mmrestripefs fs1 -R

   >> files created before the change only pick up the new replication after mmrestripefs is run

GPFS: 6027-589 Scanning file system metadata, phase 1 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 2 ...

Scanning file system metadata for pool1 storage pool

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 3 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 4 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-565 Scanning user file metadata ...

 100.00 % complete on Tue Nov 27 13:39:04 2012

GPFS: 6027-552 Scan completed successfully.



# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:                unbalanced

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE



----------------------------------------------------

+. Snapshot

----------------------------------------------------

# echo "hello world:snap1" > /gpfs/fileset1/snapfile1

# mmcrsnapshot fs1 snap1

Writing dirty data to disk

Quiescing all file system operations

Writing dirty data to disk again

Resuming operations.

Checking fileset ...


# echo "hello world:snap2" >> /gpfs/fileset1/snapfile1

# mmcrsnapshot fs1 snap2

Writing dirty data to disk

Quiescing all file system operations

Writing dirty data to disk again

Resuming operations.

Checking fileset ...


# mmlssnapshot fs1

   >> the snapshots created in the fs1 file system

Snapshots in file system fs1:

Directory                SnapId    Status  Created

snap1                    1         Valid   Tue Nov 27 13:43:56 2012

snap2                    2         Valid   Tue Nov 27 13:45:19 2012

# cat /gpfs/.snapshots/snap1/fileset1/snapfile1

# cat /gpfs/.snapshots/snap2/fileset1/snapfile1

   >> snapshot data is kept under the file system's .snapshots directory


# rm /gpfs/fileset1/snapfile1

# cp /gpfs/.snapshots/snap2/fileset1/snapfile1 /gpfs/fileset1/snapfile1

   >> restore the file from the snapshot


# mmdelsnapshot fs1 snap1

# mmdelsnapshot fs1 snap2

   >> remove the stored snapshots

# mmlssnapshot fs1





----------------------------------------------------

+. GPFS Multi-Cluster 

  > http://www.ibm.com/developerworks/systems/library/es-multiclustergpfs/

  > 'All intercluster communication is handled by the GPFS daemon, which internally uses Secure Socket Layer (SSL).'

----------------------------------------------------

(cluster1-lpar11g) # mmauth genkey new

Generating RSA private key, 512 bit long modulus

.......++++++++++++

.......++++++++++++

e is 65537 (0x10001)

writing RSA key

mmauth: Command successfully completed


(cluster1-lpar11g) # mmshutdown -a

(cluster1-lpar11g) # mmauth update . -l AUTHONLY

Verifying GPFS is stopped on all nodes ...

mmauth: Command successfully completed


(cluster1-lpar11g) # mmstartup -a 

(cluster1-lpar11g) # rcp lpar11g:/var/mmfs/ssl/id_rsa.pub lpar12g:/tmp/lpar11g_id_rsa.pub


(cluster2-lpar12g) # mmauth genkey new

(cluster2-lpar12g) # mmshutdown -a

(cluster2-lpar12g) # mmauth update . -l AUTHONLY

(cluster2-lpar12g) # mmstartup -a 

(cluster2-lpar12g) # rcp lpar12g:/var/mmfs/ssl/id_rsa.pub lpar11g:/tmp/lpar12g_id_rsa.pub


(cluster1-lpar11g) # mmauth add gpfs_cluster2.lpar12g -k /tmp/lpar12g_id_rsa.pub

   >> the remote cluster name must include the node name, as in gpfs_cluster2.lpar12g

   >> point -k at the id_rsa.pub file generated by mmauth on that cluster

mmauth: Command successfully completed 


(cluster1-lpar11g) # mmauth grant gpfs_cluster2.lpar12g -f /dev/fs1

mmauth: Granting cluster gpfs_cluster2.lpar12g access to file system fs1:

        access type rw; root credentials will not be remapped.

mmauth: Command successfully completed
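
   >> a suggested cross-check (not captured in the original note): running 'mmauth show all' on cluster1 at this point should list gpfs_cluster2.lpar12g with rw access to fs1

(cluster1-lpar11g) # mmauth show all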


(cluster2-lpar12g) # mmremotecluster add gpfs_cluster.lpar11g -n lpar11g,lpar21g -k /tmp/lpar11g_id_rsa.pub

   >> "-n lpar11g,lpar21g" : gpfs_cluster.lpar11g 에 포함된 node list

mmremotecluster: Command successfully completed

mmremotecluster: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


(cluster2-lpar12g) # mmremotefs add remotefs -f fs1 -C gpfs_cluster.lpar11g -T /remotefs

   >> on cluster2, register fs1 from the gpfs_cluster.lpar11g cluster as the remote file system 'remotefs' mounted at /remotefs

mmremotefs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmchconfig opensslibname="/usr/lib/libssl.a(libssl64.so.0.9.8)" -N r07s6vlp1


(cluster2-lpar12g) # mmremotecluster show all

Cluster name:    gpfs_cluster.lpar11g

Contact nodes:   lpar11g,lpar21g

SHA digest:      7dcff72af5b5d2190ebe471e20bcfe8897d0e1cb

File systems:    remotefs (fs1)


(cluster2-lpar12g) # mmremotefs show all

Local Name  Remote Name  Cluster name       Mount Point        Mount Options    Automount  Drive  Priority

remotefs    fs1          gpfs_cluster.lpar11g /remotefs          rw               no           -        0


(cluster2-lpar12g) # mmmount remotefs

(cluster2-lpar12g) # mmdf remotefs


*. If the multi-cluster setup gets into a tangle and the gpfs cluster no longer starts, failing with error '6027-2114'...

     >>> resetting cipherList clears it

# mmchconfig cipherList=""

# mmauth show all

Cluster name:        gCluster5.lpar15 (this cluster)

Cipher list:         (none specified)

SHA digest:          (undefined)

File system access:  (all rw)



----------------------------------------------------

+. GPFS Call-back method

----------------------------------------------------

# cat /home/gpfs/nodedown.sh 

#!/bin/sh

echo "Logging a node leave event at: `date` " >> /home/gpfs/log/nodedown.log

echo "The event occurred on node:" $1  >> /home/gpfs/log/nodedown.log

echo "The quorum nodes are:" $2 >> /home/gpfs/log/nodedown.log


# rcp lpar11g:/home/gpfs/nodedown.sh lpar21g:/home/gpfs/

# rsh lpar21g chmod u+x /home/gpfs/nodedown.sh 
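
# >> the script writes to /home/gpfs/log, so that directory is assumed to exist on every node that can fire the event; if it does not, create it first (an assumed preparatory step, not in the original note)

# mkdir -p /home/gpfs/log ; rsh lpar21g mkdir -p /home/gpfs/log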


# mmaddcallback NodeDownCallback --command /home/gpfs/nodedown.sh --event nodeLeave --parms %eventNode --parms %quorumNodes

mmaddcallback: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlscallback

NodeDownCallback

        command       = /home/gpfs/nodedown.sh

        event         = nodeLeave

        parms         = %eventNode %quorumNodes


# mmshutdown -N lpar21g ; cat /home/gpfs/log/nodedown.log




