- http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=119080484


Something isn't working, where do I start?

Contents

  • Common
    • A node failed and I had to rebuild it from scratch. How do I add it back into the cluster?
    • I just created a new file system with pools and when I try to write a file I receive a no space error?
    • I successfully created the NSD's but now GPFS does not see them
    • Something seems slow or appears to hang
  • AIX
  • Linux
    • GPFS fails to start and reports "no such file or directory" for libssl library in the mmfs.log file
    • The Kernel module will not build.
  • Windows
    • How can I verify the connection to the Active Directory server is working from a Windows node?
    • The GPFS installation fails (ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf')
    • When attempting to add a Windows node mmaddnode fails (A remote host refused an attempted connect operation.)

 Common

A node failed and I had to rebuild it from scratch. How do I add it back into the cluster?

How to recover a failed GPFS node.

It was not an NSD server:
If the node is not an NSD server, the easiest way to recover is to remove the node from the cluster and add it back in.

  1. Remove the node from the cluster (the node must not be reachable by ping for this to work)
    mmdelnode -N failednode
  2. Add it back in using mmaddnode
    mmaddnode -N failednode

It was an NSD Server:
If the node is an NSD server you cannot remove it from the cluster without reconfiguring the NSD server definitions for the disks. To recover the node without reconfiguring the NSD server definitions:

  1. Reinstall the operating system and GPFS
  2. Get a copy of the mmsdrfs file from the primary cluster configuration server. (You can get the file from any node, but the copy on the primary cluster configuration server is the most up to date.)
    scp PrimaryClusterConfigNode:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
  3. Make sure the cluster configuration information is up to date
    mmchcluster -p LATEST
  4. At this point you should be able to start the node
    mmstartup -N failednode
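Once GPFS is started, it is worth confirming that the node has rejoined the cluster and, if it is an NSD server, that its disks are available again. A minimal check (the file system name gpfs1 is just an example) might be:

mmgetstate -N failednode    # the node should report "active"
mmlsdisk gpfs1 -L           # the disks it serves should show as ready and up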

 I just created a new file system with pools and when I try to write a file I receive a no space error?

If you just created a new file system and cannot create a file, it may be that you have storage pools but no policy installed. If your system pool is metadata-only, which is fine, that means you have metadata space but no data space in that storage pool, and the default rule places everything in the system storage pool. You can check the policy configuration by running mmlspolicy.

[root@perf7-c4-int64]#  mmlspolicy gpfs1
No policy file was installed  for file system 'gpfs1'.

If it says "No policy file was installed", you need to install a placement policy. A simple policy can be a single rule like this:

RULE 'default' set POOL 'satapool'

This policy sends all file data to the storage pool named satapool. Place that text in a file (for example, policy.txt), then install the policy:

mmchpolicy gpfs1  policy.txt
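Putting it together, a minimal sequence (using the gpfs1 file system and satapool pool names from the example above) would be:

cat > policy.txt <<'EOF'
RULE 'default' SET POOL 'satapool'
EOF
mmchpolicy gpfs1 policy.txt
mmlspolicy gpfs1    # should no longer report "No policy file was installed"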

I successfully created the NSD's but now GPFS does not see them  

Sometimes you can create an NSD using the mmcrnsd command and it completes successfully, but then mmlsnsd -X, for example, reports that the devices are not found.

# mmlsnsd -X

Disk name   NSD volume ID      Device   Devtype   Node name   Remarks
---------------------------------------------------------------------------------------------------
nsd1        1E05D0374B7053C4   -        -         node1       (not found) server node
nsd2        1E05D0384B7053C4   -        -         node2       (not found) server node

Cause: Unknown device name

This can be caused by GPFS not scanning that device name by default. In this case the device name was /dev/fioa, and GPFS does not look for devices that start with /dev/fio* by default (it looks for /dev/sd*, for example). When you run the mmcrnsd command it reads the device name from the NSD descriptor you provided, but when the GPFS daemon later attempts to find that device it looks only through the list of devices it discovered at startup, or after you ran the mmnsddiscover command (if the devices were added since the GPFS daemon was started). In this case you need to tell GPFS about the new device name, which you can do using the nsddevices user exit. For information and an example on how to use the nsddevices user exit see Device Naming
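As an illustration, a user exit for the /dev/fio* case above might look like the sketch below. This is only a sketch modeled on the sample shipped as /usr/lpp/mmfs/samples/nsddevices.sample; check that sample on your own system for the exact conventions (in particular the meaning of the return code). Each line written to standard output names a device (without the /dev/ prefix) followed by a device type such as generic.

#!/bin/ksh
# /var/mmfs/etc/nsddevices - user exit used by GPFS device discovery (sketch only)
for dev in /dev/fio*
do
  [ -e "$dev" ] || continue
  echo "${dev#/dev/} generic"      # "deviceName deviceType"
done
# Per the shipped sample: return 0 to replace the built-in discovery entirely,
# return 1 to also run the built-in discovery (for /dev/sd*, etc.).
return 1

Make the script executable (chmod +x /var/mmfs/etc/nsddevices) and then run mmnsddiscover, or restart GPFS, so the daemon picks up the new device names.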

Something seems slow or appears to hang


If file system access seems slow or GPFS appears to hang, the place to start investigating is what GPFS calls "waiters." Waiters are operations that are taking longer than some threshold; the reporting threshold is different for each type of operation. Some waiters are normal and indicate a healthy system, while others can tell you where a problem lies. To see the waiters:
When running GPFS 3.4 you can use the mmdiag command

mmdiag --waiters

When running GPFS 3.3 or earlier

mmfsadm dump waiters
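If the problem is intermittent, it can help to capture waiters periodically while you reproduce it. A simple loop such as the following (GPFS 3.4 command shown, with the path assumed to be the standard /usr/lpp/mmfs/bin) writes snapshots to a file for later review:

i=0
while [ $i -lt 30 ]
do
  date >> /tmp/waiters.log
  /usr/lpp/mmfs/bin/mmdiag --waiters >> /tmp/waiters.log   # snapshot every 10 seconds for 5 minutes
  sleep 10
  i=$((i + 1))
done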


Linux 

GPFS fails to start and reports "no such file or directory" for libssl library in the mmfs.log file


 
This message may occur if the right library is not specified by the opensslLibName config parameter which defaults to a list of common libssl library names: libssl.so:libssl.so.0:libssl.so.4. If the installed libssl library is not in the default list, you need to specify it through the opensslLibName configuration parameter.

mmchconfig opensslLibName="libssl.so.0.9.8e"
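If you are not sure which libssl is actually installed, you can check before setting the parameter; library directories vary by distribution, so the paths below are only examples:

ls -l /usr/lib64/libssl.so* /usr/lib/libssl.so* /lib64/libssl.so* 2>/dev/null
ldconfig -p | grep libssl    # alternative: query the dynamic linker cache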

An alternative is to create a symbolic link that points a library name in the default list to the installed library.

ln -s  libssl.so.0.9.8e libssl.so

Another alternative is to install the OpenSSL development package, which also creates the libssl.so symlink.

On SLES11 or later:
zypper install libopenssl-devel
On RHEL5.4 or later:
yum install openssl-devel
 

The Kernel module will not build


If make Autoconfig or make World fails and you are running a Red Hat-derived Linux distribution with GPFS 3.4.0.4 or later, you can try telling Autoconfig to treat the distribution as Red Hat using the LINUX_DISTRIBUTION flag. This allows you to build the GPFS portability layer on CentOS, for example.

make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig
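For context, the flag is simply added to the normal portability layer build; assuming the standard source location /usr/lpp/mmfs/src, the full sequence looks roughly like this:

cd /usr/lpp/mmfs/src
make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig
make World
make InstallImages    # installs the newly built kernel modules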
  

Windows

How can I verify the connection to the Active Directory server is working from a Windows node?

To verify that the Active Directory connection is working from a Windows node, you can look up a user account using the mmfsadm command. For example, to verify that the root account is accessible:

mmfsadm test adlookup "cn=root"

The GPFS installation fails (ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf')


Symptom:

The symptom is that the GPFS 3.3 installer on Windows Server 2008 will fail and report that the install came to a premature end. When you look in the install logs in %SystemRoot%\SUA\var\adm\ras you see an error similar to the following:

DIFXAPP: ERROR: encountered while installing driver package 'C:\Windows\SUA\usr\lpp\mmfs\driver\mmbus\mmbus.inf'
DIFXAPP: ERROR: InstallDriverPackages failed with error 0x5

Resolution:

This problem can occur when the user's directory (e.g. C:\Users\root) does not permit the SYSTEM user to create temporary files during installation. If you are installing GPFS as root and root has been configured to support passwordless-ssh, then root's home directory will probably not allow SYSTEM write access.

Some known ways to fix this include:

  1. Install GPFS as Administrator
  2. Temporarily give SYSTEM write access to the user's home directory:
    $ cd ~
    $ ls -l -d .
    drwxr-x---  1 root  +SYSTEM  8192 Jan 26 15:04 .
    $ chmod g+w .
    $ ls -l -d .
    drwxrwx---  1 root  +SYSTEM  8192 Jan 26 15:04 .
    $
    $ #  INSTALL GPFS
    $
    $ chmod g-w .
    
  3. Delete the profile of the user account attempting to do the install. 

When attempting to add a Windows node mmaddnode fails (A remote host refused an attempted connect operation.)

Symptom:

When you attempt to add a new Windows node to an existing AIX or Linux cluster, you receive the following error:

nodea.ibm.com: A remote host refused an attempted connect operation.
nodea.ibm.com: A remote host refused an attempted connect operation.

Resolution:

There are a few things that can cause this:
  1. SSH is not configured properly for passwordless access. To test this, try ssh from every node to every node (a test loop is sketched at the end of this section) using

  • The short name (nodea)
  • The fully qualified name (nodea.ibm.com)
  • The IP address (10.1.1.0)

  2. The existing cluster is using rsh instead of ssh. Windows does not support rsh; you must use ssh. To fix this, reconfigure the existing cluster for ssh using the mmchcluster command.

mmchcluster -r /usr/bin/ssh -R /usr/bin/scp
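To exercise item 1 above, a loop like the following can be run from each node in turn; the host names and IP addresses are placeholders for your own nodes, and BatchMode makes ssh fail rather than prompt for a password:

for target in nodea nodea.ibm.com 10.1.1.1 nodeb nodeb.ibm.com 10.1.1.2
do
  ssh -o BatchMode=yes "$target" /bin/true && echo "$target ok" || echo "$target FAILED"
done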




- http://www.ibm.com/developerworks/wikis/display/hpccentral/GPFS+Tuning+Parameters


GPFS Tuning Parameters

This section describes some of the configuration parameters available in GPFS. Included are some notes on how they may affect performance. 
These are GPFS configuration parameters that can be set cluster wide, on a specific node or sets of nodes.
To view the configuration parameters that have been changed from the default

mmlsconfig

To view the active value of any of these parameters you can run

mmfsadm dump config

To change any of these parameters use mmchconfig. For example, to change the pagepool setting on all nodes:

mmchconfig pagepool=256M

Some options take effect immediately using the -i or -I flag to mmchconfig, some take effect after the node is restarted. Use -i to make the change permanent and affect the running GPFS daemon immediately. Use -I to affect the GPFS daemon only (reverts to saved settings on restart). Refer to the current GPFS Documentation for details.
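For example, using pagepool (assuming your GPFS level allows pagepool to be changed on a running daemon; some parameters and older releases still require a restart):

mmchconfig pagepool=512M -i    # takes effect now and persists across restarts
mmchconfig pagepool=512M -I    # takes effect now only; reverts to the saved value on restart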

In addition some parameters have a section called Tuning Guidelines. These are general guidelines that can be used to determine a starting point for tuning a parameter. 


leaseRecoveryWait



The leaseRecoveryWait parameter defines how long the FS manager of a filesystem will wait after the last known lease expiration of any failed nodes before running recovery. A failed node cannot reconnect to the cluster before recovery is finished. The leaseRecoveryWait parameter value is in seconds and the default is 35.

Making this value smaller increases the risk that there may be IO in flight from the failing node to the disk/controller when recovery starts running. This may result in out of order IOs between the FS manager and the dying node.

In most cases where a node is expelled from the cluster there is either a problem with the network or the node is running out of resources, such as paging space. For example, if an application running on a node is paging the machine to death or overrunning network capacity, GPFS may not get a chance to contact the Cluster Manager node to renew its lease within the timeout period.

GPFSCmdPortRange



When GPFS administration commands are executed they may use one or more TCP/IP ports to complete the command. For example when using standard ssh an admin command opens a connection using port 22. In addition to the remote shell or file copy command ports there are additional ports that are opened to pass data to and from GPFS daemons. By default GPFS uses one of the ephemeral ports to complete these connections.

In some environments you may want to limit the range of ports used by GPFS administration commands. You can control the ports used by the remote shell and file copy commands by using different tools or by configuring these tools to use different ports. The ports used by the GPFS daemon for administrative command execution can be defined using the GPFS configuration parameter GPFSCmdPortRange.

mmchconfig GPFSCmdPortRange=lowport-highport

This allows you to limit the ports used for GPFS administration mm* command execution. You need enough ports to support all of the concurrent commands from a node so you should define 20 or more ports for this purpose. Example:

mmchconfig GPFSCmdPortRange=30000-30100

minMissedPingTimeout



The minMissedPingTimeout and maxMissedPingTimeout parameters set limits on the calculation of missedPingTimeout (MPT), which is the allowable time for pings to fail from the Cluster Manager (CM) to a node that has not renewed its lease. The default MPT is leaseRecoveryWait-5 seconds. The CM will wait MPT seconds after the lease has expired before declaring a node out of the cluster. The minMissedPingTimeout and maxMissedPingTimeout values are in seconds and the defaults are 3 and 60 respectively. If these values are changed, only GPFS on the quorum nodes (from which the CM is elected) needs to be recycled for the change to take effect.

This can be used to ride out something like a central network switch failure (or other network glitch) whose recovery time may be longer than leaseRecoveryWait. It may prevent false node-down conditions, but it will extend the time for node recovery to finish, which may block other nodes from making progress if the failing node held tokens for many shared files.

Just as in the case of leaseRecoveryWait, in most cases where a node is expelled from the cluster there is either a problem with the network or the node is running out of resources, such as paging space. For example, if an application running on a node is paging the machine to death or overrunning network capacity, GPFS may not get a chance to contact the Cluster Manager node to renew its lease within the timeout period.
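For example, to keep the Cluster Manager from declaring nodes dead during a network outage of up to about a minute, you might raise the lower bound as shown below; the value is only illustrative, and GPFS must be recycled on the quorum nodes (one at a time, to preserve quorum) for it to take effect:

mmchconfig minMissedPingTimeout=60
# then restart GPFS on each quorum node in turn:
#   mmshutdown -N <quorum-node> ; mmstartup -N <quorum-node>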

maxMissedPingTimeout



See minMissedPingTimeout.

maxReceiverThreads



The maxReceiverThreads parameter is the number of threads used to handle incoming TCP packets. These threads gather the packets until there are enough bytes for the incoming RPC (or RPC reply) to be handled. For some simple RPCs the receiver thread handles the message immediately; otherwise it hands the message off to handler threads.

maxReceiverThreads defaults to the number of CPUs in the node up to 16. It can be configured higher if necessary up to 128 for very large clusters.

pagepool



The pagepool parameter determines the size of the GPFS file data block cache. Unlike local file systems that use the operating system page cache to cache file data, GPFS allocates its own cache called the pagepool. The GPFS pagepool is used to cache user file data and file system metadata. The default pagepool size of 64MB is too small for many applications, so this is a good place to start looking for performance improvement. In release 3.5 the default is 1GB for new installs; when upgrading, the old setting is kept.

Along with file data the pagepool supplies memory for various types of buffers like prefetch and write behind.

For Sequential IO

The default pagepool size may be sufficient for sequential IO workloads; however, a value of 256MB is known to work well in many cases. To change the pagepool size, use the mmchconfig command. For example, to change the pagepool size to 256MB on all nodes in the cluster:

    mmchconfig pagepool=256M [-i]

If the file system blocksize is larger than the default (256K), the pagepool size should be scaled accordingly. For example, if 1M blocksize is used, the default 64M pagepool should be increased by 4 times to 256M. This allows the same number of buffers to be cached.

Random IO

The default pagepool size will likely not be sufficient for Random IO or workloads involving a large number of small files. In some cases allocating 4GB, 8GB or more memory can improve workload performance.

Random Direct IO

For database applications that use Direct IO, the pagepool is not used for any user data. Its main purpose in this case is for system metadata and caching the indirect blocks of the database files.

NSD servers

Assuming no applications or Filesystem Manager services are running on the NSD servers, the pagepool is only used transiently by the NSD worker threads to gather data from client nodes and write the data to disk. The NSD server does not cache any of the data. Each NSD worker needs just one pagepool buffer per operation, and the buffer can potentially be as large as the largest blocksize of any file system the disks belong to. With the default NSD configuration there will be 3 NSD worker threads per LUN (nsdThreadsPerDisk) that the node services, so the amount of memory needed in the pagepool will be 3 x #LUNs x maxBlockSize. The target amount of space in the pagepool for NSD workers is controlled by nsdBufSpace, which defaults to 30%, so the pagepool should be large enough that 30% of it provides enough buffers.
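A hypothetical sizing example using that formula (the LUN count and blocksize are made-up values):

# 12 LUNs served x 3 NSD worker threads per LUN x 1 MiB largest blocksize, nsdBufSpace at its 30% default
luns=12; threads_per_disk=3; blocksize_mib=1
buffers_mib=$((luns * threads_per_disk * blocksize_mib))   # 36 MiB of NSD buffer space needed
pagepool_mib=$((buffers_mib * 100 / 30))                   # ~120 MiB minimum pagepool on the NSD server
echo "pagepool should be at least ${pagepool_mib} MiB"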

32 Bit operating systems

On 32-bit operating systems the pagepool is limited by the GPFS daemon's address space. This means that it cannot exceed 4GB in size and is often much smaller due to other limitations.



opensslLibName



To initialize multi-cluster communications GPFS uses OpenSSL. When initializing OpenSSL, GPFS looks for these ssl libraries: libssl.so:libssl.so.0:libssl.so.4 (as of GPFS 3.4.0.4). If you are using a newer version of OpenSSL, the filename may not match one in the list (for example libssl.so.6). You can use the opensslLibName parameter to tell GPFS to look for the newer version instead.

mmchconfig opensslLibName="libssl.so.6"



readReplicaPolicy



Options: default, local

Default
By default, when data is replicated GPFS spreads the reads over all of the available failure groups. This configuration is typically best when the nodes running GPFS have equal access to both copies of the data.

Local
A value of local has two effects on reading data in a replicated storage pool. Data is read from:

  1. A local block device
  2. A "local" NSD Server

The local block device means that the path to the disk is through a block special device, for example /dev/sd* on Linux or a /dev/hdisk device on AIX. GPFS does not do any further determination, so if disks at two sites are connected with a long-distance fiber connection GPFS cannot distinguish which disks are local. To use this option, connect the sites using the NSD protocol over TCP/IP or InfiniBand Verbs (Linux only).

Further, GPFS uses the subnets configuration setting to determine which NSD servers are "local" to an NSD client. For NSD clients to benefit from "local" read access, the NSD servers supporting the local disk need to be on the same subnet as the NSD clients accessing the data, and that subnet needs to be defined using the "subnets" configuration parameter. This parameter is useful when GPFS replication is used to mirror data across sites and there are NSD clients in the cluster, because it keeps read requests from being sent over the WAN.
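Putting the pieces together for a two-site configuration might look roughly like the commands below; the subnet value is a placeholder and the exact subnets syntax should be checked against the documentation for your GPFS release:

mmchconfig readReplicaPolicy=local
mmchconfig subnets="192.168.10.0"    # subnet shared by the local NSD servers and clients (example value)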



seqDiscardThreshold



The seqDiscardThreshold parameter affects what happens when GPFS detects a sequential read (or write) access pattern and has to decide what to do with the pagepool buffer after it is consumed (or flushed by writebehind threads). The default behavior is the highest performing option for the case where a very large file is read (or written) sequentially: with the default of 1MB, if a file is sequentially read and is greater than 1MB, GPFS does not keep the data in cache after consumption. There are some instances where large files are reread often by multiple processes, data analytics for example. In some cases you can improve the performance of these applications by increasing seqDiscardThreshold to be larger than the set of files you would like to cache; increasing seqDiscardThreshold tells GPFS to attempt to keep as much data in cache as possible for files below that threshold. The value of seqDiscardThreshold is a file size in bytes; the default is 1MB (1048576 bytes).

Tuning Guidelines

  • Increase this value if you want to cache sequentially read or written files that are larger than 1MB.
  • Make sure there are enough buffer descriptors to cache the file data. (See maxBufferDescs )
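Applying the first guideline, to keep sequentially accessed files of up to 8 GiB in cache (assuming the pagepool and buffer descriptors are sized to hold them), the threshold could be raised like this; the value is in bytes and is purely illustrative:

mmchconfig seqDiscardThreshold=8589934592    # 8 GiB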

sharedMemLimit



The sharedMemLimit parameter allows you to increase the amount of memory available to store various GPFS structures including inode cache and tokens. When the value of sharedMemLimit is set to 0 GPFS automatically determines a value for sharedMemLimit. The default value varies on each platform. In GPFS 3.4 the default on Linux and Windows is 256MB. In GPFS 3.4 on Windows sharedMemLimit can only be used to decrease the size of the shared segment. To determine whether or not increasing sharedMemLimit may help you can use the mmfsadm dump fs command.  For example, if you run mmfsadm dump fs and see that you are not getting the desired levels of maxFilesToCache (aka fileCacheLimit) or maxStatCache (aka statCacheLimit) you can try increasing sharedMemLimit.

# mmfsadm dump fs | head -8

Filesystem dump:
  UMALLOC limits:
    bufferDescLimit       4096 desired     4096
    fileCacheLimit        5000 desired    75000
    statCacheLimit       20000 desired    80000
    diskAddrBuffLimit     4000 desired     4000

The sharedMemLimit parameter is set in bytes.

As of release 3.4 the largest sharedMemLimit on Windows is 256M. On Linux and AIX the largest setting is 256G on 64 bit architectures and 2047M on 32 bit architectures. Using larger values may not work on some platforms/GPFS code versions. The actual sharedMemLimit on Linux may be reduced to a percentage of the kernel vmalloc space limit.
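If the dump shows the desired fileCacheLimit or statCacheLimit being clipped, the limit can be raised; the value below is only an example and should stay within the platform maximums listed above:

mmchconfig sharedMemLimit=2G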

socketMaxListenConnections



The parameter socketMaxListenConnections sets the number of TCP/IP sockets that the daemon can listen on in parallel. This tunable was introduced in 3.4.0.7 specifically for large clusters, where an incast of messages to a manager node from a large number of client nodes can exceed the listen backlog and time out. To be effective, the Linux tunable /proc/sys/net/core/somaxconn must also be modified from its default of 128. The effective value is the smaller of the GPFS tunable and the kernel tunable.

Default
Versions prior to 3.4.0.7 are fixed at 128. The default remains 128. The Linux kernel tunable also defaults to 128.

Tuning Guidelines
For clusters under 1000 nodes tuning this value should not be required. For larger clusters it should be set to approximately the number of nodes in the GPFS cluster. 
Example
mmchconfig socketMaxListenConnections=1500
echo 1500 > /proc/sys/net/core/somaxconn
(or)
sysctl -w net.core.somaxconn=1500

socketRcvBufferSize



The parameter socketRcvBufferSize sets the size of the TCP/IP receive buffer used for NSD data communication. This parameter is in bytes.

socketSndBufferSize



The parameter socketSndBufferSize sets the size of the TCP/IP send buffer used for NSD data communication. This parameter is in bytes.
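If you do tune these (for example on high-latency links), the values are plain byte counts; the 256 KiB figures below are only illustrative:

mmchconfig socketRcvBufferSize=262144
mmchconfig socketSndBufferSize=262144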

maxMBpS



The maxMBpS option is an indicator of the maximum throughput in megabytes per second that can be submitted by GPFS into or out of a single node. It is not a hard limit; rather, the maxMBpS value is a hint GPFS uses to calculate how much I/O can effectively be done for sequential prefetch and write-behind operations. In GPFS 3.3 the default maxMBpS value is 150, and in GPFS 3.5 it defaults to 2048. The maximum value is 100,000.

The maxMBpS value should be adjusted for the nodes to match the IO throughput the node is expected to support. For example, you should adjust maxMBpS for nodes that are directly attached to storage. A good rule of thumb is to set maxMBpS to twice the IO throughput required of a system. For example, if a system has two 4Gbit HBA's (400MB/sec per HBA) maxMBpS should be set to 1600. If the maxMBpS value is set too low sequential IO performance may be reduced.

This setting is not used by NSD servers. It is only used for application nodes doing sequential access to files. 
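Continuing the two-HBA example above, the setting would be applied to the application nodes doing the sequential IO; the node list name is a placeholder:

mmchconfig maxMBpS=1600 -N appnodes    # "appnodes" stands for your application node list or node class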

maxFilesToCache



The maxFilesToCache parameter controls how many files each node can cache. Each cached file requires memory for the inode and a token (lock).

In addition to this parameter, the maxStatCache configuration parameter controls how many files are partially cached. The default value of maxStatCache is 4 * maxFilesToCache, so maxFilesToCache effectively controls five times its own value in tokens, multiplied by the number of nodes in the cluster. The token managers for a given file system have to keep token state for all nodes in the cluster and for nodes in remote clusters that mount the file systems. This should be considered when setting this value.

One thing to keep in mind is that on a large cluster, a change in the value of maxFilesToCache is greatly magnified. Increasing maxFilesToCache from the default of 1000 by a factor of 2 in a cluster with 200 nodes will increase the number of tokens a server needs to store by approximately 2,000,000.  Therefore on large clusters it is recommended that if there is a subset of nodes with the need to have many open files only those nodes should increase the maxFilesToCache parameter. Nodes that may need an increased value for maxFilesToCache would include: login nodes, NFS/CIFS exporters, email servers or other file servers. For systems where applications use a large number of files, of any size, increasing the value for maxFilesToCache may prove beneficial. This is particularly true for systems where a large number of small files are accessed.

The increased value should be large enough to handle the number of concurrently open files plus allow caching of recently used files. You can use mmpmon (see monitoring) to measure the number of files opened and closed on a GPFS file system. Changing the value of maxFilesToCache affects the amount of memory used on the node. The amount of memory required for inodes and control data structures can be calculated as maxFilesToCache × 2.5 KB, where 2.5 KB = 2 KB + 512 bytes for an inode. Valid values of maxFilesToCache range from 1 to 100,000,000.

The size of the GPFS shared segment can limit the maximum setting of maxFilesToCache.  See sharedMemLimit for details.

Note: prior to release 3.5 the default maxFilesToCache and maxStatCache were 1000 and 4000. As of release 3.5, the default values are 4000 and 1000. If you change the maxFilesToCache value but not the maxStatCache value, then maxStatCache will default to 4 * maxFilesToCache.

Tuning Guidelines:

  • The increased value should be large enough to handle the number of concurrently open files plus allow caching of recently used files.
  • Increasing maxFilesToCache can improve the performance of user interactive operations like running ls.  
  • As a rule the total of ((maxFilesToCache + maxStatCache) * nodes) should not exceed (600,000 * (tokenMemLimit/256M) * (The number of manager nodes - 1)).  This is assuming you account for the fact that different nodes may have different values of maxFilesToCache.
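A worked example of that rule with made-up numbers: a 200-node cluster at the 3.5 defaults (maxFilesToCache=4000, maxStatCache=1000), three manager nodes, and tokenMemLimit equal to 256M (so the ratio term is 1):

nodes=200; mftc=4000; msc=1000; mgr_nodes=3
demand=$(( (mftc + msc) * nodes ))              # 1,000,000 tokens needed cluster-wide
capacity=$(( 600000 * 1 * (mgr_nodes - 1) ))    # 1,200,000 tokens of capacity
echo "demand=${demand} capacity=${capacity}"    # demand must stay below capacity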

maxStatCache



The maxStatCache parameter sets aside additional pageable memory to cache attributes of files that are not currently in the regular file cache. This is useful to improve the performance of both the system and GPFS stat() calls for applications with a working set that does not fit in the regular file cache. The memory occupied by the stat cache can be calculated as: maxStatCache × 176 bytes
Valid values of maxStatCache range from 0 to 10,000,000.

For systems where applications test the existence of files, or the properties of files, without actually opening them (as backup applications do), increasing the value for maxStatCache may prove beneficial. The default value is: 4 × maxFilesToCache
On systems where maxFilesToCache is greatly increased it is recommended that this value be manually set to something less than 4 * maxFilesToCache. For example, if you set maxFilesToCache to 30,000 you may want to set maxStatCache to 30,000 as well. On compute nodes this can usually be set much lower, since they only have a few active files in use for any one job.

Note: prior to release 3.5 the default maxFilesToCache and maxStatCache were 1000 and 4000. As of release 3.5, the default values are 4000 and 1000. If you change the maxFilesToCache value but not the maxStatCache value, then maxStatCache will default to 4 * maxFilesToCache.

The size of the GPFS shared segment can limit the maximum setting of maxStatCache.  See sharedMemLimit for details. 

maxBufferDescs



The value of maxBufferDescs defaults to 10 * maxFilesToCache, up to pagepool size / 16K. When caching small files it does not need to be more than a small multiple of maxFilesToCache, since only OpenFile objects (not stat cache objects) can cache data blocks.

If an application needs to cache very large files you can tune maxBufferDescs to ensure there are enough to cache large files.  To see the current value use the mmfsadm command:

#mmfsadm dump fs

Filesystem dump:
  UMALLOC limits:
    bufferDescLimit      10000 desired    10000
    fileCacheLimit        1000 desired     1000
    statCacheLimit        4000 desired     4000
    diskAddrBuffLimit      800 desired      800

In this case there are 10,000 buffer descriptors configured. If you have a 1MiB file system blocksize and want to cache a 20GiB file, you will not have enough buffer descriptors. In this case to cache a 20GiB file increase maxBufferDescs to at least 20,480 (20GiB/1MiB=20,480). It is not exactly a one to one mapping so a value of 32k may be appropriate.

mmchconfig maxBufferDescs=32k

nfsPrefetchStrategy



The parameter nfsPrefetchStrategy tells GPFS to optimize prefetching for NFS-style file access patterns. It defines a window, in number of blocks around the current position, that is treated as "fuzzy sequential" access. This can improve performance when big files are read sequentially but, because of kernel scheduling, some of the read requests reach GPFS out of order and therefore do not look strictly sequential. If the file system blocksize is small relative to the read request sizes, making this bigger provides a bigger window of blocks. The default is 0.

Tuning Guidelines

  • Setting nfsPrefetchStrategy to 1 can improve sequential read performance when large files are accessed using NFS. 

nsdMaxWorkerThreads



  
The parameter nsdMaxWorkerThreads sets the maximum number of NSD threads on an NSD server that will be concurrently transferring data with NSD clients. The default is 32, with a minimum of 8; the maximum is constrained by the requirement that worker1Threads + prefetchThreads + nsdMaxWorkerThreads be less than 1500 on 64-bit architectures. The default works well in many clusters, but in some cases, large clusters for example, it may help to increase nsdMaxWorkerThreads. Scale this with the number of LUNs, not the number of clients; these threads are needed to manage flow control on the network between the clients and the servers.

numaMemoryInterleave



  
On Linux, setting numaMemoryInterleave to yes starts mmfsd with numactl --interleave=all. Enabling this parameter may improve the performance of GPFS running on NUMA-based systems, for example systems based on an Intel Nehalem processor.

prefetchPct



"prefetchPct" defaults to 20% of pagepool. GPFS uses this as a guideline which limits how much pagepool space will be used for prefetch or writebehind buffers in the case of active sequential streams. The default works well for many applications. On the other hand, if the workload is mostly sequential (video serving/ingest) with very little caching of small files or random IO, then this number should be increased up to its 60% maximum, so that each stream can have more buffers available for prefetch and write behind operations. 

prefetchThreads



Tuning Guidelines

  • You usually don't need prefetchThreads to be more than twice the number of LUNs available to the node. Any more than that typically do nothing but wait in queues. The maximum value depends on the sum of worker1Threads + prefetchThreads + nsdMaxWorkerThreads < 1500 on 64bit architectures

Logfile



"Logfile" size should be larger for high metadata rate systems to prevent more glitches when the log has to wrap. Can be as large as 16MB on large blocksize file systems. To set this parameter use the --L flag on mmcrfs. 

verbsLibName

To initialize InfiniBand RDMA, GPFS looks for a file called libverbs.so. If the file name on your system is different (libverbs.so.1.0, for example), you can change this parameter to match.

Example:
  mmchconfig verbsLibName=libverbs.so.1.0



verbsrdmasperconnection


This is the maximum number of RDMAs that can be outstanding on any single RDMA connection. The default value is 8.

Tuning Guidelines

  • In testing the default was more than enough on SDR. All performance testing of the parameters was done on OFED 1.1 IB SDR. 

verbsrdmaspernode


This is the maximum number of RDMAs that can be outstanding from the node. The default value is 0 (a value of 0 means use the built-in default, which is 32).

Tuning Guidelines

  • In testing the default was more than enough to keep adapters busy on SDR. All performance testing of the parameters was done on OFED 1.1 IB SDR. 

worker1Threads



The worker1Threads parameter represents the total number of concurrent application requests that can be processed at one time. This may include metadata operations like file stat() requests, open or close, as well as data operations. The worker1Threads parameter can be reduced without having to restart the GPFS daemon; increasing the value of worker1Threads requires a restart of the GPFS daemon.
To determine whether you have a sufficient number of worker1Threads configured you can use the mmfsadm dump mb command.

# mmfsadm dump mb | grep Worker1
  Worker1Threads: max 48 current limit 48 in use 0 waiting 0   PageDecl: max 131072 in use 0

Using the mmfsadm command you can see how many threads are "in use" and how many application requests are "waiting" for a worker1thread.

Tuning Guidelines

  • The default is good for most workloads.
  • You may want to increase worker1Threads if your application uses many threads and does Asynchronous IO (AIO) or Direct IO (DIO); in these cases the worker1 threads are doing the IO operations. A good place to start is to set worker1Threads to approximately 2 times the number of LUNs in the file system so GPFS can keep the disks busy with parallel requests. The maximum value depends on the sum of worker1Threads + prefetchThreads + nsdMaxWorkerThreads < 1500 on 64-bit architectures.
  • Do not use excessive values of worker1threads.


worker3Threads



The worker3Threads parameter specifies the number of threads to use for inode prefetch. A value of zero disables inode prefetch. The default is 8.

Tuning Guidelines

  • The default is good for most workloads.


writebehindThreshold



The writebehindThreshold parameter determines at what point GPFS starts flushing newly written data out of the pagepool for a file. Increasing this value can increase how many newly created files are kept in cache, which can be useful, for example, if your workload contains temporary files that are smaller than writebehindThreshold and are deleted before they would be flushed from cache. By default GPFS uses the pagepool to buffer IO for best performance, but once the data is written the buffers are cleaned; increasing this value tells GPFS to try to keep the data in the pagepool as long as practical instead of immediately cleaning the buffers. The value is the maximum file size to keep in cache and is specified in bytes; the default is 512k (524288 bytes). If the value is too large there may be too many dirty buffers that the sync thread has to flush at the next sync interval, causing a surge in disk IO. Keeping it small ensures a smooth flow of dirty data to disk.

Tuning Guidelines

  • The default is good for most workloads.
  • Increase this value if you have a workload where not flushing newly written files larger than 512k would be beneficial.





- http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=104533251&navigatingVersions=true



GPFS FAQ

GPFS Questions and Answers


Overview

General Parallel File System (TM) (GPFS (TM)) is a high performance shared-disk file management solution that provides fast, reliable access from nodes in a cluster environment. Parallel and serial applications can readily access shared files using standard UNIX(R) file system interfaces, and the same file can be accessed concurrently from multiple nodes. GPFS is designed to provide high availability through logging and replication, and can be configured for failover from both disk and server malfunctions. GPFS scalability and performance are designed to meet the needs of data intensive applications such as engineering design, digital media, data mining, relational databases, financial analytical, seismic data processing, scientific research and scalable file serving.

GPFS for POWER (TM) is supported on both AIX (R) and Linux (R). GPFS for AIX runs on the IBM (R) eServer (TM) Cluster 1600 as well as clusters of IBM Power, IBM System p (TM), IBM eServer p5, and IBM BladeCenter (R) servers. GPFS for Linux runs on select IBM Power, System p, eServer p5, BladeCenter and IBM eServer OpenPower (R) servers. The GPFS Multiplatform product runs on the IBM System Cluster 1350 (TM) as well as Linux clusters based on selected IBM x86 System x (TM) rack-optimized servers, select IBM BladeCenter servers, or select IBM AMD processor-based servers.

Additionally, GPFS Multiplatform V3.2.1 is supported on nodes running Windows (R) Server 2003 R2 on 64-bit architectures (AMD x64 / EM64T) in an existing GPFS V3.2.1 cluster of AIX and/or Linux (32-bit or 64-bit) where all nodes are at service level 3.2.1-5 or later.

For further information regarding the use of GPFS in your clusters, see the GPFS: Concepts, Planning, and Installation Guide.


Questions & Answers

1. General questions:

2. Software questions:

3. Machine questions:

4. Disk questions:

5. Scaling questions:

6. Configuration and tuning questions:

7. Service questions:

1. General questions


Q1.1: How do I order GPFS?
A1.1:
To order GPFS:


Q1.2: How is GPFS priced?
A1.2:
The price for GPFS for POWER is based on the number of processors active on the server where GPFS is installed.

The price for GPFS Multiplatform is based on a Processor Value Unit metric. A Value Unit is a pricing charge metric for program license entitlements which is based upon the quantity of a specifically designated measurement used for a given program, in this case processors or processor cores. Under the processor Value Unit licensing metric, each processor core is assigned a specific number of Value Units. You must acquire the total number of processor Value Units for each processor core on which the software program is deployed. IBM continues to define a processor to be each processor core on a chip. For example, a dual-core chip contains two processor cores.

A processor core is a functional unit within a computing device that interprets and executes instructions. A processor core consists of at least an instruction control unit and one or more arithmetic or logic unit. Not all processor cores require the same number of Value Unit entitlements. With multi-core technology, each core is considered a processor.

See http://www.ibm.com/software/lotus/passportadvantage/pvu_licensing_for_customers.html

Each software program has a unique price per Value Unit. The number of Value Unit entitlements required for a program depends on how the program is deployed in your environment and must be obtained from a Value Unit table. GPFS Multiplatform is grouped into packs of 10 processor Value Units as the minimum order quantity. For example, when you need 50 processor Value Units, you will order 5 of these 10 processor Value Unit part numbers to get the required 50 processor Value Units. To determine the total cost of deploying GPFS, multiply the program price per Value Unit by the total number of processor Value Units required. To calculate the number of Value Unit entitlements required, refer to the Value Unit Table at
http://www.ibm.com/software/lotus/passportadvantage/pvu_table_for_customers.html

and the Value Unit Calculator at
https://www-112.ibm.com/software/howtobuy/passportadvantage/valueunitcalculator/vucalc.wss

For further information:

  • In the United States, please call 1-888-SHOP-IBM
  • In all other locations, please contact your IBM Marketing Representative. For a directory of worldwide contacts, see http://www.ibm.com/planetwide/index.html


Q1.3: Where can I find the documentation for GPFS?
A1.3:
The GPFS documentation is available in both PDF and HTML format on the Cluster Information Center at http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfsbooks.html.


Q1.4: What resources beyond the standard documentation can help me learn about and use GPFS?
A1.4:
For additional information regarding GPFS see:


Q1.5: How can I ask a more specific question about GPFS?
A1.5:
Depending upon the nature of your question, you may ask it in one of several ways.

  • If you want to correspond with IBM regarding GPFS:
    • If your question concerns a potential software error in GPFS and you have an IBM software maintenance contract, please contact 1-800-IBM-SERV in the United States or your local IBM Service Center in other countries. IBM Scholars Program users should notify the GPFS development team of potential software bugs through gpfs@us.ibm.com.
    • If you have a question that can benefit other GPFS users, you may post it to the GPFS technical discussion forum at http://www.ibm.com/developerworks/forums/dw_forum.jsp?forum=479
    • This FAQ is continually being enhanced. To contribute possible questions or answers, please send them to gpfs@us.ibm.com
  • If you want to interact with other GPFS users, the San Diego Supercomputer Center  maintains a GPFS user mailing list. The list is gpfs-general@sdsc.edu and those interested can subscribe to the list at http://lists.sdsc.edu/mailman/listinfo/gpfs-general

If your question does not fall into the above categories, you can send a note directly to the GPFS development team at gpfs@us.ibm.com. However, this mailing list is informally monitored as time permits and should not be used for priority messages to the GPFS team.


Q1.6: Does GPFS participate in the IBM Academic Initiative Program?
A1.6:

GPFS no longer participates in the IBM Academic Initiative Program.

If you are currently using GPFS with an education license from the Academic Initiative, we will continue to support GPFS 3.2 on a best-can-do basis via email for the licenses you have. However, no additional or new licenses of GPFS will be available from the IBM Academic Initiative program. You should work with your IBM client representative on what educational discount may be available for GPFS. See http://www.ibm.com/planetwide/index.html


2. Software questions


Q2.1: What levels of the AIX O/S are supported by GPFS?
A2.1:
GPFS supports AIX V6.1, AIX V5.3 and V5.2 nodes in a homogenous or heterogeneous cluster running either the AIX or the Linux operating system.

Table 2. GPFS for AIX

              AIX V6.1    AIX V5.3    AIX V5.2
GPFS V3.2     X           X           X
GPFS V3.1                 X           X

Notes:

1. The following additional filesets are required by GPFS V3.2:
  • xlC.aix50.rte (C Set ++(R) Runtime for AIX 5.0), version 8.0.0.0 or later
  • xlC.rte (C Set ++ Runtime), version 8.0.0.0 or later
  These can be downloaded from Fix Central at http://www.ibm.com/eserver/support/fixes/fixcentral
2. Enhancements to the support of Network File System (NFS) V4 in GPFS V3 are only available on AIX V5.3 systems with the minimum technology level of 5300-04 applied or on AIX V6.1 with GPFS V3.2.
3. The version of OpenSSL shipped with AIX V6.1 will not work with GPFS due to a change in how the library is built. To obtain the level of OpenSSL which will work with GPFS, see the question How do I get OpenSSL to work on AIX and SLES8/ppc64?
4. For additional support information, please also see the question, What is the current service information for GPFS?
5. Customers should consider the support plans for AIX V5.2 in their operating system decision.
6. For the latest GPFS fix level, go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html


Q2.2: What Linux distributions are supported by GPFS?
A2.2:
GPFS supports the following distributions:
Note: For kernel level support, please see question What are the latest kernel levels that GPFS has been tested with?

Table 3. Linux distributions supported by GPFS

                           RHEL 5 (2)   RHEL 4   RHEL 3   SLES 10 (1,4)   SLES 9   SLES 8
GPFS Multiplatform V3.2    X            X                 X               X
GPFS for POWER V3.2 (3)    X            X                 X               X
GPFS Multiplatform V3.1                 X        X        X               X        X
GPFS for POWER V3.1                     X                 X               X

1. There is required service for GPFS V3.1 support of SLES 10. Please see question What is the current service information for GPFS?
2. RHEL 5.0 and later on POWER requires GPFS V3.2.0.2 or later
3. GPFS V3.2 for Linux on POWER does not support mounting of a file system with a 16KB block size when running on RHEL 5.
4. The GPFS GPL build requires imake. If imake was not installed on the SLES 10 system, install xorg-x11-devel-*.rpm.


Q2.3: What are the latest kernel levels that GPFS has been tested with?
A2.3:
While GPFS runs with many different AIX fixes and Linux kernel levels, it is highly suggested that customers apply the latest fix levels and kernel service updates for their operating system. To download the latest GPFS service updates, go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html

GPFS does not currently support the following kernels:

  • RHEL hugemem kernel
  • RHEL largesmp
  • RHEL uniprocessor (UP) kernel
  • SLES xen kernel

Table 4. GPFS for Linux V3.2

Linux Distribution            Kernel Level
POWER
Red Hat EL 5.3 (1,2,3)        2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
x86_64
Red Hat EL 5.3 (2,3)          2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
i386
Red Hat EL 5.3 (2,3)          2.6.18-128
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
Itanium (R) 2 (4)
Red Hat EL 4.5                2.6.9-55.0.6
SUSE Linux ES 10 SP1          2.6.16.53-0.8
SUSE Linux ES 9 SP3           2.6.5-7.286

1. RHEL 5.0 and later on POWER requires GPFS V3.2.0.2 or later
2. With RHEL5.1, the automount option is slow. This issue should be addressed in the 2.6.18-53.1.4 kernel when it is available.
3. GPFS V3.2.1-3 or later supports the RHEL xen kernel.
4. GPFS for Linux on Itanium Servers is available only through a special Programming Request for Price Quotation (PRPQ). The install image is not generally available code. It must be requested by an IBM client representative through the RPQ system and approved before order fulfillment. If interested in obtaining this PRPQ, reference PRPQ # P91232 or Product ID 5799-GPS.

Table 5. GPFS for Linux V3.1

Linux Distribution            Kernel Level
POWER
Red Hat EL 4.7                2.6.9-78.0.13
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
x86_64
Red Hat EL 4.7                2.6.9-78.0.13
Red Hat EL 3.8                2.4.21-47.0.1
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
SUSE Linux ES 8 SP4           2.4.21-309
i386
Red Hat EL 4.7                2.6.9-78.0.13
Red Hat EL 3.8                2.4.21-47.0.1
SUSE Linux ES 10 SP2          2.6.16.60-0.27
SUSE Linux ES 9 SP4           2.6.5-7.312
SUSE Linux ES 8 SP4           2.4.21-309


Q2.4: What levels of the Windows O/S are supported by GPFS?
A2.4:
GPFS Multiplatform V3.2.1-5 and later, is supported on nodes running Windows Server 2003 R2 on 64-bit architectures (AMD x64 / EM64T) in an existing GPFS V3.2.1 cluster of AIX and/or Linux at V3.2.1-5 or later.


Q2.5: Can different GPFS maintenance levels coexist?
A2.5:
Certain levels of GPFS can coexist, that is, be active in the same cluster and simultaneously access the same file system. This allows upgrading GPFS within a cluster without shutting down GPFS on all nodes first, and also mounting GPFS file systems from other GPFS clusters that may be running a different maintenance level of GPFS. The current maintenance level coexistence rules are:

  • All GPFS V3.2 maintenance levels can coexist with each other and with GPFS V3.1 Maintenance Level 13 or later, unless otherwise stated in this FAQ. See the Migration, coexistence and compatibility information in the GPFS V3.2 Concepts, Planning, and Installation Guide.
    • The default file system version was incremented in GPFS 3.2.1-5. File systems created using GPFS 3.2.1-5 code without using the --version option of the mmcrfs command will not be mountable by earlier code.
    • GPFS V3.2 maintenance levels 3.2.1.2 and 3.2.1.3 have coexistence issues with other maintenance levels. Customers using a mixed maintenance level cluster that have some nodes running 3.2.1.2 or 3.2.1.3 and other nodes running other maintenance levels should uninstall the gpfs.msg.en_US rpm/fileset from the 3.2.1.2 and 3.2.1.3 nodes. This should prevent the wrong message format strings going across the mixed maintenance level nodes.
    • Attention: Do not use the mmrepquota command if there are nodes in the cluster running a mixture of 3.2.0.3 and other maintenance levels. A fix will be provided in APAR #IZ16367. A fix can be provided for 3.2.0.3 upon request prior to APAR availability in the March service level available at https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html
  • All GPFS V3.1 maintenance levels can coexist with each other, unless otherwise stated in this FAQ.
    Attention: GPFS V3.1 maintenance levels 10 (GPFS-3.1.0.10) thru 12 (GPFS-3.1.0.12) do not coexist with other maintenance levels. All nodes in the cluster must conform to one of these maintenance level compatibility restrictions:
    • All nodes must be at maintenance levels 1-9 or 13 and later (GPFS-3.1.0.1 thru GPFS-3.1.0.9 or GPFS-3.1.0.13 and later)
    • All nodes must be at maintenance levels 10-12 (GPFS-3.1.0.10 - GPFS-3.1.0.12)


Q2.6: Are there any requirements for Clustered NFS (CNFS) support in GPFS V3.2?
A2.6:
GPFS V3.2 Clustered NFS (CNFS) support requirements:

The required lockd patch is not supported on RHEL 4 ppc64.

Table 6. CNFS requirements

          lockd patch required             sm-notify required               rpc.statd required
SLES 10   X                                X                                not required
SLES 9    X                                X                                not required
RHEL 5    X (not available for ppc64)      included in base distribution    X
RHEL 4    X (not available for ppc64)      included in base distribution    X

See also What Linux kernel patches are provided for clustered file systems such as GPFS?


Q2.7: Are there any requirements for the use of the Persistent Reserve support in GPFS V3.2?
A2.7:
GPFS V3.2 supports Persistent Reserve on AIX and requires:


Q2.8: Are there any considerations when utilizing the Simple Network Management Protocol (SNMP)-based monitoring capability in GPFS V3.2?
A2.8:
Considerations for the use of the SNMP-based monitoring capability in GPFS include:

  • Currently, the SNMP collector node must be a Linux node in your GPFS cluster. GPFS utilizes Net-SNMP which is not supported by AIX.
  • Support for ppc64 requires the use of Net-SNMP 5.4.1. Binaries for Net-SNMP 5.4.1 on ppc64 are not available. You will need to download the source and build the binary. Go to http://net-snmp.sourceforge.net/download.html
  • If the monitored cluster is relatively large, you need to increase the communication time-out between the SNMP master agent and the GPFS SNMP subagent. In this context, a cluster is considered to be large if the number of nodes is greater than 25, or the number of file systems is greater than 15, or the total number of disks in all file systems is greater than 50. For more information see Configuring Net-SNMP in the GPFS: Advanced Administration Guide.


3. Machine questions


Q3.1: What are the minimum hardware requirements for a GPFS cluster?
A3.1:
The minimum hardware requirements are:

  • GPFS for POWER: IBM POWER3(TM) or newer processor, 1 GB of memory
  • GPFS Multiplatform for Linux:
    • Intel(R) Pentium(R) 3 or newer processor, with 512 MB of memory
    • AMD Opteron(TM) processors, with 1 GB of memory
    • Intel Itanium 2 processor with 1 GB of RAM
  • GPFS Multiplatform for Windows:
    • Intel EM64T processors, with 1GB of memory
    • AMD Opteron processors, with 1 GB of memory
      Note: Due to issues found during testing, GPFS for Windows is not supported on e325 servers

Additionally, it is highly suggested that a sufficiently large amount of swap space is configured. While the actual configuration decisions should be made taking into account the memory requirements of other applications, it is suggested to configure at least as much swap space as there is physical memory on a given node.

GPFS is supported on systems which are listed in, or compatible with, the IBM hardware specified in the Hardware requirements section of the Sales Manual for GPFS. If you are running GPFS on hardware that is not listed in the Hardware Requirements, should problems arise and after investigation it is found that the problem may be related to incompatibilities of the hardware, we may require reproduction of the problem on a configuration conforming to IBM hardware listed in the sales manual.

To access the Sales Manual for GPFS:

1. Go to http://www-306.ibm.com/common/ssi/OIX.wss
2. From A specific type menu, choose HW&SW Desc (Sales Manual,RPQ).
3. To view a GPFS sales manual, choose the corresponding product number to enter in the keyword field then click on Go

  • For General Parallel File System for POWER V3.2.1, enter 5765-G66
  • For General Parallel File System Multiplatform V3.2.1, enter 5724-N94
  • For General Parallel File System for AIX 5L V3.1, enter 5765-G66
  • For General Parallel File System for Linux on POWER V3.1, enter 5765-G67
  • For General Parallel File System Multiplatform V3.1 for Linux, enter 5724-N94


Q3.2: Is GPFS for POWER supported on IBM System i servers?
A3.2:
GPFS for POWER extends all features, function, and restrictions (such as operating system and scaling support) to IBM System i servers to match their IBM System p counterparts:

Table 7.

IBM System i    IBM System p
i-595           p5-595
i-570           p5-570, p6-570
i-550           p5-550
i-520           p5-520

No service updates are required for this additional support.


Q3.3: What machine models has GPFS for Linux been tested with?
A3.3:
GPFS has been tested with:

For both the p5-590 and the p5-595: See the question What is the current service information for GPFS?

For hardware and software certification, please see the IBM ServerProven site at http://www.ibm.com/servers/eserver/serverproven/compat/us/


Q3.4: Is GPFS for Linux supported on all IBM ServerProven servers?
A3.4:
GPFS for Linux is supported on all IBM ServerProven servers:

  1. With the distributions and kernel levels as listed in the question What are the latest distributions and kernel levels that GPFS has been tested with?
  2. That meet the minimum hardware model requirements as listed in the question What are the minimum hardware requirements for a GPFS cluster?
    Please see the IBM ServerProven site at http://www.ibm.com/servers/eserver/serverproven/compat/us/


Q3.5: What interconnects are supported for GPFS daemon-to-daemon communication in a GPFS cluster?
A3.5:
The interconnect for GPFS daemon-to-daemon communication depends upon the types of nodes in your cluster.

Table 8. GPFS daemon-to-daemon communication interconnects
(Supported interconnect / Supported environments, by the types of nodes in your cluster)

Linux
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    All supported GPFS environments
  Myrinet                IP only
  InfiniBand

Linux/AIX/Windows
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    All supported GPFS Linux environments; AIX V5.3; AIX V6.1

AIX
  Ethernet               All supported GPFS environments
  10-Gigabit Ethernet    AIX V5.3; AIX V6.1
  Myrinet                AIX V5.2 and V5.3 64-bit kernel, BladeCenter JS20 and p5 POWER5 servers; IP only
  InfiniBand             AIX V5.3, GPFS V3.1 or V3.2; IP only
  eServer HPS            Homogeneous clusters of either AIX V5.2 or V5.3


Q3.6: Does GPFS support exploitation of the Virtual I/O Server (VIOS) features of POWER5 processors?
A3.6:
Yes, GPFS allows exploitation of POWER5 VIOS configurations. Both the virtual SCSI (VSCSI) and the shared Ethernet adapter (SEA) are supported in single and multiple central electronics complex (CEC) configurations. This support is limited to GPFS nodes that are using either the AIX V6.1 or V5.3 operating system.

All LPARs in a CEC that are GPFS cluster members must have all the VIO disks mapped to each LPAR using virtual SCSI. This presents a SAN-like environment to GPFS, where each node has access to the disks over a local path without requiring network access. None of the NSDs in these configurations should be defined with an NSD server associated with them (see the sketch below).
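For illustration only, here is a minimal sketch of defining such NSDs without servers, assuming the GPFS 3.x colon-separated disk descriptor format and hypothetical hdisk names; because the NSD server fields are left empty, each node accesses the disks over its own local (virtual SCSI) path:

# cat /tmp/vio_disks.desc
hdisk4:::dataAndMetadata:1
hdisk5:::dataAndMetadata:2
# mmcrnsd -F /tmp/vio_disks.desc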

Minimum required code levels:

  • VIOS Release 1.3.0.0 Fix Pack 8
  • AIX 5L V5.3 Service Pack 5300-05-01

There is no GPFS fix level requirement for this support, but it is recommended that you be at the latest GPFS level available. For information on the latest levels of GPFS go to https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html

For further information on POWER5 VIOS go to http://techsupport.services.ibm.com/server/vios/documentation/faq.html

For VIOS documentation, go to http://techsupport.services.ibm.com/server/vios/documentation/home.html

Back to the top of the page

4. Disk questions


Q4.1: What disk hardware has GPFS been tested with?
A4.1:
These tables list the disk hardware that has been tested by IBM and is known to work with GPFS. GPFS is not limited to this set of disk devices; other disk devices may work with GPFS, but they have not been tested by IBM. The GPFS support team will help customers using devices outside this list to solve problems directly related to GPFS, but not problems deemed to be issues with the underlying device's behavior, including any performance issues exhibited on untested hardware.

It is important to note that:

  • Each individual disk subsystem requires a specific set of device drivers for proper operation while attached to a host running GPFS or IBM Recoverable Virtual Shared Disk. The prerequisite levels of device drivers are not documented in this GPFS-specific FAQ. Refer to the disk subsystem's web page to determine the currency of the device driver stack for the host's operating system level and attachment configuration.

          For information on IBM disk storage subsystems and their related device drivers levels and Operating System support guidelines, go to http://www.ibm.com/servers/storage/support/disk/index.html

  • Microcode levels should be at the latest levels available for your specific disk drive.

          For the IBM System Storage (TM), go to http://www.ibm.com/servers/storage/support/allproducts/downloading.html

  • GPFS for Windows can only operate as an NSD client at this time, and as such does not support direct attached disks.

DS4000 customers: Please also see

Table 9. Disk hardware tested with GPFS for AIX on POWER

       GPFS for AIX on POWER:        
 IBM System Storage DS6000 (TM) using either Subsystem Device Driver (SDD) or Subsystem Device Driver Path Control Module (SDDPCM) 

Configuration considerations: GPFS clusters up to 32 nodes are supported and require a firmware level of R9a.5b050318a or greater. See further requirements below.
 IBM System Storage DS8000 (TM) using either SDD or SDDPCM 

Configuration considerations: GPFS clusters up to 32 nodes are supported and require a firmware level of R10k.9b050406 or greater. See further requirements below.
 DS6000 and DS8000 service requirements: 

  • AIX 5L V5.2 maintenance level 05 (5200-05) - APAR # IY68906, APAR # IY70905
  • AIX 5L V5.3 maintenance level 02 (5300-02) - APAR # IY68966, APAR # IY71085
  • GPFS for AIX 5L V2.3 - APAR # IY66584, APAR # IY70396, APAR # IY71901 

    For the Disk Leasing model install the latest supported version of the SDD fileset supported on your operating system. 

    For the Persistent Reserve model install the latest supported version of SDDPCM fileset supported for your operating system.
 IBM TotalStorage DS4100 (Formerly FAStT 100) with DS4000 EXP100 Storage Expansion Unit with Serial Advanced Technology Attachment (SATA) drives. 

IBM TotalStorage FAStT500 

IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) with DS4000 EXP710 Fibre Channel (FC) Storage Expansion Unit, DS4000 EXP700 FC Storage Expansion Unit, or EXP100 

IBM System Storage DS4300 Turbo with EXP710, EXP700, or EXP100 

IBM System Storage DS4400 (Formerly FAStT 700) with EXP710 or EXP700 

IBM System Storage DS4500 (Formerly FAStT 900) with EXP710, EXP700, or EXP100 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 with EXP710, EXP100 or EXP810 

IBM System Storage DS3400 (1726-HC4)
 IBM TotalStorage ESS (2105-F20 or 2105-800) with SDD

IBM TotalStorage ESS (2105-F20 or 2105-800) using AIX 5L Multi-Path I/O (MPIO) and SDDPCM
 IBM System Storage Storage Area Network (SAN) Volume Controller (SVC) V2.1 and V3.1 

The following APAR numbers are suggested: 

  • IY64709 - Applies to all GPFS clusters
  • IY64259 - Applies only when running GPFS in an AIX V5.2 or V5.3 environment with RVSD 4.1
  • IY42355 - Applies only when running GPFS in a PSSP V3.5 environment
  • SVC V2.1.0.1 is supported with AIX 5L V5.2 (Maintenance Level 05) and AIX 5L V5.3 (Maintenance Level 01). 

    See http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002471 for specific advice on SAN Volume Controller recommended software levels.
 IBM 7133 Serial Disk System (all disk sizes)
 Hitachi Lightning 9900 (TM) (9910, 9960, 9970 and 9980) 
Hitachi Universal Storage Platform 100/600/1100 
Notes: 
  1. In all cases Hitachi Dynamic Link Manager (TM) (HDLM) (multipath software) or MPIO (default PCM - failover only) is required
  2. AIX ODM objects supplied by Hitachi Data Systems (HDS) are required for all above devices.
  3. Customers should consult with HDS to verify that their proposed combination of the above components is supported by HDS.
 EMC Symmetrix DMX Storage Subsystems (FC attach only) 

Selected models of CX/CX-3 family including CX300, CX400, CX500 CX600, CX700 and CX3-20, CX3-40 and CX3-80 

Device driver support for Symmetrix includes both MPIO and PowerPath. 

Note:
CX/CX-3 requires PowerPath. 

Customers should consult with EMC to verify that their proposed combination of the above components is supported by EMC.
 HP XP 128/1024 XP10000/12000 

HP StorageWorks Enterprise Virtual Arrays (EVA) 4000/6000/8000 and 3000/5000 models that have been upgraded to active-active configurations 

Note: 
HDLM multipath software is required
 IBM DCS9550 (either FC or SATA drives) 
FC attach only 
Minimum firmware 3.08b 
Must use IBM supplied ODM objects at level 1.7 or greater 

For more information on the DCS9550 go to http://www.datadirectnet.com/dcs9550/

Table 10. Disk hardware tested with GPFS for Linux on x86 xSeries servers

       GPFS for Linux on xSeries servers:    
 IBM XIV 2810 
Minimum Firmware Level: 10.0.1 

This storage subsystem has been tested on
 IBM TotalStorage FAStT 200 Storage Server 

IBM TotalStorage FAStT 500 

IBM TotalStorage DS4100 (Formerly FAStT 100) with EXP100 

IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) with EXP710, EXP700, or EXP100 

IBM System Storage DS4300 Turbo with EXP710, EXP700, or EXP100 

IBM System Storage DS4400 (Formerly FAStT 700) with EXP710 or EXP700 

IBM System Storage DS4500 (Formerly FAStT 900) with EXP710, EXP700, or EXP100 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 with EXP710, EXP100 or EXP810 

IBM System Storage DS3400 (1726-HC4)
 IBM TotalStorage Enterprise Storage Server (R) (ESS) models 2105-F20 and 2105-800, with Subsystem Device Driver (SDD)
 EMC Symmetrix Direct Matrix Architecture (DMX) Storage Subsystems 1000 with PowerPath v 3.06 and v 3.07
 IBM System Storage Storage Area Network (SAN) Volume Controller (SVC) V2.1 and V3.1 

See http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002471 for specific advice on SAN Volume Controller recommended software levels.
 IBM DCS9550 (either FC or SATA drives) 
FC attach only 
minimum firmware 3.08b 
QLogic drivers at 8.01.07 or newer and IBM SAN Surfer V5.0.0 or newer 
http://support.qlogic.com/support/oem_detail_all.asp?oemid=376



For more information on the DCS9550 go to http://www.datadirectnet.com/dcs9550/

Restrictions: IBM ServeRAID(TM) adapters are not supported.

Table 11. Disk hardware tested with GPFS for Linux on POWER

GPFS for Linux on POWER: 
 IBM System Storage DS4200 Express all supported expansion drawer and disk types 

IBM System Storage DS4300 (Formerly FAStT 600) all supported drawer and disk types 

IBM System Storage DS4500 (Formerly FAStT 900) all supported expansion drawer and disk types 

IBM System Storage DS4700 Express all supported expansion drawer and disk types 

IBM System Storage DS4800 all supported expansion drawer and disk types
 IBM System Storage DS8000 using SDD

Table 12. Disk hardware tested with GPFS for Linux on AMD processor-based servers

GPFS for Linux on eServer AMD processor-based servers: No devices tested specifically in this environment.


Q4.2: What Fibre Channel Switches are qualified for GPFS usage and is there a FC Switch support chart available?
A4.2:
There are no special requirements for FC switches used by GPFS other than the switch must be supported by AIX or Linux. For further information see http://www.storage.ibm.com/ibmsan/index.html


Q4.3: Can I concurrently access SAN-attached disks from both AIX and Linux nodes in my GPFS cluster?
A4.3:
The architecture of GPFS allows both AIX and Linux hosts to concurrently access the same set of LUNs. However, before this is implemented in a GPFS cluster you must ensure that the disk subsystem being used supports both AIX and Linux concurrently accessing LUNs. While the GPFS architecture allows this, the underlying disk subsystem may not, and in that case, a configuration attempting it would not be supported.


Q4.4: What disk failover models does GPFS support for the IBM System Storage DS4000 family of storage controllers with the Linux operating system?
A4.4:
GPFS has been tested with both the Host Bus Adapter Failover and Redundant Dual Active Controller (RDAC) device drivers.

To download the current device drivers for your disk subsystem, please go to http://www.ibm.com/servers/storage/support/


Q4.5: What devices have been tested with SCSI-3 Persistent Reservations?
A4.5:
The following devices have been tested with SCSI-3 Persistent Reservations:

  • DS8000 (all 2105 and 2107 models) using SDDPCM or the default MPIO PCM on AIX.
  • DS4000 subsystems using the IBM RDAC driver on AIX. (devices.fcp.disk.array.rte)

The most recent versions of the device drivers are always recommended to avoid problems that have been addressed.

Note: For a device to properly offer SCSI-3 Persistent Reservation support for GPFS, it must support SCSI-3 PERSISTENT RESERVE IN with a service action of REPORT CAPABILITIES. The REPORT CAPABILITIES must indicate support for a reservation type of Write Exclusive All Registrants. Contact the disk vendor to determine these capabilities.
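As a minimal sketch only (assuming the usePersistentReserve configuration attribute at this level of GPFS; verify the exact attribute name and procedure in the documentation for your level), Persistent Reserve is typically enabled cluster-wide while GPFS is stopped on all nodes:

mmshutdown -a
mmchconfig usePersistentReserve=yes
mmstartup -a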


Q4.6: Are there any special considerations when my cluster consists of two nodes?
A4.6:
Customers who previously used single-node quorum and are migrating to a supported level of GPFS, must be aware that the single-node quorum function has been replaced with node quorum with tiebreaker disks. The new node quorum with tiebreaker disks support does not depend upon the availability of SCSI-3 persistent reserve. All disks tested with GPFS can now utilize node quorum with tiebreaker disks as opposed to GPFS node quorum (one plus half of the explicitly defined quorum nodes in the GPFS cluster). For further information, see the GPFS: Concepts, Planning, and Installation Guide for your level of GPFS.
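For example, a minimal sketch of enabling node quorum with tiebreaker disks (the NSD names are hypothetical, and the tiebreakerDisks attribute is changed while GPFS is down on all nodes):

mmshutdown -a
mmchconfig tiebreakerDisks="nsdTb1;nsdTb2;nsdTb3"
mmstartup -a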

Back to the top of the page

5. Scaling questions


Q5.1: What are the GPFS cluster size limits?
A5.1:
The current maximum tested GPFS cluster size limits are:

Table 13. GPFS maximum tested cluster sizes

GPFS Multiplatform for Linux      2441 nodes
GPFS on POWER for AIX             1530 nodes
GPFS Multiplatform for Windows    64 nodes

Note: Please contact gpfs@us.ibm.com if you intend to exceed: 

  1. Configurations with Linux larger than 512 nodes 
  2. Configurations with AIX larger than 128 nodes 
  3. Configurations with Windows larger than 32 nodes 

Although GPFS is typically targeted for a cluster with multiple nodes, it can also provide high performance benefit for a single node so there is no lower limit. However, there are two points to consider:

  • GPFS is a well-proven, scalable cluster file system. For a given I/O configuration, typically multiple nodes are required to saturate the aggregate file system performance capability. If the aggregate performance of the I/O subsystem is the bottleneck, then GPFS can help achieve the aggregate performance even on a single node.
  •  GPFS is a highly available file system. Therefore, customers who are interested in single-node GPFS often end up deploying a multi-node GPFS cluster to ensure availability.2


Q5.2: What are the current file system size limits?
A5.2:
The current file system size limits are:

Table 14. Current file system size limits

GPFS 2.3 or later, file system architectural limit    2^99 bytes
GPFS 2.2 file system architectural limit              2^51 bytes (2 Petabytes)
Current tested limit                                  Approximately 2 PB

Note: Contact gpfs@us.ibm.com if you intend to exceed 200 Terabytes.


Q5.3: What is the current limit on the number of mounted file systems in a GPFS cluster?
A5.3:
The total number of mounted file systems within a GPFS cluster depends upon your service level of GPFS:

Table 15. Total number of mounted file systems

GPFS Service Level             Number of mounted file systems
GPFS V3.2.0.1 or later         256
GPFS V3.1.0.5 or later         64
GPFS V3.1.0.1 thru V3.1.0.4    32


Q5.4: What is the architectural limit of the number of files in a file system?
A5.4:
The architectural limit of the number of files in a file system is determined by the file system format. For file systems created prior to GPFS V2.3, the limit is 268,435,456. For file systems created with GPFS V2.3 or later, the limit is 2,147,483,648. Please note that the effective limit on the number of files in a file system is usually lower than the architectural limit, and could be adjusted using the -F option of the mmchfs command.
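For example (a sketch with a hypothetical file system name and inode count; check the mmchfs man page for your level), the effective limit can be raised after file system creation:

mmchfs gpfs1 -F 4000000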


Q5.5: What is the current limit on the number of nodes that may concurrently join a cluster?
A5.5:
The total number of nodes that may concurrently join a cluster depends upon the level of GPFS which you are running:

  • GPFS V3.2 is limited to a maximum of 8192 nodes.
  • GPFS V3.1 is limited to a maximum of 4096 nodes.

A node joins a given cluster if it is:

  • A member of the local GPFS cluster (the mmlscluster command output displays the local cluster nodes).
  • A node in a different GPFS cluster that is mounting a file system from the local cluster.

For example:

  • GPFS clusterA has 2100 member nodes as listed in the mmlscluster command.
  • 500 nodes from clusterB are mounting a file system owned by clusterA.

clusterA therefore has 2600 concurrent nodes.


Q5.6: What are the limitations on GPFS disk size?
A5.6:
The maximum disk size supported by GPFS depends on the file system format and the underlying device support. For file systems created prior to GPFS version 2.3, the maximum disk size is 1 TB due to internal GPFS file system format limitations. For file systems created with GPFS 2.3 or later, these limitations have been removed, and the maximum disk size is only limited by the OS kernel and device driver support:

Table 16. Disk size limitations

OS kernel                              Maximum supported GPFS disk size
AIX, 64-bit kernel                     >2TB, up to the device driver limit
AIX, 32-bit kernel                     1TB
Linux 2.6 64-bit kernels               >2TB, up to the device driver limit
Linux 2.6 32-bit kernels, Linux 2.4    2TB

Notes:

  1. The above limits are only applicable to nodes that access disk devices through a local block device interface, as opposed to NSD protocol. For NSD clients, the maximum disk size is only limited by the NSD server large disk support capability, irrespective of the kernel running on an NSD client node.
  2. The basic reason for the significance of the 2TB disk size barrier is that this is the maximum disk size that can be addressed using 32-bit sector numbers and 512-byte sector size. A larger disk can be addressed either by using 64-bit sector numbers or by using larger sector size. GPFS uses 64-bit sector numbers to implement large disk support. Disk sector sizes other than 512 bytes are unsupported.
  3. GPFS for Windows can only operate as an NSD client at this time, and as such does not support direct attached disks.

Back to the top of the page

6. Configuration and tuning questions


Q6.1: What specific configuration and performance tuning suggestions are there?
A6.1:
In addition to the configuration and performance tuning suggestions in the GPFS: Concepts, Planning, and Installation Guide for your version of GPFS:

  • If your GPFS cluster is configured to use SSH/SCP, it is suggested that you increase the value of MaxStartups in sshd_config to at least 1024.
  • When designating nodes for use by GPFS, you must specify a non-aliased interface; using aliased interfaces may produce undesired results. When creating or adding nodes to your cluster, the specified hostname or IP address must refer to the communications adapter over which the GPFS daemons communicate. The output of the mmlscluster command lists the hostname and IP address combinations recognized by GPFS; when specifying servers for your NSDs, using an aliased hostname not listed in that output may produce undesired results.
  • If your system consists of the eServer pSeries High Performance Switch, it is suggested that you configure GPFS over the ml0 IP network interface.
  • On systems running the Linux 2.6 kernel, it is recommended that you adjust the vm.min_free_kbytes kernel tunable. This tunable controls the amount of free memory that the Linux kernel keeps available (that is, not used in any kernel caches). When vm.min_free_kbytes is set to its default value, some configurations may encounter memory exhaustion symptoms even though free memory should in fact be available. Setting vm.min_free_kbytes to a higher value (the Linux sysctl utility can be used for this purpose), on the order of 5-6% of the total amount of physical memory, should help to avoid such a situation (a short sketch follows below). 
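A minimal sketch of these two adjustments, assuming a node with 16 GB of memory (so roughly 5% is about 838860 KB) and the usual /etc/ssh/sshd_config location; adjust the numbers for your own configuration:

# in /etc/ssh/sshd_config, then restart sshd
MaxStartups 1024

# raise vm.min_free_kbytes now and persist it across reboots
sysctl -w vm.min_free_kbytes=838860
echo "vm.min_free_kbytes = 838860" >> /etc/sysctl.conf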

             Also, please see the GPFS Redpapers:


Q6.2: What configuration and performance tuning suggestions are there for GPFS when used primarily for Oracle databases?
A6.2:
Note: Only a subset of GPFS releases are certified for use in Oracle environments. For the latest status of GPFS certification:

  • For AIX go to, http://www.oracle.com/technology/products/database/clustering/certify/tech_generic_unix_new.html
  • For Linux go to, http://www.oracle.com/technology/products/database/clustering/certify/tech_generic_linux_new.html 

             In addition to the performance tuning suggestions in the GPFS: Concepts, Planning, and Installation Guide for your version of GPFS: 

  • When running Oracle RAC 10g, it is suggested you increase the value for OPROCD_DEFAULT_MARGIN to at least 500 to avoid possible random reboots of nodes. 

             In the control script for the Oracle CSS daemon, located in /etc/init.cssd, the value for OPROCD_DEFAULT_MARGIN is set to 500 (milliseconds) on all UNIX derivatives except for AIX, where it is set to 100 (a sketch of this setting follows the list below). From a GPFS perspective, even 500 milliseconds may be too low in situations where node failover may take up to a minute or two to resolve. However, if during node failure the surviving node is already doing direct IO to the oprocd control file, it should have the necessary tokens and indirect block cached and should therefore not have to wait during failover.
  • Using the IBM General Parallel File System is attractive for RAC environments because executables, trace files and archive log files are accessible on all nodes. However, care must be taken to properly configure the system in order to prevent false node evictions, and to maintain the ability to perform rolling upgrades of the Oracle software. Without proper configuration GPFS recovery from a node failure can interfere with cluster management operations resulting in additional node failures. 

             If you are running GPFS and Oracle RAC 10gR2 and encounter false node evictions:
    • Upgrade the CRS to 10.2.0.3 or newer. 

               The Oracle 10g Clusterware (CRS) executables or logs (the CRS_HOME) should be placed on a local JFS2 filesystem. Using GPFS for the CRS_HOME can inhibit CRS functionality on the surviving nodes while GPFS is recovering from a failed node for the following reasons: 

      • In Oracle 10gR2, up to and including 10.2.0.3, critical CRS daemon executables are not pinned in memory. Oracle and IBM are working to improve this in future releases of 10gR2.
      • Delays in updating the CRS log and authorization files while GPFS is recovering can interfere with CRS operations.
      • Due to an Oracle 10g limitation rolling upgrades of the CRS are not possible when the CRS_HOME is on a shared filesystem.
    • CSS voting disks and the Oracle Clusterware Registry (OCR) should not be placed on GPFS as the IO freeze during GPFS reconfiguration can lead to node eviction, and the inability of CRS to function. Place the OCR and Voting disk on shared raw devices (hdisks).
    • Oracle Database 10g (RDBMS) executables are supported on GPFS for Oracle RAC 10g. However, the system should be configured to support multiple ORACLE_HOME's so as to maintain the ability to perform rolling patch application. Rolling patch application is supported for the ORACLE_HOME starting in Oracle RAC 10.2.0.3. 

    • Oracle Database 10g data files, trace files, and archive log files are supported on GPFS.
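A sketch of the recommended margin in the CSS control script on an AIX RAC node (the surrounding script content varies by CRS release, so locate the existing assignment rather than appending a new one):

# /etc/init.cssd
OPROCD_DEFAULT_MARGIN=500    # milliseconds; the AIX default ships as 100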

See also:


Q6.3: Are there any considerations when utilizing the Remote Direct Memory Access (RDMA) offered by InfiniBand?
A6.3:
GPFS Multiplatform V3.2 for Linux supports InfiniBand RDMA in the following configurations (a configuration sketch follows the list below):

Notes:

  1. Ensure you are at the latest firmware level for both your switch and adapter.
  2. See the question What are the current GPFS advisories ? 

  • SLES 10 or RHEL 5, x86_64
  • OFED Infiniband Stack VERBS API - GEN 2
    • OFED 1.2, OFED 1.2.5, OFED 1.3
    • OFED 1.1 - Voltaire Gridstack only
  • Mellanox based adapters
    • RDMA over multiple HCAs/Ports/QPs
    • For multiple ports - GPFS balances load across ports
  • Single IB subnet
    • QPs connected via GPFS RPC
  • RDMA support for Mellanox memfree adapters requires GPFS V3.2.0.2, or later, to operate correctly
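A configuration sketch, assuming the verbsRdma and verbsPorts configuration attributes at this GPFS level and a hypothetical Mellanox HCA device name; confirm the attribute names and port syntax in the documentation for your level before applying:

mmchconfig verbsRdma=enable
mmchconfig verbsPorts="mlx4_0/1 mlx4_0/2"    # device/port pairs; with multiple ports GPFS balances load across them
mmlsconfig | grep verbs                      # verify the settings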


Q6.4: What Linux configuration settings are required when NFS exporting a GPFS filesystem?
A6.4:
If you are running at SLES 9 SP 1, the kernel defines the sysctl variable fs.nfs.use_underlying_lock_ops that determines if the NFS lockd is to consult the file system when granting advisory byte-range locks. For distributed file systems like GPFS, this must be set to true (the default is false).

You can query the current setting by issuing the command:

sysctl fs.nfs.use_underlying_lock_ops

Alternatively, the record fs.nfs.use_underlying_lock_ops = 1 may be added to /etc/sysctl.conf. In that case, the setting must be applied after initially booting the node, and after each reboot, by issuing the command:

sysctl -p

The fs.nfs.use_underlying_lock_ops variable is currently not available in SLES 9 SP2 or later. When NFS exporting a GPFS file system, ensure your NFS server nodes are at the SP1 level until the variable is made available in later service packs.

For additional considerations when NFS exporting your GPFS file system, see the:


Q6.5: Sometimes GPFS appears to be handling a heavy I/O load, for no apparent reason. What could be causing this?
A6.5:
On some Linux distributions the system is configured by default to run the file system indexing utility updatedb through the cron daemon on a periodic basis (usually daily). This utility traverses the file hierarchy and generates a rather extensive amount of I/O load. For this reason, it is configured by default to skip certain file system types and nonessential file systems. However, the default configuration does not prevent updatedb from traversing GPFS file systems.

In a cluster this results in multiple instances of updatedb traversing the same GPFS file system simultaneously. This causes general file system activity and lock contention in proportion to the number of nodes in the cluster. On smaller clusters, this may result in a relatively short-lived spike of activity, while on larger clusters, depending on the overall system throughput capability, the period of heavy load may last longer. Usually the file system manager node will be the busiest, and GPFS would appear sluggish on all nodes. Re-configuring the system to either make updatedb skip all GPFS file systems or only index GPFS files on one node in the cluster is necessary to avoid this problem.
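As a sketch (file name and variable as used by the common slocate/mlocate packages; your distribution may differ), GPFS can be excluded from indexing by adding gpfs to the PRUNEFS list on every node, or updatedb can be left enabled in cron on only one node:

# /etc/updatedb.conf
PRUNEFS="gpfs nfs proc"    # hypothetical example; append gpfs to the existing list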


Q6.6: What considerations are there when using IBM Tivoli Storage Manager with GPFS?
A6.6:
Considerations when using Tivoli Storage Manager (TSM) with GPFS include:


Q6.7: How do I get OpenSSL to work on AIX and SLES8/ppc64?
A6.7:
To help enhance the security of mounts using Secure Sockets Layer (SSL) a working version of OpenSSL must be installed. This version must be compiled with support for the Secure Hash Algorithm (SHA).

  1. GPFS APAR IZ21177 is required.
  2. GPFS configuration needs to be changed to point at the right set of libraries:
    • On 64-bit kernel:

mmchconfig openssllibname="/usr/lib/libssl.a(libssl64.so.0.9.8)" -N AffectedNodes

    • On 32-bit kernel:

mmchconfig openssllibname="/usr/lib/libssl.a(libssl.so.0.9.8)" -N AffectedNodes

  • On AIX V5.1, OpenSSL 0.9.7d-2, or later, as distributed by IBM in the AIX Toolbox for Linux Applications, is supported. To download OpenSSL from the AIX Toolbox for Linux Applications:
  1. Go to http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html
  2. Under Sorted download, click on AIX Toolbox Cryptographic Content.
  3. Either register for an IBM ID or sign-in.
  4. To view the license agreement, click on View license.
  5. By clicking I agree you agree that you have had the opportunity to review the terms and conditions and that such terms and conditions govern this transaction.
  6. Scroll down to OpenSSL – SSL Cryptographic Libraries
  7. Ensure you download 0.9.7d-2 or later
  • For the supported versions of Linux:
    • For the Red Hat EL 3, Red Hat EL 4, Red Hat EL 5, SUSE Linux ES 9 and SUSE Linux ES 10 distributions, GPFS supports the version that comes with your distribution.
    • For the SUSE Linux ES 8 distribution on x86, this is currently OpenSSL 0.9.6, as included with your distribution.
    • For SUSE Linux ES 8 for PowerPC64 you must compile and install OpenSSL version 0.9.7f, according to these directions, before mounting any GPFS file systems that belong to other GPFS clusters (If you are running GPFS V2.3, ensure you are at least at the minimum service level. See the question What is the current service information for GPFS?):
  1. Download the file openssl-0.9.7f.tar.gz, or later, from http://www.openssl.org.
  2. Unpack the file openssl-0.9.7f.tar.gz

    tar xfz openssl-0.9.7f.tar.gz
    cd openssl-0.9.7f 

  3. Edit the script Configure, changing gcc to /opt/cross/bin/powerpc64-linux-gcc:

    398c398
    < "linux-ppc64", "gcc:-bpowerpc64-linux -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer
    -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_RISC1
    DES_UNROLL:asm/linux_ppc64.o:::::::::dlfcn:linux-shared:-fPIC:-bpowerpc64-linux:.so.
    \$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
    ---
    > "linux-ppc64", "/opt/cross/bin/powerpc64-linux-gcc:-bpowerpc64-linux -DB_ENDIAN 
    -DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG 
    RC4_CHAR RC4_CHUNK DES_RISC1
    DES_UNROLL:asm/linux_ppc64.o:::::::::dlfcn:linux-shared:-fPIC:-bpowerpc64-linux:.so.
    \$(SHLIB_MAJOR).\$(SHLIB_MINOR)", 

  4. Run this script: 

    ./Configure --prefix=/usr/local/ linux-ppc64 

  5. Build and install the OpenSSL library: 

    make
    make install 

  6. Update the library cache: 

    ldconfig 

  7. Configure all of the PowerPC64 nodes in the GPFS cluster, listed in the file PPC64nodes, to use the edited library: 

    mmchconfig openssllibname=/usr/local/lib/libssl.so.0.9.7 -N PPC64nodes 


Q6.8: What ciphers are supported for use by GPFS?
A6.8:
You can specify any of the RSA based ciphers that are supported by the OpenSSL version installed on the node. Refer to the ciphers(1) man page for a list of the valid cipher strings and their meaning. Use the openssl ciphers command to display the list of available ciphers:

openssl ciphers RSA

In addition, GPFS supports the keyword AUTHONLY. When AUTHONLY is specified in place of a cipher list, GPFS checks network connection authorization. However, data sent over the connection is not protected.

Note: When different versions of OpenSSL are used within a cluster or in a multi-cluster setup, ensure that the ciphers are supported by all versions.


Q6.9: When I allow other clusters to mount my file systems, is there a way to restrict access permissions for the root user?
A6.9:
Yes. A root squash option is available when making a file system available for mounting by other clusters using the mmauth command. This option is similar in spirit to the NFS root squash option. When enabled, it causes GPFS to squash superuser authority on accesses to the affected file system on nodes in remote clusters.

This is accomplished by remapping the credentials: user id (UID) and group id (GID) of the root user, to a UID and GID specified by the system administrator on the home cluster, for example, the UID and GID of the user nobody. In effect, root squashing makes the root user on remote nodes access the file system as a non-privileged user.

Although enabling root squash is similar in spirit to setting up UID remapping (see http://www.ibm.com/servers/eserver/clusters/whitepapers/uid_gpfs.html), there are two important differences:

  1. While enabling UID remapping on remote nodes is an option available to the remote system administrator, root squashing need only be enabled on the local cluster, and it will be enforced on remote nodes.
  2. While UID remapping requires having an external infrastructure for mapping between local names and globally unique names, no such infrastructure is necessary for enabling root squashing.

When both UID remapping and root squashing are enabled, root squashing overrides the normal UID remapping mechanism for the root user. See the mmauth command man page for further details.
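A sketch of enabling root squash when granting a remote cluster access, assuming the -r option of the mmauth grant command at this GPFS level and hypothetical cluster, device, and UID/GID values:

mmauth grant clusterB.example.com -f /dev/gpfs1 -a rw -r 99:99    # remap remote root to UID 99, GID 99
mmauth show all                                                   # verify the granted access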

Back to the top of the page

7. Service questions


Q7.1: What support services are available for GPFS?
A7.1:
Support services for GPFS include:

         24x7 enterprise-level remote support for problem resolution and defect support for major distributions of the Linux operating system. Go to http://www.ibm.com/services/us/index.wss/so/its/a1000030.

  • IBM Systems and Technology Group Lab Services

         IBM Systems and Technology Group (STG) Lab Services can help you optimize the utilization of your data center and system solutions.

         STG Lab Services has the knowledge and deep skills to support you through the entire information technology race. Focused on the delivery of new technologies and niche offerings, STG Lab Services collaborates with IBM Global Services and IBM Business Partners to provide complementary services that will help lead through the turns and curves to keep your business running at top speed.

         Go to http://www.ibm.com/systems/services/labservices/.

  • Subscription service for pSeries, p5, and OpenPower

         This service provides technical information for IT professionals who maintain pSeries, p5 and OpenPower servers. Subscribe at http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd

  • GPFS software maintenance

         GPFS defect resolution for current holders of IBM software maintenance contracts:

    • In the United States contact us toll free at 1-800-IBM-SERV (1-800-426-7378)
    • In other countries, contact your local IBM Service Center

         Contact gpfs@us.ibm.com for all other services or consultation on what service is best for your situation.


Q7.2: What is the current service information for GPFS?
A7.2:
The current GPFS service information includes:

  • For GPFS v3.1, if there are foreign characters in file or directory names, the mmapplypolicy command may fail
    GPFS: 6027-902 Error parsing work file /tmp/tsmigrate.inodeslist.<pid>
             The workaround for this problem is to:
    • Upgrade to GPFS v3.2 where this problem no longer exists.
    • If you need to stay on GPFS v3.1:
  1. Install GNU sort contained in the GNU coreutils from the AIX Toolbox for Linux Applications at http://www-03.ibm.com/systems/p/os/aix/linux/toolbox/download.html
  2. Set the environment variables 

    MM_SORT_CMD="LC_ALL=C /local-or-opts-wherever-gnu-binaries-happen-to-be/sort -z"
    MM_SORT_EOR=""   # empty string 

  • For GPFS V3.2 use with AIX V6.1:
    • GPFS is supported in an Ethernet/10-Gigabit Ethernet environment, see the question What interconnects are supported for GPFS daemon-to-daemon communication in my GPFS cluster?
    • The versions of OpenSSL shipped as part of the AIX Expansion Pack, 0.9.8.4 and 0.9.8.41, ARE NOT compatible with GPFS due to the way the OpenSSL libraries are built. To obtain the level of OpenSSL which will work with GPFS, see the question How do I get OpenSSL to work on AIX and SLES8/ppc64?
    • Role Based Access Control (RBAC) is not supported by GPFS and is disabled by default.
    • Workload Partitions (WPARs) or storage protection keys are not exploited by GPFS.
  • If you get errors on RHEL5 when trying to run the GPFS self-extracting archive from the installation media, run export _POSIX2_VERSION=199209 first.
  • When installing or migrating GPFS, the minimum levels of service you must have applied are:
    • GPFS V3.2 you must apply APAR IY99639 (GPFS V3.2.0-1)
    • GPFS V3.1 you must apply APAR IY82778
    • GPFS V2.3 you must apply APAR IY63969 

               If you do not apply these levels of service and you attempt to start GPFS, you will receive an error message similar to: 

      mmstartup: Required service not applied. Install GPFS 3.2.0.1 or later
      mmstartup: Command failed Examine previous error messages to determine cause 

               Upgrading GPFS to a new major release on Linux: 

               When migrating to a new major release of GPFS (for example, GPFS 3.1 to GPFS 3.2), the supported migration path is to install the GPFS base images for the new release, then apply any required service updates. GPFS will not work correctly if you use rpm -U command to upgrade directly to a service level of a new major release without installing the base images first. If this should happen you must uninstall and then reinstall the gpfs.base package. 

               Note: Upgrading to the GPFS 3.2.1.0 level from a pre-3.2 level of GPFS does not work correctly, and the same workaround is required.
  • GPFS V3.1 maintenance levels 10 (GPFS-3.1.0.10) thru 12 (GPFS-3.1.0.12) do not coexist with other maintenance levels 

             All nodes in the cluster must conform to one of these maintenance level compatibility restrictions:
    • All nodes must be at maintenance levels 1-9 or 13 and later (GPFS-3.1.0.1 thru GPFS-3.1.0.9 or GPFS-3.1.0.13 and later)
    • All nodes must be at maintenance levels 10-12 (GPFS-3.1.0.10 - GPFS-3.1.0.12)
  • Required service for support of SLES 10 includes:
  1. If running GPFS V3.1, service update 3.1.0-8 available at
    https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/home.html
  2. The GPFS required level of Korn shell for SLES 10 support is version ksh-93r-12.16 and can be obtained using one of these architecture-specific links: 

    x86 at
    https://you.novell.com/update/i386/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

    POWER at
    https://you.novell.com/update/ppc/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

    x86_64 at
    https://you.novell.com/update/x86_64/update/SUSE-SLES/10/PTF/43ed798d45b1ce66790327fe89fb3ca6/20061201 

  3. For SLES 10 on POWER:
    • The gpfs.base 3.1.0-0 rpm must be installed using the rpm --nopre flag BEFORE any updates can be applied.
    • /etc/init.d/running-kernel shipped prior to the availability of the SLES 10 SP1 kernel source rpm contains a bug that results in the wrong set of files being copied to the kernel source tree. Until SP1 is generally available, the following change should also address the problem: 

    --- running-kernel.orig 2006-10-06 14:54:36.000000000 -0500 
    +++ /etc/init.d/running-kernel 2006-10-06 14:59:58.000000000 -0500 
    @@ -53,6 +53,7 @@ 
    arm*|sa110) arch=arm ;; 
    s390x) arch=s390 ;; 

    parisc64) arch=parisc ;; 
    + ppc64) arch=powerpc ;; 
    esac 
    # FIXME: How to handle uml? 

  • When running GPFS on either a p5-590 or a p5-595:
    • The minimum GFW (system firmware) level required is SF222_081 (GA3 SP2), or later. 

               For the latest firmware versions, see the IBM Technical Support at http://www14.software.ibm.com/webapp/set2/firmware/gjsn
    • The supported Linux distribution is SUSE Linux ES 9.
    • Scaling is limited to 16 total processors.
  • IBM testing has revealed that some customers using the Gigabit Ethernet PCI-X adapters with the jumbo frames option enabled may be exposed to a potential data error. While receiving packet data, the Gigabit Ethernet PCI-X adapter may generate an erroneous DMA address when crossing a 64 KB boundary, causing a portion of the current packet and the previously received packet to be corrupted. 

             These Gigabit Ethernet PCI-X adapters and integrated Gigabit Ethernet PCI-X controllers could potentially experience this issue:
    • Type 5700, Gigabit Ethernet-SX PCI-X adapter (Feature Code 5700)
    • Type 5701, 10/100/1000 Base-TX Ethernet PCI-X Adapter (Feature code 5701)
    • Type 5706, Dual Port 10/100/1000 Base-TX Ethernet PCI-X Adapter (Feature code 5706)
    • Type 5707, Dual Port Gigabit Ethernet-SX PCI-X Adapter (Feature code 5707)
    • Integrated 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 7029-6C3 and 6E3 (p615)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9111-520 (p520)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9113-550 (p550)
    • Integrated Dual Port 10/100/1000 Base-TX Ethernet PCI-X controller on machine type 9117-570 (p570) 

           This problem is fixed with:
    • For AIX 5L 5.2, APAR IY64531
    • For AIX 5L 5.3, APAR IY64393
  • IBM testing has revealed that some customers with the General Parallel File System who install AIX 5L Version 5.2 with the 5200-04 Recommended Maintenance package (bos.mp64 at the 5.2.0.40 or 5.2.0.41 levels) and execute programs which reside in GPFS storage may experience a system wide hang due to a change in the AIX 5L loader. This hang is characterized by an inability to login to the system and an inability to complete some GPFS operations on other nodes. This problem is fixed with the AIX 5L APAR IY60609. It is suggested that all customers installing the bos.mp64 fileset at the 5.2.0.40 or 5.2.0.41 level, who run GPFS, immediately install this APAR.
  • Service bulletins for pSeries, p5, and OpenPower servers at http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd 

  1. Sign in with your IBM ID.
  2. Under the Bulletins tab:
    • For the Select a heading option, choose Cluster on POWER.
    • For the Select a topic option, choose General Parallel File System.
    • For the Select a month option, select a particular month or choose All months.


Q7.3: How do I download fixes for GPFS?
A7.3:
To download fixes for GPFS, go to
https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/home.html


Q7.4: What are the current GPFS advisories?
A7.4:
The current GPFS advisories are:

  • Currently with GPFS Multiplatform for Linux V3.2.1-4 and lower, with Infiniband RDMA enabled, an issue exists which under certain conditions may cause data corruption. This is fixed in GPFS 3.2.1-6. Please apply 3.2.1-6 or turn RDMA off.
  • GPFS 2.3.0.x not compatible with AIX 5.3 TL6 

             Currently GPFS 2.3.0.x on AIX TL6 has a known private heap memory leak. 

             USERS AFFECTED: All customers using GPFS 2.3 and AIX 5.3 

             DESCRIPTION: GPFS 2.3.0.0 through 2.3.0.23 do not work with AIX 5.3 TL6 due to the changes that AIX made in the threading library. GPFS 2.3 PTF 24 and up do have the necessary code changes to work with TL6 but they produce a private heap memory leak due to AIX APAR IZ04791. The AIX fix for this problem is scheduled for AIX TL6 SP4. A workaround that can be used until obtaining AIX TL6 SP4 is to change the GPFS configuration to not use the sigwait library call (mmchconfig asyncSocketNotify=no). Therefore, until the issue is resolved please be advised not to use GPFS 2.3.0.0 through 2.3.0.23 and AIX 5.3 TL6 in a production environment. AIX 5.3 TL1 through 5 are known to work with all GPFS 2.3 PTFs. 

             EFIX AVAILABLE: There are no fixes at this time. Once one is available, notice will be given. Please see https://www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/aix.html
  • In certain GPFS 2.3 and 3.1 PTF levels there is a subtle GPFS issue in truncate: if multiple nodes are accessing a file against which a truncate is issued on one of the nodes, there is a time window during which incorrect size information can be communicated to some nodes, which may cause GPFS to mishandle the last fragment of the file. This can lead to various failed internal consistency checks, manifested by the GPFS daemon shutting down abnormally. 

             The affected GPFS PTF levels are:
    • GPFS 3.1.0-6
    • GPFS 3.1.0-5
    • GPFS 2.3.0-17
    • GPFS 2.3.0-16
    • GPFS 2.3.0-15 

          Recommended action:
    • For customers running GPFS 3.1.0.x PTF 7 contains a fix and is available at www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/home.html
    • For all other versions, please contact support.
  • Customers running IBM Virtual Shared Disk V4.1 using a communications adapter other than the IBM eServer pSeries High Performance Switch, who have configured IBM Virtual Shared Disk with an IP packet size greater than the Maximum Transmission Unit (MTU) of the network, may experience packet corruption. 

             IP must fragment packets that are greater than the MTU size of the network. On faster interconnects such as Gigabit Ethernet, the IP fragmentation buffer can be overrun and end up incorrectly assembling the fragments. This is an inherent limitation of the IP protocol, which can occur when the number of packets transferred exceeds the counter size, which then rolls over, potentially resulting in a duplicate packet number. 

             If a duplicate packet number occurs, and the checksum matches that of the expected packet, corruption of the IBM Virtual Shared Disk packets can result in GPFS file system corruption. IBM Virtual Shared Disk will attempt to validate the incoming packets and discard malformed packets, but it cannot identify them every time (since checksums for different data patterns may be the same). 

             The level of IBM Virtual Shared Disk affected (shipped in AIX 5.2.x and later releases) has been available since October 2003, and the problem has only been confirmed as having occurred in an internal IBM test environment. 

             IP fragmentation can be prevented by configuring the IBM Virtual Shared Disk IP packet size less than or equal to the MTU size of the network. This will move the fragmentation into the IBM Virtual Shared Disk layer, which can correctly process the fragmentation. 

             The current IBM Virtual Shared Disk infrastructure allows for 160 packets per request which will limit the maximum buddy buffer size that can be used. For example:
    • for an MTU of 1500, you need to set the IBM Virtual Shared Disk IP packet size to 1024, effectively limiting the maximum buddy buffer size to 128 KB.
    • for an MTU of 9000, you need to set the IBM Virtual Shared Disk IP packet size to 8192, effectively limiting the maximum buddy buffer size to 1 MB.

             You can check the IBM Virtual Shared Disk IP packet size with these two commands: 

             vsdatalst -n
                 Shows the value that will take effect at the next reboot. 
             statvsd
                 Shows the current value that the IBM Virtual Shared Disk device driver is using. 

             Here is an example of how to set the IP packet size when using jumbo Ethernet frames (MTU = 9000): 

             updatevsdnode -n ALL -M 8192
             dsh -a ctlvsd -M 8192 

             For more information see the RSCT for AIX 5L Managing Shared Disks manual at http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsctbooks.html and search on the commands vsdnode, updatevsdnode, and ctlvsd. 

             APAR IY66940 will completely prevent IP fragmentation and will enforce the IBM Virtual Shared Disk IP packet size being less than the MTU size. This will also remove the restrictions relating to the maximum IBM Virtual Shared Disk buddy buffer size. 

             Anyone who cannot take the preventive action, for whatever reason, or is unsure whether their environment may be affected, should contact IBM service to discuss their situation:
    • In the United States contact us toll free at 1-800-IBM-SERV (1-800-426-7378)
    • In other countries, contact your local IBM Service Center


Q7.5: What Linux kernel patches are provided for clustered file systems such as GPFS?
A7.5:
The Linux kernel patches provided for clustered file systems are expected to correct problems that may be encountered when using GPFS with the Linux operating system. The supplied patches are currently being submitted to the Linux development community but may not be available in particular kernels. It is therefore suggested that they be appropriately applied based on your kernel version and distribution.

A listing of the latest patches, along with a more complete description of these patches, can be found at the General Parallel File System project on SourceForge.net (R) at http://sourceforge.net/tracker/?atid=719124&group_id=130828&func=browse:

  1. Click on the Summary description for the desired patch.
  2. Scroll down to the Summary section on the patch page for a description of and the status of the patch.
  3. To download a patch:
    1. Scroll down to the Attached Files section.
    2. Click on the Download link for your distribution and kernel level.

site.mcr consideration:
Patches listing a site.mcr define have additional steps to perform:

  1. Apply the patch to the Linux kernel, recompile, and install this kernel.
  2. In site.mcr either #define the option or uncomment the option if already present. Consult /usr/lpp/mmfs/src/README for more information.
  3. Recompile and reinstall the GPFS portability layer.
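A sketch of the rebuild steps for the GPFS 3.x open source portability layer, assuming the usual /usr/lpp/mmfs/src build tree and make targets described in its README; consult that README for the exact procedure at your level:

cd /usr/lpp/mmfs/src
export SHARKCLONEROOT=/usr/lpp/mmfs/src
# edit config/site.mcr here: #define the option named by the patch, or uncomment it if already present
make World
make InstallImages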


Q7.6: What Windows hotfix updates are required for GPFS?
A7.6:
The current Windows hotfix updates required for GPFS consist of :


Q7.7: Where can I find licensing and ordering information for GPFS?
A7.7:
The Cluster Software Ordering Guide provides the following information:

  • Licensing information

Licenses can also be viewed at http://www.ibm.com/software/sla/sladb.nsf

  • Ordering information
  • Software Maintenance Agreement information
  • Product End of Market/Service dates

Software support lifecycle information can also be viewed at http://www-306.ibm.com/software/support/lifecycle/index_a_z.html

  • Hardware and Software requirements

To view the Guide please go to http://www.ibm.com/systems/clusters/software/reports/order_guide.html

Back to the top of the page

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any of IBM's intellectual property rights may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10594-1785
USA

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: i) the exchange of information between independently created programs and other programs (including this one) and ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Intellectual Property Law
2455 South Road,P386
Poughkeepsie, NY 12601-5400
USA

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol ( (R) or (TM)), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at http://www.ibm.com/legal/copytrade.shtml

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Red Hat, the Red Hat "Shadow Man" logo, and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc., in the United States and other countries.

UNIX is a registered trademark of the Open Group in the United States and other countries.

Microsoft, Windows, Windows NT, and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product, and service names may be the trademarks or service marks of others.

February 2009
Copyright International Business Machines Corporation 2004,2009. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
1. 
GPFS for Linux on Itanium Servers is available only through a special Programming Request for Price Quotation (PRPQ). The install image is not generally available code. It must be requested by an IBM client representative through the RPQ system and approved before order fulfillment. If interested in obtaining this PRPQ, reference PRPQ # P91232 or Product ID 5799-GPS.
2. 
GPFS Sequential Input/Output Performance on IBM pSeries 690, Gautam Shah, James Wang, available at http://www.redbooks.ibm.com/redpapers/pdfs/redp3945.pdf

Back to the top of the page



+. ssh configuration

  >> when creating a cluster with ssh and scp as the remote command types, the ssh keys must first be registered as shown below

(lpar11g) # ssh lpar21g   >> then exit (this only creates the key entry -> the /.ssh directory is created)

(lpar11g) # ssh-keygen -t rsa   >> just press Enter at each prompt to complete

(lpar11g) # cat /.ssh/id_rsa.pub >> /.ssh/authorized_keys   (the node's own public key must also be added)

(lpar11g) # append the output of 'cat /.ssh/id_rsa.pub' to the /.ssh/authorized_keys file on the (lpar12g) server

 >>> repeat the same steps on lpar21g


+. rsh configuration


+. basic configuration scripts 

-------------------------------------------------------------

### fs re-size

chfs -a size=+512M /

chfs -a size=+1G /usr

chfs -a size=+1G /var

chfs -a size=+512M /tmp

chfs -a size=+1G /home

chfs -a size=+512M /opt



### IO state

chdev -l sys0 -a iostat=true


### Disk Attr

ins=1

while [ ${ins} -le 42 ]

do

        chdev -l hdisk${ins} -a pv=yes

        chdev -l hdisk${ins} -a reserve_policy=no_reserve

        ((ins=ins+1))

done


### Time sync

setclock lpar11 ; date ; rsh lpar11 date


### hushlogin (turn off the login msg)

touch /.hushlogin


-------------------------------------------------------------


+. .profile

-------------------------------------------------------------

lpar11:/# cat .profile

export GPFS_HOME=/usr/lpp/mmfs

export PS1=`hostname -s`':$PWD# '

export PATH=/usr/local/bin:${GPFS_HOME}/bin:${PATH}


set -o vi


banner `hostname`

-------------------------------------------------------------


+. /etc/hosts

-------------------------------------------------------------

#-- team 1 (multi-cluster #1)

10.10.10.151           lpar11

10.10.10.161           lpar21

10.10.11.151           lpar11g

10.10.11.161           lpar21g


#-- team 2 (multi-cluster #2)

10.10.10.152           lpar12

10.10.10.162           lpar22

10.10.11.152           lpar12g

10.10.11.162           lpar22g

-------------------------------------------------------------



+. lslpp -l gpfs*

  >> In addition to the base filesets, the matching patch/PTF level must also be installed (from Fix Central) or GPFS will not start

     ex. GPFS 3.5.0.0 would not start > it started only after updating to GPFS 3.5.0.6


+. cluster configuration file

# cat /home/gpfs/gpfs.allnodes 

lpar11g:quorum-manager

lpar21g:quorum-manager


# >> Syntax >> NodeName:NodeDesignations:AdminNodeName  (NodeDesignations and AdminNodeName are optional)

  >> NodeDesignations is given as 'manager|client'-'quorum|nonquorum'

  >> manager|client - whether the node joins the pool of candidate file system manager nodes (default is client)

  >> quorum|nonquorum - the default is nonquorum

# >> A maximum of 8 quorum nodes is supported, and every quorum node must be able to access the tiebreaker disks

# >> lpar11_gpfs, lpar21_gpfs > must be registered in /etc/hosts

  >> Either a private or a public network can be used, but a private network is of course recommended (ideally on its own subnet)

# >> In a typical Oracle RAC configuration every node is designated quorum-manager
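
  >> As an illustration of the NodeName:NodeDesignations:AdminNodeName syntax above, a hypothetical descriptor file that also adds the SAN-less client lpar12g and explicit admin (public-network) names - the third entry is an example only, not part of the original configuration - could look like:

lpar11g:quorum-manager:lpar11

lpar21g:quorum-manager:lpar21

lpar12g::lpar12            >> empty designation field -> defaults to client, nonquorum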



+. gpfs cluster creation - using a predefined nodelist file #1

# mmcrcluster -n /home/gpfs/gpfs.allnodes -p lpar11g -s lpar21g -C gpfs_cluster -r /usr/bin/ssh -R /usr/bin/scp

#  >> When installing with ssh and scp, use the private-network hostnames so that there are no password-related problems


# >> -n : node description file

  >> -p : primary node

  >> -s : secondary node

  >> -C : cluster name

  >> -r : remote shell (default is rsh)

  >> -R : remote copy (default is rcp)

  

+. mmlscluster  

  

+. license agreement (gpfs3.3+)

# mmchlicense server --accept -N lpar11g,lpar21g


+. gpfs startup and status check

# mmstartup -a

# mmgetstate -a

 Node number  Node name        GPFS state

------------------------------------------

       1      lpar11g          active

       2      lpar21g          active


# mmlscluster


GPFS cluster information

========================

  GPFS cluster name:         gpfs_cluster.lpar11g

  GPFS cluster id:           1399984813853589142

  GPFS UID domain:           gpfs_cluster.lpar11g

  Remote shell command:      /usr/bin/rsh

  Remote file copy command:  /usr/bin/rcp


GPFS cluster configuration servers:

-----------------------------------

  Primary server:    lpar11g

  Secondary server:  lpar21g


 Node  Daemon node name  IP address        Admin node name              Designation

------------------------------------------------------------------------------------

   1   lpar11g           170.24.46.151     lpar11g                      quorum-manager

   2   lpar21g           170.24.46.161     lpar21g                      quorum-manager


+. gpfs cluster creation - adding nodes one by one (man mmaddnode) #2

# mmcrcluster -N lpar11g:manager-quorum -p lpar11g -r /usr/bin/ssh -R /usr/bin/scp

# mmaddnode -N lpar21g

# mmchcluster -s lpar21g

# mmchnode -N lpar21g --client --nonquorum

# mmchnode -N lpar21g --manager --quorum

# mmlscluster

  

  >> to delete a node: mmdelnode -N lpar21g

  >> the primary and secondary cluster configuration servers cannot be deleted

  

+. starting and stopping nodes in the cluster

# mmstartup -a / mmshutdown -a

# mmstartup -N lpar21g / mmshutdown -N lpar21g

# mmgetstate -a / mmgetstate -N lpar21g

# mmgetstate -a


 Node number  Node name        GPFS state

------------------------------------------

       1      lpar11g          active

       2      lpar21g          active

  

+. gpfs cluster logs

# tail -f /var/adm/ras/mmfs.log.latest



+. NSD Configuration

# cat /home/gpfs/gpfs.clusterDisk 

hdisk1:::dataAndMetadata::nsd1:

hdisk2:::dataAndMetadata::nsd2:

hdisk3:::dataAndMetadata::nsd3:

hdisk4:::dataAndMetadata::nsd4:

hdisk5:::dataAndMetadata::nsd5:

hdisk6:::dataAndMetadata::nsd6:

hdisk7:::dataAndMetadata::nsd7:


  >> [Disk to use as the NSD]:[Primary Server]:[Backup Server]:[Disk Usage]:[Failure Group]:[Desired NSD Name]:[Storage Pool]

  >> [Disk to use as the NSD] - can also be given in the '/dev/hdisk3' form

     [Primary Server] && [Backup Server] 

        - the primary and backup NSD servers that perform I/O on behalf of the cluster

        - if the cluster nodes are all SAN-attached and share the same disks, leave these two fields blank

        -  case 1) (lpar11g and lpar21g are SAN-attached and act as GPFS servers) && (lpar12g has no SAN connection and acts as a client)

          -> because lpar12g cannot see the NSDs directly, define them as

     hdisk1:lpar11g:lpar21g:dataAndMetadata::nsd1:

 and register lpar12g as a client when adding it to the cluster

  case 2) lpar11g and lpar21g are SAN-attached and act as both GPFS servers and clients

          -> every server and client can access the NSDs directly, so a definition such as

                      hdisk1:::dataAndMetadata::nsd1:

             is sufficient.

     [Disk Usage] 

        - 'dataOnly|metadataOnly|dataAndMetadata|descOnly'

        - dataAndMetadata is the default for disks in the system pool && dataOnly is the default for other storage pools

     [Desired NSD Name] - must be unique within the cluster; if omitted, a name of the form 'gpfs1nsd' is generated
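
     [Storage Pool] - the last field assigns the NSD to a storage pool; for example, the descriptor used later in the storage-pool exercise places nsd3 in pool1:

                      hdisk3:::dataOnly::nsd3:pool1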

 

 

# mmcrnsd -F /home/gpfs/gpfs.clusterDisk

# mmlsnsd

 File system   Disk name    NSD servers

---------------------------------------------------------------------------

 (free disk)   nsd1         (directly attached)

 (free disk)   nsd2         (directly attached)

 (free disk)   nsd3         (directly attached)

 (free disk)   nsd4         (directly attached)

 (free disk)   nsd5         (directly attached)

 (free disk)   nsd6         (directly attached)

 (free disk)   nsd7         (directly attached)

 

# mmdelnsd nsd7 

  >> to add an individual NSD back after deleting it, create an additional descriptor file (gpfs.clusterDisk2) and run mmcrnsd -F gpfs.clusterDisk2, as in the sketch below
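
  >> a minimal sketch of such a file, assuming nsd7 is being re-created on the same hdisk7 with its original descriptor:

# cat /home/gpfs/gpfs.clusterDisk2

hdisk7:::dataAndMetadata::nsd7:

# mmcrnsd -F /home/gpfs/gpfs.clusterDisk2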

*. A disk that was ever configured for GPFS is shown as 'gpfs' by lspv and cannot simply be reused for GPFS > the old information must be wiped and the disk reconfigured
on lpar11 && lpar12 >>  
  dd if=/dev/zero of=/dev/rhdiskXX bs=1024 count=100
  rmdev -dl hdiskXX
  cfgmgr -v
  chdev -l hdiskXX -a reserve_policy=no_reserve
  chdev -l hdiskXX -a pv=yes

  


+. tiebreaker disk configuration

# mmshutdown -a

# mmchconfig tiebreakerDisks=nsd7

  >> used when configuring only a single tiebreaker disk...

# mmchconfig tiebreakerDisks=no

# mmchconfig tiebreakerDisks='nsd5;nsd6;nsd7'

  >> using three tiebreaker disks is recommended; in clusters with three or more quorum nodes, tiebreaker disks are unnecessary

# mmlsconfig | grep tiebreakerDisks

tiebreakerDisks nsd5;nsd6;nsd7
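
  >> a suggested way to see the quorum effect after restarting GPFS (not captured in the original note) is the long form of mmgetstate, which reports the quorum, nodes-up and total-node counts:

# mmstartup -a

# mmgetstate -a -L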



+. gpfs file system creation

# cp /home/gpfs/gpfs.clusterDisk /home/gpfs/gpfs.clusterDisk.fs

  >> in /home/gpfs/gpfs.clusterDisk.fs, delete every entry except nsd1, nsd2, nsd3 and nsd4

# mmcrfs /gpfs fs1 -F /home/gpfs/gpfs.clusterDisk.fs -A yes -B 512k -n 16

  >> '/gpfs' : mount point

  >> 'fs1' : device name (file system name) > it can also be written as '/dev/fs1'

  >> '-F /home/gpfs/gpfs.clusterDisk.fs' : the NSD descriptors to include in the file system (rewritten automatically by mmcrnsd)

  >> '-A yes' : mount automatically when GPFS starts

  >> '-B 512k' : block size, configurable from 16k to 1MB. Oracle generally recommends 256k (or 512k),

                 but groupware or e-mail systems with many small files should use a smaller block size

  >> '-n 16' : the number of nodes expected to mount the file system; it cannot be changed later, so allow plenty of headroom

# mmmount all -a

# mmlsfs fs1

flag                value                    description

------------------- ------------------------ -----------------------------------

 -f                 8192                     Minimum fragment size in bytes

 -i                 512                      Inode size in bytes

 -I                 16384                    Indirect block size in bytes

 -m                 1                        Default number of metadata replicas

 -M                 2                        Maximum number of metadata replicas

 -r                 1                        Default number of data replicas

 -R                 2                        Maximum number of data replicas

 -j                 cluster                  Block allocation type

 -D                 nfs4                     File locking semantics in effect

 -k                 all                      ACL semantics in effect

 -n                 32                       Estimated number of nodes that will mount file system

 -B                 262144                   Block size

 -Q                 none                     Quotas enforced

                    none                     Default quotas enabled

 --filesetdf        no                       Fileset df enabled?

 -V                 13.01 (3.5.0.0)          File system version

 --create-time      Mon Nov 26 14:08:51 2012 File system creation time

 -u                 yes                      Support for large LUNs?

 -z                 no                       Is DMAPI enabled?

 -L                 4194304                  Logfile size

 -E                 yes                      Exact mtime mount option

 -S                 no                       Suppress atime mount option

 -K                 whenpossible             Strict replica allocation option

 --fastea           yes                      Fast external attributes enabled?

 --inode-limit      67584                    Maximum number of inodes

 -P                 system                   Disk storage pools in file system

 -d                 nsd1;nsd2;nsd3;nsd4      Disks in file system

 --perfileset-quota no                       Per-fileset quota enforcement

 -A                 yes                      Automatic mount option

 -o                 none                     Additional mount options

 -T                 /gpfs                    Default mount point

 --mount-priority   0                        Mount priority

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 98 GB)

nsd1                 10485760       -1 yes      yes        10440704 (100%)           488 ( 0%)

nsd2                 10485760       -1 yes      yes        10440448 (100%)           248 ( 00%)

nsd3                 10485760       -1 yes      yes        10440960 (100%)           248 ( 00%)

nsd4                 10485760       -1 yes      yes        10440192 (100%)           472 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         41943040                              41762304 (100%)          1456 ( 00%)


                =============                         ==================== ===================

(total)              41943040                              41762304 (100%)          1456 ( 00%)


Inode Information

-----------------

Number of used inodes:            4038

Number of free inodes:           63546

Number of allocated inodes:      67584

Maximum number of inodes:        67584



+. gpfs file system / NSD disk management

# mmlsfs all

File system attributes for /dev/fs1:

====================================

flag                value                    description

------------------- ------------------------ -----------------------------------

 -f                 8192                     Minimum fragment size in bytes

 -i                 512                      Inode size in bytes

 -I                 16384                    Indirect block size in bytes

 -m                 1                        Default number of metadata replicas

 -M                 2                        Maximum number of metadata replicas

 -r                 1                        Default number of data replicas

 -R                 2                        Maximum number of data replicas

 -j                 cluster                  Block allocation type

 -D                 nfs4                     File locking semantics in effect

 -k                 all                      ACL semantics in effect

 -n                 16                       Estimated number of nodes that will mount file system

 -B                 262144                   Block size

 -Q                 none                     Quotas enforced

                    none                     Default quotas enabled

 --filesetdf        no                       Fileset df enabled?

 -V                 13.01 (3.5.0.0)          File system version

 --create-time      Mon Nov 26 14:08:51 2012 File system creation time

 -u                 yes                      Support for large LUNs?

 -z                 no                       Is DMAPI enabled?

 -L                 4194304                  Logfile size

 -E                 yes                      Exact mtime mount option

 -S                 no                       Suppress atime mount option

 -K                 whenpossible             Strict replica allocation option

 --fastea           yes                      Fast external attributes enabled?

 --inode-limit      67584                    Maximum number of inodes

 -P                 system                   Disk storage pools in file system

 -d                 nsd1;nsd2;nsd3;nsd4      Disks in file system

 --perfileset-quota no                       Per-fileset quota enforcement

 -A                 yes                      Automatic mount option

 -o                 none                     Additional mount options

 -T                 /gpfs                    Default mount point

 --mount-priority   0                        Mount priority


#  mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512      -1 yes      yes   ready         up           system

nsd2         nsd         512      -1 yes      yes   ready         up           system

nsd3         nsd         512      -1 yes      yes   ready         up           system

nsd4         nsd         512      -1 yes      yes   ready         up           system


# mmdeldisk fs1 nsd4

  >> remove the 'nsd4' disk from the 'fs1' file system

Deleting disks ...

GPFS: 6027-589 Scanning file system metadata, phase 1 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 2 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 3 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 4 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-565 Scanning user file metadata ...

 100.00 % complete on Mon Nov 26 17:05:54 2012

GPFS: 6027-552 Scan completed successfully.

Checking Allocation Map for storage pool 'system'

GPFS: 6027-370 tsdeldisk64 completed.

mmdeldisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmumount /gpfs -N lpar21g

# mmumount /gpfs -a

# mmdelfs fs1

  >> delete the 'fs1' file system itself

GPFS: 6027-573 All data on following disks of fs1 will be destroyed:

    nsd1

    nsd2

    nsd3

GPFS: 6027-574 Completed deletion of file system /dev/fs1.

mmdelfs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


  

+. GPFS operation and administration

# mmlsmgr fs1

# mmchmgr fs1 lpar21g

GPFS: 6027-628 Sending migrate request to current manager node 170.24.46.151 (lpar11g).

GPFS: 6027-629 Node 170.24.46.151 (lpar11g) resigned as manager for fs1.

GPFS: 6027-630 Node 170.24.46.161 (lpar21g) appointed as manager for fs1.

# mmlsmgr fs1

file system      manager node       [from 170.24.46.161 (lpar21g)]

---------------- ------------------

fs1              170.24.46.161 (lpar21g)

# mmchconfig autoload=yes

  >> start the gpfs daemon automatically at system boot

mmchconfig: Command successfully completed

mmchconfig: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmlsconfig | grep autoload

autoload yes


# mmfsadm dump config

  >> dump all GPFS configuration parameters

# mmfsadm dump config | grep pagepool

   pagepool 536870912

   pagepoolMaxPhysMemPct 75

   pagepoolPageSize 65536

   pagepoolPretranslate 0
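
  >> as a sketch of how such a parameter would be changed (the 1G value is purely illustrative - size the pagepool for the workload and available memory), mmchconfig can apply the new value immediately with -i:

# mmchconfig pagepool=1G -i

# mmlsconfig | grep pagepool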


----------------------------------------------------

+. Storage Pools, Filesets and Policies

----------------------------------------------------

+. clean-up test env

# mmumount all -a ; mmdelfs fs1 ; mmdelnsd "nsd1;nsd2;nsd3;nsd4"

  >> the tiebreaker disks 'nsd5;nsd6;nsd7' are kept as they are


+. create nsd

#  cat /home/gpfs/gpfs.clusterDisk.storagePool

hdisk1:::dataAndMetadata::nsd1:system

hdisk2:::dataAndMetadata::nsd2:system

hdisk3:::dataOnly::nsd3:pool1

hdisk4:::dataOnly::nsd4:pool1


# mmcrnsd -F /home/gpfs/gpfs.clusterDisk.storagePool

mmcrnsd: Processing disk hdisk1

mmcrnsd: Processing disk hdisk2

mmcrnsd: Processing disk hdisk3

mmcrnsd: Processing disk hdisk4

mmcrnsd: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmcrfs /gpfs fs1 -F /home/gpfs/gpfs.clusterDisk.storagePool -A yes -B 512k -n 16

GPFS: 6027-531 The following disks of fs1 will be formatted on node lpar11:

    nsd1: size 10485760 KB

    nsd2: size 10485760 KB

    nsd3: size 10485760 KB

    nsd4: size 10485760 KB

GPFS: 6027-540 Formatting file system ...

GPFS: 6027-535 Disks up to size 103 GB can be added to storage pool 'system'.

GPFS: 6027-535 Disks up to size 103 GB can be added to storage pool 'pool1'.

Creating Inode File

Creating Allocation Maps

Creating Log Files

Clearing Inode Allocation Map

Clearing Block Allocation Map

Formatting Allocation Map for storage pool 'system'

Formatting Allocation Map for storage pool 'pool1'

GPFS: 6027-572 Completed creation of file system /dev/fs1.

mmcrfs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlsfs fs1

# mmmount /gpfs -a


# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10427904 ( 99%)           976 ( 0%)

nsd2                 10485760       -1 yes      yes        10428416 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20856320 ( 99%)          1968 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41822720 (100%)          2960 ( 0%)

(metadata)           20971520                              20856320 ( 99%)          1968 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41822720 (100%)          2960 ( 0%)


Inode Information

-----------------

Number of used inodes:            4022

Number of free inodes:           63562

Number of allocated inodes:      67584

Maximum number of inodes:        67584


+. create fileset

#  mmcrfileset fs1 fileset1

Snapshot 'fileset1' created with id 1.

#  mmcrfileset fs1 fileset2

Snapshot 'fileset2' created with id 2.

#  mmcrfileset fs1 fileset3

Snapshot 'fileset3' created with id 3.

#  mmcrfileset fs1 fileset4

Snapshot 'fileset4' created with id 4.

#  mmcrfileset fs1 fileset5

Snapshot 'fileset5' created with id 5.

# mmlsfileset fs1

Filesets in file system 'fs1':

Name                     Status    Path

root                     Linked    /gpfs

fileset1                 Unlinked  --

fileset2                 Unlinked  --

fileset3                 Unlinked  --

fileset4                 Unlinked  --

fileset5                 Unlinked  --


#  mmlinkfileset fs1 fileset1 -J /gpfs/fileset1

Fileset 'fileset1' linked at '/gpfs/fileset1'.

#  mmlinkfileset fs1 fileset2 -J /gpfs/fileset2

Fileset 'fileset2' linked at '/gpfs/fileset2'.

#  mmlinkfileset fs1 fileset3 -J /gpfs/fileset3

Fileset 'fileset3' linked at '/gpfs/fileset3'.

#  mmlinkfileset fs1 fileset4 -J /gpfs/fileset4

Fileset 'fileset4' linked at '/gpfs/fileset4'.

#  mmlinkfileset fs1 fileset5 -J /gpfs/fileset5

Fileset 'fileset5' linked at '/gpfs/fileset5'.

# mmlsfileset fs1

Filesets in file system 'fs1':

Name                     Status    Path

root                     Linked    /gpfs

fileset1                 Linked    /gpfs/fileset1

fileset2                 Linked    /gpfs/fileset2

fileset3                 Linked    /gpfs/fileset3

fileset4                 Linked    /gpfs/fileset4

fileset5                 Linked    /gpfs/fileset5



+. file placement policy

# cat /home/gpfs/placementpolicy.txt

/* The fileset does not matter, we want all .dat and .DAT files to go to pool1 */

RULE 'datfiles' SET POOL 'pool1' WHERE UPPER(name) like '%.DAT'

/* All non *.dat files placed in fileset5 will go to pool1 */

RULE 'fs5' SET POOL 'pool1' FOR FILESET ('fileset5')

/* Set a default rule that sends all files not meeting the other criteria to the system pool */

RULE 'default' set POOL 'system'


# mmchpolicy fs1 /home/gpfs/placementpolicy.txt

Validated policy `placementpolicy.txt': parsed 3 Placement Rules, 0 Restore Rules, 0 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

GPFS: 6027-799 Policy `placementpolicy.txt' installed and broadcast to all nodes.


# mmlspolicy fs1 -L

/* The fileset does not matter, we want all .dat and .DAT files to go to pool1 */

RULE 'datfiles' SET POOL 'pool1' WHERE UPPER(name) like '%.DAT'

/* All non *.dat files placed in fileset5 will go to pool1 */

RULE 'fs5' SET POOL 'pool1' FOR FILESET ('fileset5')

/* Set a default rule that sends all files not meeting the other criteria to the system pool */

RULE 'default' set POOL 'system'



+. placement policy test

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000'

   >> the file is placed in the system pool (default rule)

   

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset1/bigfile1.dat bs=64k count=1000'

   >> the file is placed in pool1 (datfiles rule)

   

   >> compare the output of 'mmdf fs1' before and after running 'dd if=/dev/zero of=/gpfs/fileset5/bigfile2 bs=64k count=1000'

   >> the file is placed in pool1 (fs5 rule)

   

   >> the placement can also be checked with mmlsattr, e.g. 'mmlsattr -L /gpfs/fileset5/bigfile2'

   

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10427904 ( 99%)           976 ( 0%)

nsd2                 10485760       -1 yes      yes        10427392 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20855296 ( 99%)          1968 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41821696 (100%)          2960 ( 0%)

(metadata)           20971520                              20855296 ( 99%)          1968 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41821696 (100%)          2960 ( 0%)


Inode Information

-----------------

Number of used inodes:            4027

Number of free inodes:           63557

Number of allocated inodes:      67584

Maximum number of inodes:        67584

# dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000

1000+0 records in.

1000+0 records out.

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1472 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           992 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2464 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10483200 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20966400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41757184 (100%)          3456 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2464 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41757184 (100%)          3456 ( 0%)


Inode Information

-----------------

Number of used inodes:            4028

Number of free inodes:           63556

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# dd if=/dev/zero of=/gpfs/fileset1/bigfile1.dat bs=64k count=1000

1000+0 records in.

1000+0 records out.

# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1472 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2448 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10451456 (100%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10450944 (100%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20902400 (100%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41693184 ( 99%)          3440 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2448 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41693184 ( 99%)          3440 ( 0%)


Inode Information

-----------------

Number of used inodes:            4029

Number of free inodes:           63555

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# dd if=/dev/zero of=/gpfs/fileset5/bigfile2 bs=64k count=1000

1000+0 records in.

1000+0 records out.

lpar11:/home/gpfs# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760       -1 yes      yes        10395648 ( 99%)          1456 ( 0%)

nsd2                 10485760       -1 yes      yes        10395136 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20790784 ( 99%)          2432 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760       -1 no       yes        10419200 ( 99%)           496 ( 0%)

nsd4                 10485760       -1 no       yes        10419200 ( 99%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20838400 ( 99%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41629184 ( 99%)          3424 ( 0%)

(metadata)           20971520                              20790784 ( 99%)          2432 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41629184 ( 99%)          3424 ( 0%)


Inode Information

-----------------

Number of used inodes:            4030

Number of free inodes:           63554

Number of allocated inodes:      67584

Maximum number of inodes:        67584



# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# mmlsattr -L /gpfs/fileset1/bigfile1.dat

file name:            /gpfs/fileset1/bigfile1.dat

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:33:36 2012

Windows attributes:   ARCHIVE


# mmlsattr -L /gpfs/fileset5/bigfile2

file name:            /gpfs/fileset5/bigfile2

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset5

snapshot name:

creation Time:        Tue Nov 27 11:35:55 2012

Windows attributes:   ARCHIVE


# dd if=/dev/zero of=/gpfs/fileset3/bigfile3 bs=64k count=1000

1000+0 records in.

1000+0 records out.

# dd if=/dev/zero of=/gpfs/fileset4/bigfile4 bs=64k count=1000

1000+0 records in.

1000+0 records out.



+. file management with policy

# cat /home/gpfs/managementpolicy.txt

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'


# mmapplypolicy fs1 -P /home/gpfs/managementpolicy.txt -I test

   >> dry-run the specified policy (-I test)

[I] GPFS Current Data Pool Utilization in KB and %

pool1   133120  20971520        0.634766%

system  308736  20971520        1.472168%

[I] 4032 of 67584 inodes used: 5.965909%.

[I] Loaded policy rules from /home/gpfs/managementpolicy.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@02:53:48 UTC

parsed 0 Placement Rules, 0 Restore Rules, 2 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'

[I]2012-11-27@02:53:49.218 Directory entries scanned: 11.

[I] Directories scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:53:49.231 Sorting 11 file list records.

[I] Inodes scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:53:49.303 Policy evaluation. 11 files scanned.

[I]2012-11-27@02:53:49.315 Sorting 5 candidate file list records.

[I]2012-11-27@02:53:49.323 Choosing candidate files. 5 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     1       64000   1       64000   0       RULE 'datfiles' DELETE WHERE(.)

  1     4       256000  3       192000  0       RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 192000KB: 3 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 64000KB: 1 of 1 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%


# mmapplypolicy fs1 -P /home/gpfs/managementpolicy.txt 

[I] GPFS Current Data Pool Utilization in KB and %

pool1   133120  20971520        0.634766%

system  308736  20971520        1.472168%

[I] 4032 of 67584 inodes used: 5.965909%.

[I] Loaded policy rules from /home/gpfs/managementpolicy.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@02:54:46 UTC

parsed 0 Placement Rules, 0 Restore Rules, 2 Migrate/Delete/Exclude Rules,

        0 List Rules, 0 External Pool/List Rules

RULE 'datfiles' DELETE WHERE UPPER(name) like '%.DAT'

RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE UPPER(name) like 'BIG%'

[I]2012-11-27@02:54:47.697 Directory entries scanned: 11.

[I] Directories scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:54:47.708 Sorting 11 file list records.

[I] Inodes scan: 5 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@02:54:47.727 Policy evaluation. 11 files scanned.

[I]2012-11-27@02:54:47.759 Sorting 5 candidate file list records.

[I]2012-11-27@02:54:47.761 Choosing candidate files. 5 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     1       64000   1       64000   0       RULE 'datfiles' DELETE WHERE(.)

  1     4       256000  3       192000  0       RULE 'bigfiles' MIGRATE TO POOL 'pool1' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 192000KB: 3 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 64000KB: 1 of 1 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%

[I]2012-11-27@02:54:50.399 Policy execution. 4 files dispatched.

[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;

        0 'skipped' files and/or errors.


+. External Pool Management

# cat /home/gpfs/expool1.ksh

#!/usr/bin/ksh

dt=`date +%h%d%y-%H_%M_%S`

results=/tmp/FileReport_${dt}


echo one $1

if [[ $1 == 'MIGRATE' ]];then

echo Filelist

echo There are `cat $2 | wc -l ` files that match >> ${results}

cat $2 >> ${results}

echo ----

echo - The file list report has been placed in ${results}

echo ----

fi
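
  >> before wiring the script into a policy it can be smoke-tested by hand with a throw-away file list (a suggested check; mmapplypolicy passes the operation as $1 and the generated file list as $2, as the 'one TEST' / 'one MIGRATE' lines in the run below show):

# ls /gpfs/fileset1 > /tmp/testlist

# /home/gpfs/expool1.ksh MIGRATE /tmp/testlist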


# cat /home/gpfs/listrule1.txt

RULE EXTERNAL POOL 'externalpoolA' EXEC '/home/gpfs/expool1.ksh'

RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE FILE_SIZE > 2


# mmapplypolicy fs1 -P /home/gpfs/listrule1.txt

[I] GPFS Current Data Pool Utilization in KB and %

pool1   261120  20971520        1.245117%

system  116736  20971520        0.556641%

[I] 4031 of 67584 inodes used: 5.964429%.

[I] Loaded policy rules from /home/gpfs/listrule1.txt.

Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2012-11-27@04:09:22 UTC

parsed 0 Placement Rules, 0 Restore Rules, 1 Migrate/Delete/Exclude Rules,

        0 List Rules, 1 External Pool/List Rules

RULE EXTERNAL POOL 'externalpoolA' EXEC '/home/gpfs/expool1.ksh'

RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE FILE_SIZE > 2

one TEST

[I]2012-11-27@04:09:23.436 Directory entries scanned: 10.

[I] Directories scan: 4 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@04:09:23.447 Sorting 10 file list records.

[I] Inodes scan: 4 files, 6 directories, 0 other objects, 0 'skipped' files and/or errors.

[I]2012-11-27@04:09:23.474 Policy evaluation. 10 files scanned.

[I]2012-11-27@04:09:23.501 Sorting 4 candidate file list records.

[I]2012-11-27@04:09:23.503 Choosing candidate files. 4 records scanned.

[I] Summary of Rule Applicability and File Choices:

 Rule#  Hit_Cnt KB_Hit  Chosen  KB_Chosen       KB_Ill  Rule

  0     4       256000  4       256000  0       RULE 'MigToExt' MIGRATE TO POOL 'externalpoolA' WHERE(.)


[I] Filesystem objects with no applicable rules: 6.


[I] GPFS Policy Decisions and File Choice Totals:

 Chose to migrate 256000KB: 4 of 4 candidates;

 Chose to premigrate 0KB: 0 candidates;

 Already co-managed 0KB: 0 candidates;

 Chose to delete 0KB: 0 of 0 candidates;

 Chose to list 0KB: 0 of 0 candidates;

 0KB of chosen data is illplaced or illreplicated;

Predicted Data Pool Utilization in KB and %:

pool1   5120    20971520        0.024414%

system  116736  20971520        0.556641%

one MIGRATE27@04:09:23.505 Policy execution. 0 files dispatched.  \.......

Filelist

There are 4 files that match

----

- The file list report has been placed in /tmp/FileReport_Nov2712-04_09_23

----

[I]2012-11-27@04:09:23.531 Policy execution. 4 files dispatched.

[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;

        0 'skipped' files and/or errors.


# more /tmp/FileReport_Nov2712-04_09_23

47621 65538 0   -- /gpfs/fileset1/bigfile1

47623 65538 0   -- /gpfs/fileset5/bigfile2

47624 65538 0   -- /gpfs/fileset3/bigfile3

47625 65538 0   -- /gpfs/fileset4/bigfile4




----------------------------------------------------

+. Replication (per file / per file system)

----------------------------------------------------

# mmlsfs fs1 -mrMR

  >> check the replication settings; if no replication is in place, the file system may have been created like...

  >> mmcrfs /gpfs fs1 -F pooldesc.txt -B 64k

flag                value                    description

------------------- ------------------------ -----------------------------------

 -m                 1                        Default number of metadata replicas

 -r                 1                        Default number of data replicas

 -M                 2                        Maximum number of metadata replicas

 -R                 2                        Maximum number of data replicas


# mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512      -1 yes      yes   ready         up           system

nsd2         nsd         512      -1 yes      yes   ready         up           system

nsd3         nsd         512      -1 no       yes   ready         up           pool1

nsd4         nsd         512      -1 no       yes   ready         up           pool1

# mmchdisk fs1 change -d "nsd1::::1::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd2::::2::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd3::::3::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.

# mmchdisk fs1 change -d "nsd4::::4::"

Verifying file system configuration information ...

mmchdisk: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlsdisk fs1

disk         driver   sector failure holds    holds                            storage

name         type       size   group metadata data  status        availability pool

------------ -------- ------ ------- -------- ----- ------------- ------------ ------------

nsd1         nsd         512       1 yes      yes   ready         up           system

nsd2         nsd         512       2 yes      yes   ready         up           system

nsd3         nsd         512       3 no       yes   ready         up           pool1

nsd4         nsd         512       4 no       yes   ready         up           pool1

GPFS: 6027-740 Attention: Due to an earlier configuration change the file system

is no longer properly replicated.


# mmdf fs1

disk                disk size  failure holds    holds              free KB             free KB

name                    in KB    group metadata data        in full blocks        in fragments

--------------- ------------- -------- -------- ----- -------------------- -------------------

Disks in storage pool: system (Maximum disk size allowed is 96 GB)

nsd1                 10485760        1 yes      yes        10427392 ( 99%)          1440 ( 0%)

nsd2                 10485760        2 yes      yes        10427392 ( 99%)           976 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20854784 ( 99%)          2416 ( 0%)


Disks in storage pool: pool1 (Maximum disk size allowed is 96 GB)

nsd3                 10485760        3 no       yes        10356224 ( 99%)           496 ( 0%)

nsd4                 10485760        4 no       yes        10354176 ( 99%)           496 ( 0%)

                -------------                         -------------------- -------------------

(pool total)         20971520                              20710400 ( 99%)           992 ( 0%)


                =============                         ==================== ===================

(data)               41943040                              41565184 ( 99%)          3408 ( 0%)

(metadata)           20971520                              20854784 ( 99%)          2416 ( 0%)

                =============                         ==================== ===================

(total)              41943040                              41565184 ( 99%)          3408 ( 0%)


Inode Information

-----------------

Number of used inodes:            4031

Number of free inodes:           63553

Number of allocated inodes:      67584

Maximum number of inodes:        67584


+. replicating at the file level

# dd if=/dev/zero of=/gpfs/fileset1/bigfile0 bs=64k count=1000

# mmlsattr -L /gpfs/fileset1/bigfile0

file name:            /gpfs/fileset1/bigfile0

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:29:47 2012

Windows attributes:   ARCHIVE


# mmchattr -m 2 -r 2 /gpfs/fileset1/bigfile0


# mmlsattr -L /gpfs/fileset1/bigfile0

file name:            /gpfs/fileset1/bigfile0

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:                unbalanced

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:29:47 2012

Windows attributes:   ARCHIVE


+. replicating at the file system level

# dd if=/dev/zero of=/gpfs/fileset1/bigfile1 bs=64k count=1000


# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# mmchfs fs1 -m 2 -r 2

   >> change the file system's default replication attributes to 2

   >> files created after the change are created with two replicas right away,

   >> but files created before the change are only re-replicated after running mmrestripefs


# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 1 max 2

data replication:     1 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE


# dd if=/dev/zero of=/gpfs/fileset1/bigfile2 bs=64k count=1000

# mmlsattr -L /gpfs/fileset1/bigfile2

file name:            /gpfs/fileset1/bigfile2

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:

storage pool name:    system

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 13:38:29 2012

Windows attributes:   ARCHIVE


# mmrestripefs fs1 -R

   >> files created before the change only pick up the new replication after mmrestripefs is run

GPFS: 6027-589 Scanning file system metadata, phase 1 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 2 ...

Scanning file system metadata for pool1 storage pool

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 3 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-589 Scanning file system metadata, phase 4 ...

GPFS: 6027-552 Scan completed successfully.

GPFS: 6027-565 Scanning user file metadata ...

 100.00 % complete on Tue Nov 27 13:39:04 2012

GPFS: 6027-552 Scan completed successfully.



# mmlsattr -L /gpfs/fileset1/bigfile1

file name:            /gpfs/fileset1/bigfile1

metadata replication: 2 max 2

data replication:     2 max 2

immutable:            no

appendOnly:           no

flags:                unbalanced

storage pool name:    pool1

fileset name:         fileset1

snapshot name:

creation Time:        Tue Nov 27 11:28:29 2012

Windows attributes:   ARCHIVE



----------------------------------------------------

+. Snapshot

----------------------------------------------------

# echo "hello world:snap1" > /gpfs/fileset1/snapfile1

# mmcrsnapshot fs1 snap1

Writing dirty data to disk

Quiescing all file system operations

Writing dirty data to disk again

Resuming operations.

Checking fileset ...


# echo "hello world:snap2" >> /gpfs/fileset1/snapfile1

# mmcrsnapshot fs1 snap2

Writing dirty data to disk

Quiescing all file system operations

Writing dirty data to disk again

Resuming operations.

Checking fileset ...


# mmlssnapshot fs1

   >> the snapshots created in the fs1 file system

Snapshots in file system fs1:

Directory                SnapId    Status  Created

snap1                    1         Valid   Tue Nov 27 13:43:56 2012

snap2                    2         Valid   Tue Nov 27 13:45:19 2012

# cat /gpfs/.snapshots/snap1/fileset1/snapfile1

# cat /gpfs/.snapshots/snap2/fileset1/snapfile1

   >> snapshot data is kept under the file system's .snapshots directory


# rm /gpfs/fileset1/snapfile1

# cp /gpfs/.snapshots/snap2/fileset1/snapfile1 /gpfs/fileset1/snapfile1

   >> restore the file from the snapshot


# mmdelsnapshot fs1 snap1

# mmdelsnapshot fs1 snap2

   >> remove the stored snapshots

# mmlssnapshot fs1





----------------------------------------------------

+. GPFS Multi-Cluster 

  > http://www.ibm.com/developerworks/systems/library/es-multiclustergpfs/

  > 'All intercluster communication is handled by the GPFS daemon, which internally uses Secure Socket Layer (SSL).'

----------------------------------------------------

(cluster1-lpar11g) # mmauth genkey new

Generating RSA private key, 512 bit long modulus

.......++++++++++++

.......++++++++++++

e is 65537 (0x10001)

writing RSA key

mmauth: Command successfully completed


(cluster1-lpar11g) # mmshutdown -a

(cluster1-lpar11g) # mmauth update . -l AUTHONLY

Verifying GPFS is stopped on all nodes ...

mmauth: Command successfully completed


(cluster1-lpar11g) # mmstartup -a 

(cluster1-lpar11g) # rcp lpar11g:/var/mmfs/ssl/id_rsa.pub lpar12g:/tmp/lpar11g_id_rsa.pub


(cluster2-lpar12g) # mmauth genkey new

(cluster2-lpar12g) # mmshutdown -a

(cluster2-lpar12g) # mmauth update . -l AUTHONLY

(cluster2-lpar12g) # mmstartup -a 

(cluster2-lpar12g) # rcp lpar12g:/var/mmfs/ssl/id_rsa.pub lpar11g:/tmp/lpar12g_id_rsa.pub


(cluster1-lpar11g) # mmauth add gpfs_cluster2.lpar12g -k /tmp/lpar12g_id_rsa.pub

   >> the remote cluster name must include the node name, as in gpfs_cluster2.lpar12g

   >> point -k at the id_rsa.pub file generated by mmauth on that cluster

mmauth: Command successfully completed 


(cluster1-lpar11g) # mmauth grant gpfs_cluster2.lpar12g -f /dev/fs1

mmauth: Granting cluster gpfs_cluster2.lpar12g access to file system fs1:

        access type rw; root credentials will not be remapped.

mmauth: Command successfully completed
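
   >> a suggested cross-check (not captured in the original note): running 'mmauth show all' on cluster1 at this point should list gpfs_cluster2.lpar12g with rw access to fs1

(cluster1-lpar11g) # mmauth show all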


(cluster2-lpar12g) # mmremotecluster add gpfs_cluster.lpar11g -n lpar11g,lpar21g -k /tmp/lpar11g_id_rsa.pub

   >> "-n lpar11g,lpar21g" : gpfs_cluster.lpar11g 에 포함된 node list

mmremotecluster: Command successfully completed

mmremotecluster: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


(cluster2-lpar12g) # mmremotefs add remotefs -f fs1 -C gpfs_cluster.lpar11g -T /remotefs

   >> on cluster2, register fs1 from the gpfs_cluster.lpar11g cluster as the remote file system 'remotefs' mounted at /remotefs

mmremotefs: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmchconfig opensslibname="/usr/lib/libssl.a(libssl64.so.0.9.8)" -N r07s6vlp1


(cluster2-lpar12g) # mmremotecluster show all

Cluster name:    gpfs_cluster.lpar11g

Contact nodes:   lpar11g,lpar21g

SHA digest:      7dcff72af5b5d2190ebe471e20bcfe8897d0e1cb

File systems:    remotefs (fs1)


(cluster2-lpar12g) # mmremotefs show all

Local Name  Remote Name  Cluster name       Mount Point        Mount Options    Automount  Drive  Priority

remotefs    fs1          gpfs_cluster.lpar11g /remotefs          rw               no           -        0


(cluster2-lpar12g) # mmmount remotefs

(cluster2-lpar12g) # mmdf remotefs


*. If the multi-cluster setup gets into a tangle and the gpfs cluster no longer starts, failing with error '6027-2114'...

     >>> resetting cipherList clears it

# mmchconfig cipherList=""

# mmauth show all

Cluster name:        gCluster5.lpar15 (this cluster)

Cipher list:         (none specified)

SHA digest:          (undefined)

File system access:  (all rw)



----------------------------------------------------

+. GPFS Call-back method

----------------------------------------------------

# cat /home/gpfs/nodedown.sh 

#!/bin/sh

echo "Logging a node leave event at: `date` " >> /home/gpfs/log/nodedown.log

echo "The event occurred on node:" $1  >> /home/gpfs/log/nodedown.log

echo "The quorum nodes are:" $2 >> /home/gpfs/log/nodedown.log


# rcp lpar11g:/home/gpfs/nodedown.sh lpar21g:/home/gpfs/

# rsh lpar21g chmod u+x /home/gpfs/nodedown.sh 
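
# >> the script writes to /home/gpfs/log, so that directory is assumed to exist on every node that can fire the event; if it does not, create it first (an assumed preparatory step, not in the original note)

# mkdir -p /home/gpfs/log ; rsh lpar21g mkdir -p /home/gpfs/log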


# mmaddcallback NodeDownCallback --command /home/gpfs/nodedown.sh --event nodeLeave --parms %eventNode --parms %quorumNodes

mmaddcallback: 6027-1371 Propagating the cluster configuration data to all

  affected nodes.  This is an asynchronous process.


# mmlscallback

NodeDownCallback

        command       = /home/gpfs/nodedown.sh

        event         = nodeLeave

        parms         = %eventNode %quorumNodes


# mmshutdown -N lpar21g ; cat /home/gpfs/log/nodedown.log




