Re: AIX 5.1 + JFS + gaussian

From: Taylor, David (DTaylor@WBMI.COM)
Date: Thu Nov 06 2003 - 14:43:20 EST


From the AIX V4 HACMP student notebook:

How to Avoid DMS Problems

Enable I/O pacing through SMIT. Values in the mid-thirties and mid-twenties
for the High and Low watermarks have been found to be adequate for HACMP.

Increase the frequency of the sync daemon. This will ensure that data
is flushed to disk more frequently but will have an adverse effect on
overall system performance.

Change the heartbeat rate of a NIM using SMIT. Do NOT make it faster;
this will make the problem worse, as the DMS value pertains to the SLOWEST
network declared to HACMP.

I also have a note indicating that the preferred fix is to change the
DMS value.
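
To illustrate the I/O pacing item above, here is a rough toy model in Python of how high and low watermarks throttle a writer. It is only a sketch under stated assumptions: the function name, the tick-based loop, and the write and completion rates are hypothetical, and 33 and 24 are just one reading of "mid-thirties and mid-twenties". The behaviour modelled is that a writer is suspended once pending writes reach the high watermark and resumes only after they drain to the low watermark.

```python
def simulate_pacing(total_writes, high=33, low=24,
                    writes_per_tick=8, completions_per_tick=4):
    """Toy model of I/O pacing for a single file (all rates are made up).

    Returns (peak pending writes, number of times the writer was suspended).
    """
    pending = issued = suspensions = peak = 0
    blocked = False
    while issued < total_writes or pending > 0:
        # A suspended writer resumes once pending I/O drains to the low mark.
        if blocked and pending <= low:
            blocked = False
        if not blocked:
            for _ in range(writes_per_tick):
                if issued == total_writes:
                    break
                # Reaching the high mark suspends the writer.
                if pending >= high:
                    blocked = True
                    suspensions += 1
                    break
                pending += 1
                issued += 1
                peak = max(peak, pending)
        # The disk completes a fixed number of writes per tick.
        pending = max(0, pending - completions_per_tick)
    return peak, suspensions


if __name__ == "__main__":
    print(simulate_pacing(1000))                      # paced writer
    print(simulate_pacing(1000, high=10**9, low=0))   # effectively unpaced
```

With pacing in place the backlog of pending writes never exceeds the high watermark, at the cost of the writer being put to sleep from time to time; without it the backlog grows for as long as the writer outruns the disk.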

HTH

David

-----Original Message-----
From: IBM AIX Discussion List [mailto:aix-l@Princeton.EDU] On Behalf Of
Jan-Frode Myklebust
Sent: Thursday, November 06, 2003 1:23 PM
To: aix-l@Princeton.EDU
Subject: AIX 5.1 + JFS + gaussian

Hi,

I have 3 p690s running 64-bit AIX 5.1 with JFS on the /scratch
filesystem. When I run a few gaussian jobs on this /scratch
filesystem, I often get lots of these entries in the error log:

4FDB3BA1 1106120903 I S topsvcs DeadMan Switch (DMS) close to trigger
3C81E43F 1106120903 P U topsvcs Late in sending heartbeat
864D2CE3 1106115903 P S topsvcs NIM thread blocked
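
For anyone reading entries like these, a small Python sketch of how the summary lines can be split into fields may help. It assumes the usual errpt summary layout (identifier, timestamp in MMDDhhmmYY form, type, class, resource name, description); the function name and the lookup tables are illustrative, and the code tables reflect my understanding of the errpt type and class letters rather than anything stated in this thread.

```python
from datetime import datetime

# Assumed meanings of the errpt type and class letters.
ERROR_TYPES = {"P": "permanent", "T": "temporary", "I": "informational",
               "U": "unknown"}
ERROR_CLASSES = {"H": "hardware", "S": "software", "O": "operator",
                 "U": "undetermined"}


def parse_errpt_line(line):
    """Split one errpt summary line into a dictionary of named fields."""
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    return {
        "id": ident,
        # Timestamp is assumed to be MMDDhhmmYY, e.g. 1106120903.
        "time": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": ERROR_TYPES.get(etype, etype),
        "class": ERROR_CLASSES.get(eclass, eclass),
        "resource": resource,
        "description": description.strip(),
    }


if __name__ == "__main__":
    sample = "3C81E43F 1106120903 P U topsvcs Late in sending heartbeat"
    entry = parse_errpt_line(sample)
    print(entry["time"].isoformat(), entry["type"], entry["description"])
```

Read this way, the three entries fall between 11:59 and 12:09 on 6 November 2003, with the "NIM thread blocked" entry ten minutes before the other two.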

My theory is that I get large amounts of data in the
buffer cache, and then the system gets very unresponsive
when AIX is trying to free this memory from the page cache.
I've often seen very high page-scan activity.

The problem is not related to lack of memory; the node has 192GB,
and probably about 100GB was free the last time I saw this.

If I run the job on a GPFS filesystem the problem goes away,
but then of course we lose the page-cache benefit.

We run with the following vmtune setting:

        vmtune -R 64 -f 3840 -F 5888 -p 5 -P 10 -t 10 -y 1 -h 0
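
As a quick sanity check on a command line like this, the flags can be pulled apart in a few lines of Python. This is a sketch only: the helper name is hypothetical, and it assumes (from my reading of the vmtune documentation, not from this thread) that -f is minfree, -F is maxfree and -R is maxpgahead, and that the commonly cited guideline is for maxfree to exceed minfree by at least maxpgahead. The other flags are parsed but not interpreted.

```python
import shlex


def parse_flags(command):
    """Turn 'vmtune -R 64 -f 3840 ...' into {'R': 64, 'f': 3840, ...}."""
    tokens = shlex.split(command)[1:]  # drop the command name itself
    return {tokens[i].lstrip("-"): int(tokens[i + 1])
            for i in range(0, len(tokens), 2)}


def free_list_headroom(flags):
    """Return (maxfree - minfree, maxpgahead) under the assumed flag meanings."""
    return flags["F"] - flags["f"], flags["R"]


if __name__ == "__main__":
    flags = parse_flags("vmtune -R 64 -f 3840 -F 5888 -p 5 -P 10 -t 10 -y 1 -h 0")
    headroom, maxpgahead = free_list_headroom(flags)
    ok = headroom >= maxpgahead
    print(f"headroom={headroom} pages, maxpgahead={maxpgahead}, guideline met: {ok}")
```

Under those assumptions the gap between the two free-list thresholds here is 2048 pages, comfortably above the read-ahead value of 64, so this particular guideline does not look like the culprit.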

Has anybody seen the same problem? Does anybody have any
idea how to make this behave more nicely?

   -jf



