NFS problem after clustering

From: Christopher Knorr (cknorr@trapsystems.com)
Date: Thu Apr 06 2006 - 12:25:03 EDT


We have a script running on Tru64 Unix that waits to receive a data feed
every day. The data is initially sent to a Windows machine, and we have
that machine's drive NFS-mounted on our Tru64 Unix system. The cluster
is therefore serving as an NFS client.

To tell whether the feed has finished transmitting, the script checks
the file size, waits 5 seconds, and checks it again. If the file size
has not changed, it assumes the transmission is complete.
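
For anyone who wants to see the check spelled out, here is a minimal
sketch of that logic in Python. It is not the actual script; the
function name wait_until_stable and its parameters are made up for
illustration, and only the 5-second interval comes from the real thing.

```python
import os
import time


def wait_until_stable(path, interval=5.0, sleep=time.sleep,
                      getsize=os.path.getsize):
    """Return the file size once two readings taken `interval`
    seconds apart are identical.

    `sleep` and `getsize` are injectable only so the logic can be
    exercised without a real file or a real delay (an assumption of
    this sketch, not a feature of the original script).
    """
    last = getsize(path)
    while True:
        sleep(interval)
        current = getsize(path)
        if current == last:
            # Size did not change during the interval: treat the
            # transfer as complete.
            return current
        last = current
```

A check like this is only as good as the size the client reports. If
the reported size lags the real one by longer than the interval, the
function returns early, which matches what we are seeing.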

This script has been working like a champ for YEARS on a single ES40.
Last weekend we upgraded this ES40 to a 4-node cluster. What we are
seeing now is that the script thinks the file download has completed,
but it really hasn't.

We can easily replicate the problem by doing an "ls -l" on the file
while it's downloading. It goes very long periods of time (20 seconds is
not unusual) before showing an actual change in the file size.
Interestingly, on any other node in the cluster we don't see this
behavior and the file size changes quite rapidly. At present, the node
where we see the problem is also the node managing the CFS (as shown
by a "cfsmgr -e nfs_mount" command).
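
To put numbers on the lag instead of eyeballing repeated "ls -l"
output, something like the following sketch could be run on each node
while a file is downloading. It is only an illustration: watch_size,
longest_gap, and all of the parameters are hypothetical names, and the
polling approach is my assumption about how to measure this.

```python
import os
import time


def watch_size(path, duration=60.0, poll=1.0, clock=time.monotonic,
               sleep=time.sleep, getsize=os.path.getsize):
    """Poll the file size for `duration` seconds and return a list of
    (elapsed_seconds, size) pairs, one per observed size change."""
    start = clock()
    changes = []
    last = None
    while clock() - start < duration:
        size = getsize(path)
        if size != last:
            changes.append((clock() - start, size))
            last = size
        sleep(poll)
    return changes


def longest_gap(changes):
    """Longest time in seconds between two consecutive size changes."""
    gaps = (b[0] - a[0] for a, b in zip(changes, changes[1:]))
    return max(gaps, default=0.0)
```

Comparing longest_gap() from the problem node against another node
would show whether the reported size really stalls for longer than the
script's 5-second wait.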

Anyone have any ideas what might be going on?

Any thoughts or suggestions much appreciated!

regards,

Chris


