Strange problem with NFS

From: Udo Grabowski (udo.grabowski@imk.fzk.de)
Date: Wed Feb 05 2003 - 06:00:09 EST


Hello managers !

A strange and hard problem has been haunting us for a couple of weeks now.
An 8-member TruCluster (5.1A) has mounted a couple of home directories
from a Sun Solaris 8 server. The mounts are implemented member-wise,
as recommended in the best practices section: several CDSLs in /home
point to member-specific user directories, where the mounts are done.
This has worked perfectly for a couple of years.

For a while now we have been running an application (IDL and Fortran)
that opens and closes a lot of files. After a couple of runs it is no
longer possible to open files in the NFS-mounted home directories.
The only way to get back to normal is to reboot the (Alpha) machine.
Interestingly, it only occurs on one or two machines (not always the
same ones), while the other members of the same cluster have no
problem accessing the same files.

The symptoms seem to point to a problem with CFS or the UBC, but we
still have no idea what is happening or where to look. We have
already raised the maximum open file parameters to several thousand,
but with no effect.
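
A minimal sketch of a probe that might help narrow this down, assuming
a Python interpreter is available on the affected member. It repeats
the open/close pattern described above in a given directory and
reports the error code of the first failure, which could help tell a
descriptor-limit problem (EMFILE, ENFILE) from an NFS-level one (for
example ESTALE or EIO). The function name probe_open_close and the
scratch file names are hypothetical and not part of the application.

```python
import errno
import os
import sys


def probe_open_close(directory, count=1000):
    """Create, close and delete `count` scratch files in `directory`.

    Returns (completed_cycles, error). error is None if every cycle
    succeeded, otherwise the OSError that stopped the loop.
    """
    for i in range(count):
        # Hypothetical scratch file name; the PID keeps parallel runs apart.
        path = os.path.join(directory, "nfsprobe_%d_%d.tmp" % (os.getpid(), i))
        try:
            with open(path, "w") as handle:
                handle.write("probe\n")
            os.remove(path)
        except OSError as exc:
            return i, exc
    return count, None


if __name__ == "__main__":
    # Directory to test, e.g. one of the NFS-mounted home directories.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    done, err = probe_open_close(target)
    if err is None:
        print("%d open/close cycles succeeded in %s" % (done, target))
    else:
        name = errno.errorcode.get(err.errno, "unknown")
        print("failed after %d cycles: errno %s (%s): %s"
              % (done, err.errno, name, err.strerror))
```

Running it once against an NFS-mounted home directory and once against
a local directory on the same member would show whether the failure
is specific to the NFS mounts.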

Any ideas what's happening here ? Has anyone seen this before ?
Thanks for any help !

-- 
Dr. Udo Grabowski                           email: udo.grabowski@imk.fzk.de
Institut f. Meteorologie und Klimaforschung II, Forschungszentrum Karlsruhe
Postfach 3640, D-76021 Karlsruhe, Germany           Tel: (+49) 7247 82-6026
http://www.fzk.de/imk/imk2/ame/grabowski/           Fax:         "    -6141

