[HPADM] SUMMARY -- nfs hang due to statd or lockd?

From: Jeff Cleverley (jeff_cleverley@agilent.com)
Date: Fri Aug 26 2005 - 12:52:25 EDT


Greetings,

I'd like to thank the following for their replies and suggestions:

Marc Ahrendt
Prashant Zanwar
Jeff Lightner
Kevin O'Donovan
James J. Perry

Unfortunately, none of them fixed the issue. The most promising came
from Prashant, who suggested using clear_locks (man 1m) and then
bouncing the lockd daemon. I tried this and bounced statd as well.
While the processes were stopped, I removed the client file in
/var/statmon/sm, and then restarted the daemons. No luck.
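
For anyone repeating the file-removal step, here is a minimal Python
sketch of it. It only removes one client's entry and is meant to run
while lockd and statd are stopped. The function name and the default
path argument are my own choices for illustration, and it assumes the
entry is a plain file named after the client host.

```python
import os


def remove_statmon_entry(client, sm_dir="/var/statmon/sm"):
    """Remove the statd monitor entry for `client`.

    Returns True if an entry was removed, False if none existed.
    Assumption: the entry is a regular file named after the client host.
    Run this only while lockd and statd are stopped.
    """
    # Accept a bare host name only, so "../x" cannot escape sm_dir.
    if (not client or client in (".", "..")
            or os.path.basename(client) != client):
        raise ValueError("client must be a bare host name: %r" % (client,))

    entry = os.path.join(sm_dir, client)
    if not os.path.isfile(entry):
        return False
    os.remove(entry)
    return True
```

Calling remove_statmon_entry("compute01") would remove
/var/statmon/sm/compute01 if it exists; "compute01" is a made-up host
name.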

It looks like the server will need a reboot at some point. For the
short term, the affected clients are compute servers, so we're going
to rename them. This should work unless the locks somehow pick up
other system information such as the software ID or MAC address.

Thanks for all the help. The original post is listed below.

Jeff

Greetings,

We have an NFS server that has at least 50 systems mounting file systems
exported from it at any given time. We've found a few machines in the
lab that cannot do an "ll" of a couple of mount points. The same mount
points list properly from other machines as user root, so we know it's
not a permissions problem.

We have rebooted the client, and even moved /etc/mnttab out of the way
before a reboot to make sure a stale or corrupted copy wasn't the
cause. The mount points still won't list, although sub-directories
under them will list if you know their names.
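
A check like this can be scripted so the probe itself does not get
stuck. The following is a rough Python sketch, not something we ran:
it lists a directory in a child process and gives up after a timeout.
The function name, the 10-second default and the "ok"/"error"/"hang"
labels are all assumptions made for illustration.

```python
import subprocess
import sys


def probe_listing(path, timeout=10.0):
    """Try to list `path` in a child process.

    Returns "ok" if the listing succeeded, "error" if it failed
    outright, and "hang" if it did not finish within `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked in an NFS call may not
        # die, so we do not wait for it after sending the kill.
        child.kill()
        return "hang"
    return "ok" if rc == 0 else "error"
```

Running it against a mount point and then against a known
sub-directory of that mount point should reproduce the behaviour
described above: "hang" or "error" for the former and "ok" for the
latter.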

Because of this, we believe there is a caching issue on the server for a
couple of these mount points. There are directories under
/var/statmon/sm for the affected clients, along with all the unaffected
clients.
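
To show that check concretely, here is a small Python sketch that
reports which of a list of client names have an entry under
/var/statmon/sm. The function name and the host names in the usage
note are hypothetical, and it assumes each entry is named exactly
after the client host.

```python
import os


def clients_with_entries(clients, sm_dir="/var/statmon/sm"):
    """Map each client name to True/False: does sm_dir hold an entry?

    Assumption: entries are named exactly after the client host name.
    """
    present = set(os.listdir(sm_dir))
    return {name: (name in present) for name in clients}
```

For example, clients_with_entries(["compute01", "desk07"]) returns a
dict such as {"compute01": True, "desk07": True}. In our case every
client, affected or not, shows True, so the presence of an entry does
not by itself separate the two groups.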

We were thinking of killing and then restarting the lockd and statd
processes. We are concerned about what this may do to existing client
mounts and to new mounts made while the daemons are down. We've tried
this on some test boxes and it seems fine, but we have no way to
generate the number of requests the server gets, and the test box has
no corrupted cache, so we can't tell whether it would fix the problem.

Any information about what we need to do to clear this up? Rebooting
the server will be a really unpopular decision.


