NFS File "Hang" On Solaris 8

From: Joshua Clark (jclark@hyperkitten.com)
Date: Wed Jan 26 2005 - 09:41:04 EST


Good Morning Sun Managers,

I'm having a serious problem with several of our Solaris NFS servers and
clients. From time to time, certain files on NFS-mounted filesystems
become, for lack of a better term, "locked". When a user process
attempts to access one of these files, it hangs and cannot be terminated.
Eventually the process becomes defunct, and only a reboot will clear
it. I can reproduce the problem against a "locked" file with any
number of commands: cp, cat, more, vi, etc. Both binary and flat text
files have been affected. There seems to be no rhyme or reason as to which
files are affected.

Here is some background: We have two NFS servers, both of which are also
NFS clients of each other. A third machine, also an NFS client, is
affected. Two of the systems are Sun Fire 280R servers; the third is an
E-250. All three are running Solaris 8 with the latest Recommended Patch
Cluster installed (117350-18) as well as all of the latest NFS client and
server patches I could find. The 280Rs are using GigaSwift Ethernet
adapters; the E-250 is using an hme.

Below is the truncated truss output from the NFS client system when
attempting to access a locked file (called mobius.reg). I've included only
the last 25 lines to show the point where the process actually hangs. The
same command succeeds when run on the NFS server. The truss command line
was:
"truss -faeo /tmp/truss3.out -vall -wall -rall cat mobius.reg"

5375: open("/usr/lib/locale/en_US/en_US.so.2", O_RDONLY) = 3
5375: mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xFF360000
5375: mmap(0xFF3A517C, 90112, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON, -1, 0) = 0xFF260000
5375: mmap(0xFF260000, 15170, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF260000
5375: mmap(0xFF272000, 9134, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 8192) = 0xFF272000
5375: munmap(0xFF264000, 57344) = 0
5375: memcntl(0xFF260000, 7260, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
5375: close(3) = 0
5375: munmap(0xFF360000, 8192) = 0
5375: fstat64(1, 0xFFBEF058) = 0
5375: d=0x00800000 i=112080 m=0020620 l=1 u=1133 g=7 rdev=0x00600004
5375: at = Jan 26 09:42:49 EST 2005 [ 1106750569 ]
5375: mt = Jan 26 09:42:49 EST 2005 [ 1106750569 ]
5375: ct = Jan 26 08:46:54 EST 2005 [ 1106747214 ]
5375: bsz=8192 blks=0 fs=ufs
5375: open64("mobius.reg", O_RDONLY) = 3
5375: fstat64(3, 0xFFBEEFC0) = 0
5375: d=0x04240002 i=408784 m=0100664 l=1 u=1133 g=101 sz=2185
5375: at = Jan 26 09:34:28 EST 2005 [ 1106750068 ]
5375: mt = Jan 6 16:19:56 EST 2005 [ 1105046396 ]
5375: ct = Jan 16 07:13:40 EST 2005 [ 1105877620 ]
5375: bsz=8192 blks=16 fs=nfs
5375: llseek(3, 0, SEEK_CUR)
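
The trace above stops at llseek() on the NFS file, right after open64()
and fstat64() have returned. For checking other files without tying up a
shell, here is a minimal sketch, assuming a Python 3 interpreter is
available on the client (that is an assumption, not part of the setup
described here). It replays roughly the same sequence of calls in a child
process and gives up waiting after a timeout. The names probe and CHILD
and the step labels are made up for illustration. A child that is stuck
in the kernel may ignore the kill and linger until reboot; the parent
simply stops waiting for it.

```python
import subprocess
import sys
import tempfile

# Child script: replays the calls seen in the truss output and prints
# the name of each step just before attempting it.
CHILD = r"""
import os, sys
path = sys.argv[1]
def step(name):
    print(name, flush=True)
step("open")
fd = os.open(path, os.O_RDONLY)
step("fstat")
os.fstat(fd)
step("lseek")
os.lseek(fd, 0, os.SEEK_CUR)
step("read")
os.read(fd, 8192)
step("close")
os.close(fd)
step("done")
"""


def probe(path, timeout=10.0):
    """Return ("ok", None), ("error", last_step) or ("hung", last_step)."""
    # The child logs its progress to a temporary file rather than a pipe,
    # so the parent never blocks reading from a child that is stuck.
    with tempfile.TemporaryFile(mode="w+") as log:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, path],
            stdout=log,
            stderr=subprocess.DEVNULL,
        )
        try:
            proc.wait(timeout=timeout)
            hung = False
        except subprocess.TimeoutExpired:
            # Deliberately no wait() after kill(): a process blocked in
            # the kernel may never exit, and waiting would hang us too.
            proc.kill()
            hung = True
        log.seek(0)
        steps = log.read().split()
    last = steps[-1] if steps else None
    if hung:
        return ("hung", last)
    if last == "done":
        return ("ok", None)
    return ("error", last)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, probe(name))
```

Run with one or more file names as arguments, it prints one result per
file. The last step reported for a "hung" file is the call that did not
return within the timeout, which can then be compared with the point
where truss stops.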

I've been working with Sun to attempt to resolve the problem, but have not
gotten anywhere so far. I'm hoping someone out there could shed some light
on this problem. Will summarize, etc.

Thanks in advance,

Joshua Clark


