NFS client failing

From: Rob McMahon (Rob.McMahon@warwick.ac.uk)
Date: Tue Apr 04 2006 - 10:01:28 EDT


I asked this on the solarisx86 list without success, so I was wondering
if anyone here can help. Apologies for the rather long and rambling
post, I'm trying to include as much information as possible.

Has anybody seen anything like this, or got any clues on how to solve
it? We had a machine room power failure, after which everything came
back except the NFS listserv service on a 20z running Solaris 10.
Talking to an E3500 server running Solaris 8, it fails with the
following packet trace:

    listserv -> server TCP D=2049 S=54738 Syn Seq=358115738 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
      server -> listserv TCP D=54738 S=2049 Syn Ack=358115739 Seq=770867544 Len=0 Win=49640 Options=<nop,wscale 0,nop,nop,sackOK,mss 1460>
    listserv -> server TCP D=2049 S=54738 Rst Seq=358115739 Len=0 Win=0
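
One way to reproduce the handshake above outside of NFS is a small TCP
probe that can be run while watching the packet trace. The following is
only a sketch: the function name `tcp_probe` and its `source_addr`
parameter are made up for illustration, and it is not known from the
trace what connect() would actually report on the failing machine.

```python
import socket


def tcp_probe(host, port=2049, source_addr=None, timeout=5.0):
    """Attempt one TCP handshake and report how it ended.

    source_addr is an optional local IP address to bind to, so that
    each interface of a multi-homed machine can be tried in turn.
    (Hypothetical helper, not part of any Solaris tooling.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        if source_addr is not None:
            # Port 0 lets the kernel pick an ephemeral source port.
            sock.bind((source_addr, 0))
        sock.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        # The peer answered the SYN with a RST.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else (unreachable network, bad address, ...).
        return "error: %s" % exc
    finally:
        sock.close()
```

Calling `tcp_probe("server", 2049)` and then repeating the call with
`source_addr` set to each local address in turn might show whether the
failure follows one interface or one destination.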

Other TCP services seem fine. This listserv can talk to other servers
normally. The server shows no signs of having trouble with other
listservs, and it's the listserv reply that looks odd to me. I don't really
want to reboot the listserv if I can help it, since it's performing its
duty normally (I just can't log in ...), and I'm not sure if it's going
to fix it. I am able to run commands on the machine as root. I did try
`svcadm restart nfs/listserv', because I could, but the behavior didn't
change. The listserv is:

SunOS listserv 5.10 Generic_118844-19 i86pc i386 i86pc

Any ideas how to debug this? Natty dtrace scripts to try?

I've noticed other spooky stuff as well, now:

    listserv -> server PORTMAP C GETPORT prog=100003 (NFS) vers=2 proto=UDP
      server -> listserv PORTMAP R GETPORT port=2049
    listserv -> server ICMP Destination unreachable (UDP port 51592 unreachable)

    listserv -> ntp0 NTP client (Tue Apr 4 14:29:05 2006)
        ntp0 -> listserv NTP server (Tue Apr 4 14:29:05 2006)
    listserv -> ntp0 ICMP Destination unreachable (UDP port 123 unreachable)

> ssh -x listserv lsof -Pp 387
...
xntpd 387 root 19u IPv4 0xffffffff82510700 0t0 UDP *:123 (Idle)
...
> ntpq -p listserv
ntpq: read: Connection refused
>

This is a multi-homed client, and if I use its other interface:

> ntpq -p client
 remote          refid           st t when poll reach   delay   offset    disp
==============================================================================
 NTP.MCAST.NET   0.0.0.0         16 u    -   64     0    0.00    0.000 16000.0
 ntp0            ntp2.ja.net      2 - 252m   64     0    0.50   14.660 16000.0
+ntp1            ntp4             4 u   28   64   377    0.46   -8.451    4.94
+ntp2            ntp0             3 u   39   64   377    0.52   -8.294    4.04
*ntp3            ntp0             3 u   11   64   377    0.38   -4.908    1.63
>

ntp0 is the server, ntp[1-3] are on the same network. On `ntp1':

    listserv -> ntp1 NTP client (Tue Apr 4 14:57:48 2006)
        ntp1 -> listserv NTP server (Tue Apr 4 14:57:48 2006)

> ntpq -p listserv
 remote          refid           st t when poll reach   delay   offset    disp
==============================================================================
 NTP.MCAST.NET   0.0.0.0         16 u    -   64     0    0.00    0.000 16000.0
 ntp0            0.0.0.0         16 u    -   64     0    0.00    0.000 16000.0
+ntp1            ntp0             3 u   57   64    77    0.53   -3.531  380.68
+ntp2            ntp1             4 u   57   64    77    0.55   -2.238  380.71
*ntp3            ntp0             3 u   57   64    77    0.27   -2.953  380.69
>

everything looks fine.
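
The per-interface comparison above could also be scripted. The sketch
below sends one NTP client-mode packet over a connected UDP socket, so
that an ICMP port-unreachable comes back as "refused", much like the
ntpq error earlier. The names `ntp_probe` and `build_ntp_query` are
invented for illustration, and this is a plain mode 3 query, not the
mode 6 control message that ntpq itself sends, so results may differ
from ntpq's.

```python
import socket

NTP_PORT = 123


def build_ntp_query():
    """Return a minimal 48-byte NTP request.

    First byte 0x1b = leap indicator 0, version 3, mode 3 (client);
    the remaining header fields are left as zero.
    """
    return b"\x1b" + 47 * b"\0"


def ntp_probe(host, port=NTP_PORT, timeout=3.0):
    """Send one NTP query and report reply, refusal, or timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        # connect() on UDP fixes the peer, which lets the kernel hand
        # ICMP errors for this socket back to recv() as exceptions.
        sock.connect((host, port))
        sock.send(build_ntp_query())
        data = sock.recv(512)
        return "reply (%d bytes)" % len(data)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
    finally:
        sock.close()
```

Running `ntp_probe("listserv")` and `ntp_probe("client")` from the same
machine would give a side-by-side result for the two interfaces.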

Any insights gratefully received.

Cheers,

Rob

-- 
E-Mail:	Rob.McMahon@warwick.ac.uk		PHONE:  +44 24 7652 3037
Rob McMahon, IT Services, Warwick University, Coventry, CV4 7AL, England
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

