Typed commands stall on my Solaris 8 machines

From: Richard Skelton (Richard.Skelton@infineon.com)
Date: Fri Mar 19 2004 - 08:04:04 EST


Hi,
How can I find out what is causing commands to stall when users type
them?
This is an intermittent problem which I can sometimes reproduce:-
I have a directory with 956 files. One of the files, sol_crash, is a
symbolic link to /net/fileserver4/vol/vol1/sol_crash
If I wait for the automounter to unmount /net/fileserver4 and then type
ls -lt, the listing sometimes comes back in less than one second and
sometimes takes up to 24 seconds.
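One rough way to measure this outside of ls is to time lstat(), which reads the link itself, against stat(), which has to resolve the link target under /net. The following is a minimal sketch only. It assumes Python 3 is available on the affected machine, which the post does not say, and the helper names time_call and compare_link_stats are made up for illustration.

```python
import os
import time


def time_call(fn, path):
    """Return the wall-clock seconds taken by fn(path)."""
    start = time.monotonic()
    fn(path)
    return time.monotonic() - start


def compare_link_stats(path):
    """Time lstat (does not follow the link) against stat (follows it)."""
    return {
        "lstat": time_call(os.lstat, path),
        "stat": time_call(os.stat, path),
    }
```

Calling compare_link_stats("./sol_crash") after the automounter has unmounted the filesystem, and again straight afterwards, should show whether the delay is confined to the call that follows the link.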
I have run truss on the tcsh from which the ls command is run:-

16311: 25.7288 lstat64("./sol_crash", 0xFFBEE728) = 0
16311: 25.7291 readlink("./sol_crash", "/net/brsfs04/vol/vol0/sol_crash", 4096) = 31
16311: stat64("./sol_crash", 0xFFBED690) (sleeping...)
16311: 43.3587 stat64("./sol_crash", 0xFFBED690) = 0

As you can see, the stat64 call just sleeps from 25.7291 seconds to
43.3587 seconds after I started the truss, a stall of about 17.6 seconds.
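Gaps like this can also be picked out of a longer trace mechanically. The sketch below is an illustration in Python, not part of the original setup. It assumes the trace was saved in the timestamped style shown above (pid, time relative to the start of the trace, then the call). The names LINE and find_stalls and the one-second threshold are choices made here, not anything truss provides.

```python
import re

# Matches lines such as '16311: 25.7288 lstat64(...) = 0'
# and captures the pid, the relative timestamp and the call text.
LINE = re.compile(r"^\s*(\d+):\s+(\d+\.\d+)\s+(.*)$")


def find_stalls(lines, threshold=1.0):
    """Return (gap_seconds, previous_call, next_call) for each gap above threshold."""
    stalls = []
    prev = None
    for line in lines:
        m = LINE.match(line)
        if not m:
            # Lines without a timestamp, e.g. '(sleeping...)' notices
            # or wrapped continuation lines, are skipped.
            continue
        t, call = float(m.group(2)), m.group(3)
        if prev is not None and t - prev[0] > threshold:
            stalls.append((round(t - prev[0], 4), prev[1], call))
        prev = (t, call)
    return stalls


SAMPLE = """\
16311: 25.7288 lstat64("./sol_crash", 0xFFBEE728) = 0
16311: 25.7291 readlink("./sol_crash", "/net/brsfs04/vol/vol0/sol_crash", 4096) = 31
16311: stat64("./sol_crash", 0xFFBED690) (sleeping...)
16311: 43.3587 stat64("./sol_crash", 0xFFBED690) = 0
"""

if __name__ == "__main__":
    for gap, before, after in find_stalls(SAMPLE.splitlines()):
        print(f"{gap:.4f}s between {before!r} and {after!r}")
```

On the four lines above it reports a single stall between the readlink and the completed stat64, which matches the reading of the trace by eye.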

I can eventually reproduce this problem on both my new and old Citrix
servers.

The new Citrix servers are Solaris 8 (Generic_108528-29) E3500s with 8GB
memory and 8*400 MHz CPUs, plus the latest Recommended patches.
The old Citrix servers were Generic_108528-21 Ultra 80s with 4GB memory
and 4*450 MHz CPUs.
To try to improve performance we have always set:-
      /usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 65535
      /usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 65535
      /usr/sbin/ndd -set /dev/udp udp_xmit_hiwat 65535
      /usr/sbin/ndd -set /dev/udp udp_recv_hiwat 65535
      /usr/sbin/ndd -set /dev/tcp tcp_max_buf 262144
      /usr/sbin/ndd -set /dev/tcp tcp_deferred_ack_interval 100
      /usr/sbin/ndd -set /dev/tcp tcp_dupack_fast_retransmit 2
      /usr/sbin/ndd -set /dev/tcp tcp_slow_start_initial 4
and added to /etc/system:-
set maxuprc=1024
set sq_max_size=200
*eCache Scrubbing
set ecache_scrub_enable = 1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
*End eCache Settings
set rlim_fd_max = 1024
set rlim_fd_cur = 256

They all connect to the NetApp filer via Gigabit Ethernet.
We are using NIS on the Sun servers and the filers.
We have tried stopping nscd on the new servers and binding all the
machines to the master NIS server, but we still see the problem.
When the servers get slightly busy the problem gets worse.

-- 
Cheers
Richard Skelton
Richard.Skelton@infineon.com
Infineon Technologies UK Ltd
Infineon House
Great Western Court
Hunts Ground Road
Stoke Gifford
Bristol
BS32 8HP
Tel +44(0)117 9528808
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

