SUMMARY: mutex contention

From: Vahid Moghaddasi (sunman@ureach.com)
Date: Wed Aug 07 2002 - 22:31:11 EDT

Sorry for the late summary, many thanks to:
- Zeev Fisher for the continuous help
- Kevin Buterbaugh
- Abhilash
- Khoi Dinh

Most of the managers above suggested that there was some sort of
heavy I/O read load over NFS. Kevin suggested increasing the ncsize
and ufs_ninode parameters, which would probably have fixed the
problem.
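
The original post (quoted further down) reported a name-lookup cache
hit rate of 35% in its vmstat -s output, which is presumably what the
ncsize suggestion was aimed at. A minimal sketch for pulling that figure
out of vmstat -s output follows. The function names and the default
threshold of 90 are assumptions for illustration, not something stated
in the thread.

```shell
# Print the name-lookup cache hit percentage found in `vmstat -s`
# output read from stdin. Prints nothing if the line is absent.
dnlc_hit_pct() {
    sed -n 's/.*total name lookups (cache hits \([0-9][0-9]*\)%).*/\1/p'
}

# Hypothetical helper: report whether the hit rate is below a threshold.
# The default of 90 is an assumed rule of thumb, not a value from the thread.
check_dnlc() {
    threshold=${1:-90}
    pct=$(dnlc_hit_pct)
    if [ -z "$pct" ]; then
        echo "no name lookup line found"
        return 2
    fi
    if [ "$pct" -lt "$threshold" ]; then
        echo "DNLC hit rate ${pct}% is below ${threshold}%"
        return 1
    fi
    echo "DNLC hit rate ${pct}% is OK"
}
```

Typical use would be something like: vmstat -s | check_dnlc 90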
Zeev asked for the output of the following commands:
netstat -na -P tcp|grep TIME_WAIT|wc -l
netstat -na -P tcp|grep FIN_WAIT_2 |wc -l
ndd /dev/tcp tcp_conn_hash|head
ndd /dev/tcp tcp_conn_hash|wc -l
netstat -k|egrep 'norcvbuf|nocanput'
and concluded:
You have TCP drops (the tcpListenDrop value is nonzero, so
incoming requests are sometimes refused).

You probably have the default values of tcp_conn_hash and
tcp_close_wait_interval.
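
The per-state counts Zeev asked for (TIME_WAIT, FIN_WAIT_2) can also be
collected in one pass. This is only a sketch: the function name is made
up for illustration, and it assumes the connection state is the last
field on each line of the netstat output.

```shell
# Read `netstat -na -P tcp` style output on stdin and print one
# "STATE count" line per TCP state, sorted by state name.
# Assumes the state is the last field on each connection line;
# header lines are skipped because their last field is not all caps.
count_tcp_states() {
    awk '$NF ~ /^[A-Z][A-Z0-9_]+$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Typical use would be something like: netstat -na -P tcp | count_tcp_states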

Here is what we did, which fixed the problem at least to the extent
that we no longer hear anything from the DBAs or users:
We found that a huge number of files were being exported via NFS,
and it appeared that NFS has some sort of limit on the number of
inodes being exported. Most of the files (about 20GB) were .nfs*
files left over from previous reboots, crashes, etc. Once we deleted
those files, the system returned almost immediately to its "normal"
level.
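
For anyone wanting to check for the same kind of leftovers, here is a
rough sketch of how old .nfs* files could be listed before deciding what
to remove. The function name, the 7-day default and the example path are
assumptions, not what was actually run here. NFS clients create .nfs*
files when a file that is still open gets removed, so recent ones may
still be in use. That is why this only lists candidates and deletes
nothing.

```shell
# List .nfs* files under a directory that were last modified more than
# DAYS days ago. This only lists candidates; it deletes nothing.
# Usage: list_stale_nfs_files DIR [DAYS]   (DAYS defaults to 7, an assumption)
list_stale_nfs_files() {
    dir=$1
    days=${2:-7}
    find "$dir" -type f -name '.nfs*' -mtime +"$days" -print
}
```

Typical use would be something like: list_stale_nfs_files /export/home 7
(the path is hypothetical). Review the list before removing anything.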
Thanks to all.
Vahid.

---- Vahid Moghaddasi sunman@ureach.com wrote ----

> To: "Sun-Managers" <sunmanagers@sunmanagers.org>
> From: Vahid Moghaddasi <sunman@ureach.com>
> Reply-To: <sunman@ureach.com>
> Subject: mutex contention
> Date: Wed, 31 Jul 2002 13:37:32 -0400
>
>
> Hi everyone,
>
> We have "Serious mutex contention", as the SE Toolkit puts it.
> The system is a 4500 4x4 exporting about 90GB of disk space as an
> NFS server to 4 other systems. Heavy compilation is done on the
> system all day. CPU status from top shows 0% idle. Here are the
> first few lines of lockstat sleep 5:
>  Count indv cuml rcnt spin Lock            Caller
> -------------------------------------------------------------------------------
> 210070  42%  42% 1.00  102 vx_bc_bfree_lk  vx_bc_brelse+0x7c
> 168414  33%  75% 1.00   90 vx_bc_bfree_lk  vx_bc_getblk+0x88
>  44338   9%  84% 1.00   51 0x662b4b08      vx_bmap+0x2ec
>  29362   6%  90% 1.00   61 0x662b4b08      vx_bmap+0x10
>  22342   4%  94% 1.00  113 vx_bc_bfree_lk  vx_bc_getblk+0x2ec
>  19893   4%  98% 1.00   74 0x662b4b08      vx_bmap+0x1ec
>   9168   2% 100% 1.00   61 0x662b4b08      vx_do_bmap_typed+0x220
>     23   0% 100% 1.00  161 vx_sched_lk     vx_worklist_thread+0x38
>     11   0% 100% 1.00   20 vx_worklist_lk  vx_worklist_process+0xb4
>
> Output of mpstat:
>
> CPU minf mjf xcal intr ithr   csw icsw migr  smtx srw syscl usr sys wt idl
>   0   12   0  561 4308  375 10868 3941   49 23084   0   969  42  58  0   0
>   1    1   1  259 4119  120 10849 3973   53 22867   0  2006  34  66  0   0
>   4    1   2  161 4600  289 11265 4305   57 22473   0  1252  42  58  0   0
>   5   26   1  850 4729  255 11494 4296   50 22453   0   845  40  60  0   0
>
>
> Here are some kernel parameters which might help:
>
> ncstats: 3372263
> ncsize: 69992
> ufs_ninode: 69992
> maxusers: 1024
> max_nprocs: 16394
> maxuprc: 16389
> ndquot: 26634
> nbuf: 800
>
> vmstat -s cache_ratio output:
> ...
> 9920068 total name lookups (cache hits 35%)
> ...
>
> Any help would be appreciated and as always
> I will summarize.
> Vahid.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:24:43 EDT