100% busy on SAN-based disk

From: Johan Hartzenberg (jhartzen@csc.com)
Date: Mon Sep 16 2002 - 08:44:43 EDT


Hi,

I have some 35-odd systems connected to a SAN for disk access. Once in
a while one of these systems goes into a state where it starts to respond
extremely slowly.

iostat -xnc 5

will show 100 under the %b column, I/O wait is very high, average
service times climb to 15 seconds or more, etc. Generally bad.
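To catch these episodes as they start, rather than when users complain,
something like the following could sit on the iostat output. It is a rough
sketch that assumes the Solaris "iostat -xn" column layout (asvc_t in
milliseconds as field 8, %b as field 10, device as field 11). The
15000 ms default only mirrors the 15 seconds quoted above, and
flag_busy_disks, BUSY_PCT, SVC_MS and AWK are names made up for the
example. I've left off -c so that only device rows need parsing.

```shell
#!/bin/sh
# flag_busy_disks: read "iostat -xn" output on stdin and print devices
# that look saturated. Hypothetical helper; thresholds are illustrative.
# Assumed Solaris iostat -xn column layout:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# i.e. asvc_t (ms) is field 8, %b is field 10, device is field 11.
BUSY_PCT=${BUSY_PCT:-100}   # flag at this %b or above
SVC_MS=${SVC_MS:-15000}     # ...or at this asvc_t (ms) or above
AWK=${AWK:-awk}             # on Solaris 2.6 set AWK=nawk; old awk lacks -v

flag_busy_disks() {
    $AWK -v busy="$BUSY_PCT" -v svc="$SVC_MS" '
        # Data rows only: 11 fields with a numeric first field.
        # This skips the title and column header lines.
        NF == 11 && $1 ~ /^[0-9.]+$/ {
            if ($10 + 0 >= busy + 0 || $8 + 0 >= svc + 0)
                print $11, "busy=" $10 "%", "asvc_t=" $8 "ms"
        }'
}

# Usage on an affected host:
#   AWK=nawk; iostat -xn 5 | flag_busy_disks
```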

If I run
drvconfig
devlinks; disks
vxdctl enable

Then almost immediately the I/O settles and everything goes back to
normal. This has happened about five times, maybe a few more, in the past
3 weeks.

This is on Solaris 2.6 with current patches, using JNI HBAs to access
HDS 9960-based disk.

Also, on these same SAN-connected systems, I sometimes see something much
more severe. A system will go into a state where every command that
accesses the disk (e.g. find /sandisk1) hangs completely and cannot be
interrupted or even killed. When this happens, the only way to get things
working again is with uadmin; neither reboot nor halt works. After the
reboot everything seems perfectly normal! (So far this has happened 3
times, once on each of 3 separate systems, all in the past 10 days.)
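Since a hung find can't be killed, anything that watches for this state
must avoid blocking on the disk itself. A minimal sketch, assuming a
Bourne-compatible sh: probe_cmd (a hypothetical helper) runs the probe in
the background and polls for a marker file in /tmp that the probe creates
only if its command returns. If the marker hasn't appeared within the
limit, it reports a hang and leaves the stuck process behind. It polls a
marker file rather than using kill -0 because kill -0 also succeeds on a
child that has exited but not yet been reaped by the shell.

```shell
#!/bin/sh
# probe_cmd LIMIT CMD [ARGS...]
# Hypothetical helper: run CMD in the background and report whether it
# returns within LIMIT seconds, without the calling shell ever blocking
# on the disk. Returns 0 if CMD returned (success or failure), 1 if it
# looks hung. Uses backticks and expr so it also runs under Bourne sh.
probe_cmd() {
    limit=$1
    shift
    # Per-call marker file; $$ is the pid of the calling shell.
    PROBE_SEQ=`expr ${PROBE_SEQ:-0} + 1`
    marker=/tmp/probe.$$.$PROBE_SEQ
    rm -f "$marker"

    # The subshell creates the marker only after CMD returns. All of its
    # output goes to /dev/null so a hung probe holds no pipe open.
    ( "$@"; : > "$marker" ) >/dev/null 2>&1 &

    waited=0
    while [ ! -f "$marker" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "HUNG: $* (no answer after ${limit}s)"
            return 1
        fi
        sleep 1
        waited=`expr $waited + 1`
    done
    rm -f "$marker"
    echo "ok: $*"
    return 0
}

# Usage:
#   probe_cmd 30 ls /sandisk1
```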

If you have seen anything similar please let me know your findings!

Thanx,
  _Johan
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers