IO-Wait of 99% - how to diagnose

From: extern.Tobias.Kronwitter@AUDI.DE
Date: Thu Dec 16 2004 - 05:27:23 EST


Hello all,

On a Solaris 8 / Sun Fire V440 (SunOS iuaw740 5.8 Generic_108528-29 sun4u
sparc SUNW,Sun-Fire-V440) we are seeing very high I/O wait.
The server runs Veritas VxVM 4.0 MP1 and has SAN disks
connected via an Emulex 9002 FCA.

top reports the following:

load averages: 0.02, 0.01, 0.02                                   11:02:12
82 processes: 81 sleeping, 1 on cpu
CPU states: 0.0% idle, 0.0% user, 0.5% kernel, 99.5% iowait, 0.0% swap
Memory: 8192M real, 6375M free, 469M swap in use, 22G swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
  5818 root       1  58    0 2344K 1440K cpu/0   0:00  0.05% top
   813 root       6  39    0 5240K 4440K sleep   4:56  0.04% picld
 29405 root       6  58    0 4728K 2816K sleep   0:05  0.01% elxdiscoveryd
 28964 root       1  48    0 2544K 2008K sleep   0:00  0.01% bash
  5813 root       1  38    0 6248K 2728K sleep   0:00  0.01% sshd
  1444 root      12  58    0 5368K 5080K sleep   0:31  0.00% mibiisa
 17990 dctm_run   3  58    0   40M   12M sleep   0:07  0.00% documentum
  5816 dctm_run   1  38    0 1392K 1144K sleep   0:00  0.00% sar
  5817 dctm_run   1  48    0 1456K 1128K sleep   0:00  0.00% sadc
  1471 root       1  58    0    0K    0K sleep   0:59  0.00% se.sparcv9.5.8
   980 root       5  58    0 4200K 2440K sleep   0:17  0.00% automountd
 10808 dctm_run   5  58    0   39M   22M sleep   0:09  0.00% documentum
    17 root       1  58    0   12M   10M sleep   0:07  0.00% vxconfigd
  3111 dctm_run   4  58    0 5672K 3728K sleep   0:07  0.00% dmdocbroker
 28215 dctm_run   1   2    0 1896K 1440K sleep   0:06  0.00% ksh

iostat doesn't indicate high disk I/O:

bash-2.03# iostat 5 15
   tty         sd0            sd1            sd2            sd3               cpu
 tin tout  kps  tps serv  kps  tps serv  kps  tps serv  kps  tps serv   us   sy   wt   id
   8   35    0    0    0  100    4    9  100    4   10    4    1    9    1    1   25   73
 125  451    0    0    0   14    4    7   14    4    6    0    0    0    0    1   99    0
   0   16    0    0    0    0    0    5    0    0    6    0    0    0    0    1   99    0
   0   16    0    0    0    9   18    3    9   18    3    0    1    4    0    1   99    0
   0   16    0    0    0   67   26   23   61   26   26    2    1    6    0    0  100    0
   0   16    0    0    0    0    0    0    0    0    0    0    0    0    0    0  100    0
   0   16    0    0    0    0    0    0    0    0    0    0    0    0    0    0  100    0
   0   16    0    0    0    0    0    0    0    0    0    0    0    0    0    0  100    0
   0   16    0    0    0    0    0    0    0    0    0    0    0    0    0    1   99    0
   0   16    0    0    0    4    8    3    4    8    3    0    0    4    0    1   98    0
  10   37    0    0    0    2    1    5    2    1    4    0    1    5    0    0  100    0
   0   16    0    0    0    4    3   23   21    5   19    0    0    0    0    0  100    0
  35  137    0    0    0    5    2    7    5    2    5    0    0    0    0    0  100    0
 126  421    0    0    0   14    4    5   14    4    6    0    0    0    0    1   99    0
   0   16    0    0    0    0    0    0    0    0    0    0    0    0    0    1   99    0
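
For anyone who wants to summarise output like this instead of reading it by eye, here is a minimal Python sketch. It assumes each sample has been rejoined onto one line (so the "id" value sits at the end of its row) and that the last four numeric fields are us, sy, wt and id. The first iostat row covers the time since boot, so it is skipped when averaging. The function names and the embedded sample are illustrative only.

```python
# Sketch: average the "wt" column from `iostat 5 15`-style rows.
# SAMPLE holds the first four rows of the output above, with the wrapped
# "id" value rejoined onto each row (an assumption about the input format).
SAMPLE = """\
8 35 0 0 0 100 4 9 100 4 10 4 1 9 1 1 25 73
125 451 0 0 0 14 4 7 14 4 6 0 0 0 0 1 99 0
0 16 0 0 0 0 0 5 0 0 6 0 0 0 0 1 99 0
0 16 0 0 0 9 18 3 9 18 3 0 1 4 0 1 99 0
"""


def cpu_columns(text):
    """Return one (us, sy, wt, id) tuple per all-numeric data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines contain non-numeric tokens and are skipped here.
        if len(fields) >= 4 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields[-4:]))
    return rows


def mean_wait(rows, skip_first=True):
    """Mean of the wt column; by default the since-boot first row is dropped."""
    data = rows[1:] if skip_first else rows
    return sum(r[2] for r in data) / len(data)


rows = cpu_columns(SAMPLE)
print("since boot (us, sy, wt, id):", rows[0])
print("mean wt over interval samples:", mean_wait(rows))
```

On the four sample rows the since-boot row shows 25% wait while the interval samples average 99%, which matches what top reports.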

The SAN disks are not under load either:

iostat -xnp
                    extended device statistics
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
    1.5 2.7 50.7 50.5 0.0 0.0 0.0 8.8 0 2 c1t0d0
    0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.1 0 0 c1t0d0s0
    0.0 0.1 0.0 0.4 0.0 0.0 0.0 6.2 0 0 c1t0d0s1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t0d0s2
    1.2 2.7 50.5 50.0 0.0 0.0 0.0 9.7 0 2 c1t0d0s3
    0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.9 0 0 c1t0d0s4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s5
    1.5 2.7 51.4 50.0 0.0 0.0 0.0 10.1 0 2 c1t1d0
    0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t1d0s2
    1.3 2.7 51.3 50.0 0.0 0.0 0.0 10.6 0 2 c1t1d0s3
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.2 0 0 c1t1d0s4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s5
    0.6 0.5 2.5 1.9 0.0 0.0 0.0 9.1 0 0 c1t2d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t2d0s2
    0.4 0.5 2.3 1.9 0.0 0.0 0.0 12.1 0 0 c1t2d0s3
    0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.2 0 0 c1t2d0s4
    0.4 0.5 1.5 1.9 0.0 0.0 0.0 9.9 0 0 c1t3d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t3d0s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.2 0 0 c1t3d0s3
    0.2 0.5 1.4 1.9 0.0 0.0 0.0 12.8 0 0 c1t3d0s4
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d0s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t30d1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t30d1s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d1s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t30d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t30d2s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d2s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t30d3
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t30d3s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d3s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d4s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d4s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d5
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d5s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d5s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d6
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d6s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d6s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d7s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d7s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d8
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d8s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d8s7
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d0s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d0s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t70d1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t70d1s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d1s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d2s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d2s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d3
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d3s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d3s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d4s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d4s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d5
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d5s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d5s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d6
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d6s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d6s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d7s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d7s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d8
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d8s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d8s7
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t31d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t31d0s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d0s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8s7
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t71d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t71d0s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d0s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7s7
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8s2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8s7

        --> c3t30d0, c3t70d0 are the same LUN seen via two HBAs
            => one plex (mirror) of a volume
            c4t31d0, c4t71d0 are the same LUN seen via two HBAs
            => the other plex of the same volume

It looks like the disks aren't the problem.
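
The same check can be done mechanically. The Python sketch below parses `iostat -xn`-style rows and lists any device whose %b or asvc_t exceeds a threshold. The thresholds (20% busy, 30 ms service time) are arbitrary assumptions for illustration, not recommended limits, and only a handful of rows from the listing above are embedded as sample data.

```python
# Sketch: flag busy devices in `iostat -xn`-style output.
# Thresholds and names are illustrative assumptions.
SAMPLE = """\
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
    1.5 2.7 50.7 50.5 0.0 0.0 0.0 8.8 0 2 c1t0d0
    1.5 2.7 51.4 50.0 0.0 0.0 0.0 10.1 0 2 c1t1d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0
    0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t71d0
"""

COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b"]


def parse_rows(text):
    """Map device name -> {column: value} for every data row."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue  # not a statistics row
        try:
            values = [float(f) for f in fields[:-1]]
        except ValueError:
            continue  # the column header line
        devices[fields[-1]] = dict(zip(COLUMNS, values))
    return devices


def busy_devices(devices, max_busy_pct=20.0, max_asvc_ms=30.0):
    """Return sorted names of devices over either threshold."""
    return sorted(
        name for name, stats in devices.items()
        if stats["%b"] > max_busy_pct or stats["asvc_t"] > max_asvc_ms
    )


devices = parse_rows(SAMPLE)
print("busy devices:", busy_devices(devices))
print("highest %b:", max(stats["%b"] for stats in devices.values()))
```

With the sample rows the list of busy devices is empty and the highest %b is 2, consistent with the reading that the disks are nearly idle.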

The network looks OK as well:

RAWIP
        rawipInDatagrams = 0 rawipInErrors = 0
        rawipInCksumErrs = 0 rawipOutDatagrams = 0
        rawipOutErrors = 0

UDP
        udpInDatagrams = 33591 udpInErrors = 0
        udpOutDatagrams = 33596 udpOutErrors = 0

TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
        tcpRtoMax = 60000 tcpMaxConn = -1
        tcpActiveOpens = 13806 tcpPassiveOpens = 14845
        tcpAttemptFails = 9 tcpEstabResets = 749
        tcpCurrEstab = 15 tcpOutSegs =13312404
        tcpOutDataSegs =11237898 tcpOutDataBytes =63660618
        tcpRetransSegs = 422 tcpRetransBytes =336018
        tcpOutAck =2067205 tcpOutAckDelayed =1853406
        tcpOutUrg = 0 tcpOutWinUpdate = 15
        tcpOutWinProbe = 13 tcpOutControl = 58063
        tcpOutRsts = 1520 tcpOutFastRetrans = 85
        tcpInSegs =11835531
        tcpInAckSegs =10236185 tcpInAckBytes =63671992
        tcpInDupAck = 39885 tcpInAckUnsent = 0
        tcpInInorderSegs =9700449 tcpInInorderBytes =1566751826
        tcpInUnorderSegs = 1 tcpInUnorderBytes = 551
        tcpInDupSegs = 64 tcpInDupBytes = 4171
        tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
        tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
        tcpInWinProbe = 0 tcpInWinUpdate = 3
        tcpInClosed = 184 tcpRttNoUpdate = 347
        tcpRttUpdate =10222115 tcpTimRetrans = 1649
        tcpTimRetransDrop = 5 tcpTimKeepalive = 181
        tcpTimKeepaliveProbe= 16 tcpTimKeepaliveDrop = 1
        tcpListenDrop = 0 tcpListenDropQ0 = 0
        tcpHalfOpenDrop = 0 tcpOutSackRetrans = 0

IPv4 ipForwarding = 2 ipDefaultTTL = 255
        ipInReceives =11593385 ipInHdrErrors = 0
        ipInAddrErrors = 0 ipInCksumErrs = 0
        ipForwDatagrams = 0 ipForwProhibits = 0
        ipInUnknownProtos = 0 ipInDiscards = 0
        ipInDelivers =11849241 ipOutRequests =13132669
        ipOutDiscards = 0 ipOutNoRoutes = 3
        ipReasmTimeout = 60 ipReasmReqds = 0
        ipReasmOKs = 0 ipReasmFails = 0
        ipReasmDuplicates = 0 ipReasmPartDups = 0
        ipFragOKs = 0 ipFragFails = 0
        ipFragCreates = 0 ipRoutingDiscards = 0
        tcpInErrs = 0 udpNoPorts = 4188
        udpInCksumErrs = 0 udpInOverflows = 0
        rawipInOverflows = 0 ipsecInSucceeded = 0
        ipsecInFailed = 0 ipInIPv6 = 0
        ipOutIPv6 = 0 ipOutSwitchIPv6 = 169
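
To put a number on "looks OK", one option is to compare retransmitted segments with data segments sent and to confirm that the input error counters are zero. A minimal Python sketch follows, with the values copied by hand from the counters above; the dictionary layout and helper name are illustrative. Byte counters are left out on purpose, since it is not clear from this output alone whether they have wrapped.

```python
# Sketch: TCP retransmission ratio and error-counter check.
# Values are copied from the statistics above; names are illustrative.
counters = {
    "tcpOutDataSegs": 11237898,
    "tcpRetransSegs": 422,
    "tcpInErrs": 0,
    "udpInErrors": 0,
    "udpInCksumErrs": 0,
    "ipInHdrErrors": 0,
    "ipInAddrErrors": 0,
    "ipInCksumErrs": 0,
}

ERROR_KEYS = ["tcpInErrs", "udpInErrors", "udpInCksumErrs",
              "ipInHdrErrors", "ipInAddrErrors", "ipInCksumErrs"]


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


retrans_pct = percent(counters["tcpRetransSegs"], counters["tcpOutDataSegs"])
errors_clean = all(counters[key] == 0 for key in ERROR_KEYS)

print(f"retransmitted segments: {retrans_pct:.4f}% of data segments")
print("input error counters all zero:", errors_clean)
```

This works out to well under 0.01% retransmitted segments with no input errors, which supports the view that the network is not the bottleneck.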

What else could be the reason?
==============================

How could we diagnose this problem?
===================================

Thank you for your help.
Tobias
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


