Re: AW: System thrashing?

From: Jason delaFuente (jason.delafuente@GBE.COM)
Date: Wed Apr 07 2004 - 09:59:24 EDT


This isn't exactly true....

 "When looking at nmon I can see that the system has constantly high
I/O-waits (30 to 60 % of the CPU cycles), even when there is very little
activity on the system. Thus the system is performing rather badly. The usage
of paging space is about 1.5 GB."

Your high iowait isn't necessarily a problem. We have a large p690 that was experiencing the same issue, and our vmstat looked similar to yours. When we looked at the disks there weren't any real hot spots, nor did our SAP BASIS team report any application performance issues. We ended up having many conversations with IBM and eventually a conference call with one of their AIX performance gurus and an SAP performance guru. They pretty much explained that, because of the way iowait is reported on multiprocessor machines, it is not necessarily an indication of an I/O issue by itself. In fact, as the workload on our system has INCREASED, our iowait time has actually decreased. This is because more of the CPUs in the machine actually have work to do now, so they don't get reported under the iowait column. To be honest, it took some convincing from IBM before we accepted that this was not a problem, but in the end it became obvious that it was not.

Their technical answer was much more detailed than what I have put in this email. They did give us a document called something like "demystifying iowait" that gave a very good explanation of why iowait is reported the way it is and why high iowait times by themselves on MP machines are not necessarily a problem. If you can get your hands on it, I would recommend looking it over. I had a hard copy, but I will see if I can get something that I can send you.
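
The reporting effect described above can be illustrated with a toy model. This is only a sketch under a stated assumption: a CPU's time is booked as "wa" when that CPU is idle while disk I/O is outstanding, and as "id" when it is idle with no I/O outstanding. It is not AIX's exact accounting rule, and the function name cpu_breakdown and the 8-CPU example are hypothetical.

```python
def cpu_breakdown(total_cpus: int, busy_cpus: int, io_outstanding: bool) -> dict:
    """Toy model of how idle CPU time is split between 'wa' and 'id'.

    Assumption (simplified, not AIX's exact rule): every idle CPU is
    reported as waiting on I/O whenever any disk I/O is outstanding.
    """
    if not 0 <= busy_cpus <= total_cpus:
        raise ValueError("busy_cpus must be between 0 and total_cpus")
    idle_pct = 100.0 * (total_cpus - busy_cpus) / total_cpus
    return {
        "busy": 100.0 * busy_cpus / total_cpus,
        "wa": idle_pct if io_outstanding else 0.0,
        "id": 0.0 if io_outstanding else idle_pct,
    }


# Same outstanding I/O, increasing CPU load: the 'wa' share shrinks
# even though the disks are doing exactly the same work.
for busy in (2, 4, 6):
    print(busy, "busy CPUs:", cpu_breakdown(8, busy, io_outstanding=True))
```

In this model the "wa" figure mostly measures how many CPUs had nothing else to do while I/O was in flight, which is why it can fall as the workload rises.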

Jason de la Fuente

>>> Christian.Sippel@IZB.DE 04/07/04 08:31AM >>>
Hi Bill,
Thanks for the answer. Here is the output of vmstat 3 3|grep -v 0.0

kthr    memory              page                faults          cpu
----- ----------- ------------------------ ---------------- -----------
 r  b    avm  fre  re  pi  po  fr   sr  cy   in    sy    cs us sy id wa
 2  1 473321  326   0  25  25 271  985   0 1499 12363 13990  5 14 58 24
 1  1 473329  287   0  32  96 251  383   0  976  3903  2333  5  2 51 43
 0  1 473329  366   0  41  98 251  358   0  958  3759  2109  4  2 48 45

What is the purpose of "grep -v 0.0"?

TIA
Christian
-----Original Message-----
From: Bill Verzal [mailto:BVerzal@KOMATSUNA.COM]
Sent: Wednesday, April 7, 2004 14:53
To: aix-l@Princeton.EDU
Subject: Re: System thrashing?

It depends on what the I/O waits are from.

Post output of 'vmstat 3 3|grep -v 0.0'
--------------------------------------------------------

"If everything is coming your way, then you are in the wrong lane"

Bill Verzal
AIX Administrator, Komatsu America
(847) 970-3726 - direct
(847) 970-4184 - fax

From: "Sippel, Christian" <Christian.Sippel@IZB.DE>
Sent by: IBM AIX Discussion List <aix-l@Princeton.EDU>
Date: 04/07/2004 03:33 AM
To: aix-l@Princeton.EDU
cc:
Subject: System thrashing?
Please respond to: IBM AIX Discussion List <aix-l@Princeton.EDU>

Dear List,

I have a p630 with 2 GB of RAM and 2 GB of paging space mirrored on hdisk0
and hdisk1. When looking at nmon I can see that the system has constantly
high I/O waits (30 to 60 % of the CPU cycles), even when there is very little
activity on the system. Thus the system is performing rather badly. The usage
of paging space is about 1.5 GB. The machine is used as a Tivoli Storage
Manager server, and among the dsmserv processes there is one that has a
size of 1140340 KB in memory.

root@o00tsmoe1:> ps aux | more
USER        PID %CPU %MEM      SZ    RSS TTY STAT  STIME    TIME COMMAND
root     647368 12.1 39.0 1140340 823928   - A    Mar 18 6861:59 dsmserv -o /tsm

These are the values shown by vmstat:

root@o00tsmoe1:> vmstat 5 20
kthr    memory              page                faults          cpu
----- ----------- ------------------------ ---------------- -----------
 r  b    avm  fre  re  pi  po  fr   sr  cy   in    sy    cs us sy id wa
 2  1 470691  398   0  25  25 271  989   0 1501 12449 14101  5 14 58 24
 3  1 470697  816   0  54   0 301  518   0 1103  5947  4897 10  9 42 40
 0  1 470697  332   0  64   0 150  316   0 1130  6254  5291 11 10 42 38
 1  1 470697  219   0  69   0 302  744   0 1179  7027  5593 14 10 41 35
 3  1 470697  358   0  57  31 302 5552   0 1178  6564  5311 13 10 41 36
 3  1 470697  467   0  84   2 301 2601   0 1170  6663  5531 11 10 41 39
 4  1 470697  699   0  37   0 376 1727   0 1151  7167  5667 16  9 39 35
 3  1 470697  257   0  35   2 151 1372   0 1139  6526  5367 11  9 43 37
 1  1 470697  602   0  55   0 301 1519   0 1091  6029  4869 10  9 41 39
 0  1 470697  703   0  30   9 226 1691   0 1068  5879  4647 11  9 42 38
 0  1 470697  473   0  38   2 150 1589   0 1055  5568  4475 10  7 42 40
 1  1 470697  237   0  24  17 150 1529   0 1060  5712  4509  8  8 45 39
 3  1 470697  399   0  20  24 301 5430   0 1084  6067  4727 12  8 43 37
 1  1 470697  254   0  18   6 226 4073   0 1233  7163  5919 13 10 42 36
 1  1 470697  140   0  25  12 376 2204   0 1266  7556  5729 16  9 43 32
 2  1 470697  695   0  47  13 376 2337   0 1209  6684  5423 13 10 40 37
 1  1 470697  397   0  25  21 226  566   0 1282  7232  5839 13 10 44 34
 2  1 470697  370   0  27   5 226  727   0 1198  6820  5567 13  9 40 39
 1  1 470697  484   0  45   6 302 1215   0 1238  7103  5847 12 13 42 33
 0  1 470697  478   0  24  11 302 1090   0 1293  7729  6258 15 11 39 35

Is the system currently thrashing? Aren't the values of "po" too low for
thrashing? Would it help to put another 2 GB of RAM into the box to lower
the I/O waits and improve system performance?

Thanks in advance,
Christian
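
For readers who want to weigh the paging question against the posted numbers, the vmstat sample above can be summarized with a short script. This is a minimal sketch, not part of the original thread: the helper names (parse_vmstat, summarize, pages_to_gib) are hypothetical, the 4 KB page size is an assumption, and the first data row is skipped because vmstat's first report covers the time since system startup rather than the sampling interval.

```python
from statistics import mean

# Column names; the CPU "sy" column is renamed so it does not
# collide with the "sy" (system calls) column under "faults".
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy", "cs", "us", "cpu_sy", "id", "wa"]

PAGE_SIZE_KB = 4  # assumption: 4 KB memory pages

VMSTAT_OUTPUT = """
 r b avm fre re pi po fr sr cy in sy cs us sy id wa
 2 1 470691 398 0 25 25 271 989 0 1501 12449 14101 5 14 58 24
 3 1 470697 816 0 54 0 301 518 0 1103 5947 4897 10 9 42 40
 0 1 470697 332 0 64 0 150 316 0 1130 6254 5291 11 10 42 38
 1 1 470697 219 0 69 0 302 744 0 1179 7027 5593 14 10 41 35
 3 1 470697 358 0 57 31 302 5552 0 1178 6564 5311 13 10 41 36
 3 1 470697 467 0 84 2 301 2601 0 1170 6663 5531 11 10 41 39
 4 1 470697 699 0 37 0 376 1727 0 1151 7167 5667 16 9 39 35
 3 1 470697 257 0 35 2 151 1372 0 1139 6526 5367 11 9 43 37
 1 1 470697 602 0 55 0 301 1519 0 1091 6029 4869 10 9 41 39
 0 1 470697 703 0 30 9 226 1691 0 1068 5879 4647 11 9 42 38
 0 1 470697 473 0 38 2 150 1589 0 1055 5568 4475 10 7 42 40
 1 1 470697 237 0 24 17 150 1529 0 1060 5712 4509 8 8 45 39
 3 1 470697 399 0 20 24 301 5430 0 1084 6067 4727 12 8 43 37
 1 1 470697 254 0 18 6 226 4073 0 1233 7163 5919 13 10 42 36
 1 1 470697 140 0 25 12 376 2204 0 1266 7556 5729 16 9 43 32
 2 1 470697 695 0 47 13 376 2337 0 1209 6684 5423 13 10 40 37
 1 1 470697 397 0 25 21 226 566 0 1282 7232 5839 13 10 44 34
 2 1 470697 370 0 27 5 226 727 0 1198 6820 5567 13 9 40 39
 1 1 470697 484 0 45 6 302 1215 0 1238 7103 5847 12 13 42 33
 0 1 470697 478 0 24 11 302 1090 0 1293 7729 6258 15 11 39 35
"""


def parse_vmstat(text):
    """Return one dict per data row; header and blank lines are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows, skip_first=True):
    """Average the paging and I/O-wait columns over the interval samples."""
    sample = rows[1:] if skip_first and len(rows) > 1 else rows
    return {name: mean(r[name] for r in sample)
            for name in ("pi", "po", "fr", "sr", "wa")}


def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_SIZE_KB / (1024 * 1024)


rows = parse_vmstat(VMSTAT_OUTPUT)
print("interval averages:", summarize(rows))
print("active virtual memory (GiB):", round(pages_to_gib(rows[-1]["avm"]), 2))
print("free list (MiB):", round(rows[-1]["fre"] * PAGE_SIZE_KB / 1024, 2))
```

The script only restates the posted figures in more comparable units; whether the resulting page-in and page-out rates amount to thrashing is the judgment the thread is discussing.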


