Summary: System Performance

From: alan.nguyen@au.transport.bombardier.com
Date: Tue Aug 27 2002 - 23:32:54 EDT


Thanks very much indeed to those who responded to my problem.
A special thanks to Alan (alan@nabeth.cxo.cpqcorp.net), who gave me an absolutely
clear explanation.
I'm extremely impressed.

Please see my question at the end.

     On systems with a unified buffer cache (one that has access
     to all free memory for caching file system data), I don't
     believe that it does much good to pay attention to the amount
     of free memory. Some file system I/O can quickly use up all
     the free memory, but if it is only read I/O, giving the memory
     back is very cheap. Page-outs are probably a better indicator
     of being out of memory. I don't think the system goes to any
     great lengths to page out data unless there is a need.
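
     As a rough illustration, here is one way to watch page-out
     activity over a stretch of time. It assumes a vmstat whose
     header includes a "pout" column; field names and positions
     vary between systems, so check your own vmstat output first.

          import subprocess

          # Sample "vmstat <interval> <count>" and count intervals
          # with page-out activity.  The column name "pout" is an
          # assumption; verify it against your vmstat header.
          def pageout_samples(interval=5, count=12):
              out = subprocess.run(["vmstat", str(interval), str(count)],
                                   capture_output=True, text=True).stdout
              rows = [ln.split() for ln in out.splitlines() if ln.split()]
              header = next(r for r in rows if "pout" in r)
              col = header.index("pout")
              return [int(r[col]) for r in rows
                      if r is not header and len(r) > col and r[col].isdigit()]

          samples = pageout_samples()
          busy = sum(1 for s in samples if s > 0)
          print("page-outs seen in %d of %d samples" % (busy, len(samples)))

     Sustained non-zero samples, rather than a single burst, are
     what would suggest a real memory shortage.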

     As for how much is too much, that depends on what you're
     willing to tolerate and how the system is used. The whole
     point of demand paged (virtual) memory is so that you can
     use more virtual memory than you have physical memory. You
     use real memory to lower the amount of paging to tolerable
     levels. If you can't tolerate any paging, limit the amount
     of memory the buffer cache can use and limit the amount of
     virtual memory allowed to however much physical memory you
     have left (by limiting page/swap space and policing users
     who figure out how to use mmap(2)).
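
     As a back-of-the-envelope example of that budget (all numbers
     here are invented, not from the reply above):

          # Hypothetical sizing: cap the buffer cache and page/swap
          # space so virtual memory cannot exceed what is left of
          # physical memory.
          phys_mb     = 2048    # physical memory in the box
          bufcache_mb =  512    # ceiling for the unified buffer cache
          kernel_mb   =  256    # rough allowance for kernel/wired pages
          swap_mb     =    0    # no page/swap space: paging forbidden

          vm_allowed_mb = phys_mb - bufcache_mb - kernel_mb + swap_mb
          print("virtual memory to allow: %d MB" % vm_allowed_mb)  # 1280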

     If your workload is one where you expect to be paging regardless
     of the amount of memory, then careful attention of the I/O design
     of the page/swap space is needed. Swapping probably prefers
     high sequential data rates in the page/swap space. Paging may
     prefer higher request rates, since it is likely to be scattered
     around more. Short-stroked (*), high-bandwidth disks (*) on
     their own I/O subsystem may be the right solution in some cases.

     Finding bottlenecks in the I/O load can be hard. Oversimplifying
     a bit, you want to find those I/O components that are working
     at close to their maximum sustained I/O rates. Whether that
     "rate" is the data rate or request rate depends on the I/O
     load. The problem with "maximum" I/O rate is you have to
     look at something realistic. The maximum theoretical rate
     for a particular SCSI bus might simply be bus speed times
     data width; Wide (16-bit) UltraSCSI (20 MHz) has a maximum
     data rate of around 40 MB/sec. For short bursts, some
     device may be able to get that. However, the sustained data
     rate might be lower. The maximum is further complicated by
     an I/O load that simply can't drive the bus to saturation.
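
     Spelled out, that theoretical arithmetic looks like this (the
     UltraSCSI figures come from the paragraph above; the 70%
     sustained derating is an arbitrary illustration):

          # Theoretical peak = bus clock x data width.
          clock_mhz   = 20       # UltraSCSI clock
          width_bytes = 2        # "wide" = 16 bits
          peak_mb_s   = clock_mhz * width_bytes    # 40 MB/sec
          # Sustained throughput is lower; 70% is a made-up guess.
          print(peak_mb_s, peak_mb_s * 0.70)       # 40 28.0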

     I/O rates are also sensitive to whether the load stresses
     high request rate or high data rate. In a high request rate
     load, the I/O size may be small, and you'll never have to
     worry about saturating the data rate of the bus or devices.
     However, the devices may be at their maximum request rate,
     even if the bus seems to have plenty of head-room for higher
     request or data rates.
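
     A worked example of that distinction, with hypothetical numbers:

          # A small-transfer load saturates the disk's request rate
          # long before it troubles the bus's data rate.
          io_size_kb  = 2      # hypothetical small random I/O
          req_per_sec = 400    # hypothetical disk near its request ceiling
          data_mb_s   = io_size_kb * req_per_sec / 1024.0
          print("%.2f MB/sec" % data_mb_s)  # 0.78 MB/sec on a 40 MB/sec bus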

     Benchmarking the system with a variety of artificial I/O
     loads, similar to the actual ones, may offer a hint at what
     realistic maximums are. Once you know that, you can compare
     actual loads to plausible maximums to see if you have a
     bottleneck.
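
     In code form the comparison is just a utilization ratio against
     the benchmarked ceiling (the 80% threshold here is arbitrary):

          # Flag a component running close to its benchmarked peak.
          def near_saturation(actual, plausible_max, threshold=0.8):
              return actual / float(plausible_max) >= threshold

          # e.g. a disk benchmarked at 300 req/sec, now doing 270:
          print(near_saturation(270, 300))  # True -> likely bottleneck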

     I don't recall reading anything about an 80% load rule. For
     systems with a single CPU, it is hard to buy a wide range
     of CPU performance because vendors tend to stop selling the
     slower CPUs when the faster ones are available in quantity.
     For systems that support multiple CPUs, a lot of idle CPU
     capacity isn't an economically good thing to have sitting
     around (well, it's good for the vendor). Some spare capacity
     is good if your work-load has performance spikes. The
     amount to keep spare depends on the work-load and the effect
     a spike has on response time for the other users. 80%
     usage leaves 20% idle, which may be a nice number for many
     work-loads. If you have a highly predictable workload
     that won't have high load spikes, you can run closer to the
     maximum capacity.
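
     One common way to make the cost of lost head-room concrete is
     a simple M/M/1 queueing approximation (not something the reply
     above uses; purely an illustration), where relative response
     time grows as 1/(1 - utilization):

          # Toy M/M/1 queue, service time normalized to 1.0.
          for util in (0.50, 0.80, 0.90, 0.95, 0.99):
              print("%.0f%% busy -> response time x%.0f"
                    % (util * 100, 1 / (1 - util)))

     At 80% busy the model already predicts a 5x response time, and
     it blows up quickly beyond that, which is one argument for
     keeping some capacity idle.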

     The tuning guide might offer some recommendations. If you
     don't have paper copies, PDF and HTML copies are on the
     documentation CDROM.

     (*) Short-stroked is a term that describes disks that use
     less capacity than their total. The presumption is that if
     you limit the capacity, you limit the seek range, and
     that helps seek performance.

     (*) Most, if not all, high-capacity disks use zone-based
     recording that puts more data on the outer tracks (because of
     their longer circumference) than on the inner tracks. Most
     such disks also have higher data rates on the outside
     tracks. If you were to dedicate disks to an I/O load
     that needed both characteristics and were willing to throw
     away part of the capacity, you'd want to limit the used
     space to the faster tracks.
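
     In numbers, the trade-off might look like this (the zone table
     is invented for illustration):

          # Hypothetical zone table for one disk, outermost first.
          zones = [          # (capacity_gb, sustained_mb_per_sec)
              (12, 28.0),
              (12, 24.0),
              (12, 20.0),
          ]
          keep = zones[:1]   # short-stroke: outermost zone only
          used_gb  = sum(cap for cap, _ in keep)
          total_gb = sum(cap for cap, _ in zones)
          min_rate = min(rate for _, rate in keep)
          print("use %d of %d GB, every track >= %.0f MB/sec"
                % (used_gb, total_gb, min_rate))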

My Question

Hi Tru64 Managers,

We have a number of servers whose CPU, memory and I/O performance we'd like
to monitor and adjust.
The servers run Tru64 UNIX 4.0F with the latest patches. We use tools such as
collect, Performance Manager, top, vmstat, iostat, xload, netstat and mrtg to
monitor and collect data.

What we'd like to know is which indicators we can use to judge that the
servers are short of Memory, I/O or CPU capacity.

For Memory: if processes page out and free memory is low, this may indicate
a lack of memory; is that correct? How do we determine/model how consistent
the page-outs are, or how much time the system spends paging out, before we
can conclude that the system is short of memory? Can we say that whenever a
page-out occurs the system is short of memory?
How do we determine/model swap-outs? The ps command shows which processes got
swapped out by a W in the (s) status column.
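A small filter along those lines, assuming a BSD-style "ps aux" whose
8th column is the state field (column layouts differ between systems, so
check your own ps output first):

     import subprocess

     # Print processes whose ps state field contains W (swapped out).
     out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
     for line in out.splitlines()[1:]:
         fields = line.split(None, 10)
         if len(fields) > 7 and "W" in fields[7]:
             print(line)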

How do we determine whether the I/O has a problem?

For CPU performance: is it correct that, to get what a CPU is paid for, it
MUST be used more than 80% of the time?

If anyone has any experience or an opinion, please help.

Thanks.

Alan.Nguyen@au.transport.bombardier.com


