Further questions concerning load average

From: Brewer, Edward (BREWERE@OD.NIH.GOV)
Date: Mon May 13 2002 - 16:09:15 EDT


Admins,

        I received two answers concerning my original question and I still
have questions.

From: alan@nabeth.cxo.cpqcorp.net

        I don't know where collect gets its run queue length. If
        the value is different than that displayed by uptime and
        Monitor, it may be displaying the raw number of processes
        in the run queue. The load average applies an exponential
        decay to the value, which smooths out sudden changes.
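
        The smoothing described above can be illustrated with a short
        Python sketch. This is not how Tru64 or collect computes the
        figure; the function name smooth_load, the one-second sampling
        interval, and the 5-second window are assumptions chosen only
        to show how an exponentially decayed average lags behind the
        raw run queue length.

```python
import math

def smooth_load(samples, interval_s=1.0, window_s=60.0, start=0.0):
    """Exponentially decayed average of raw run-queue samples.

    samples    : raw run queue lengths, one per sampling interval
    interval_s : assumed seconds between samples (hypothetical)
    window_s   : averaging window in seconds (hypothetical)
    """
    # Fraction of the previous average that is kept at each step.
    decay = math.exp(-interval_s / window_s)
    load = start
    history = []
    for n in samples:
        # Old value decays; the new sample contributes the remainder.
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history

# A burst: 4 runnable processes for 10 seconds, then an empty queue.
raw = [4] * 10 + [0] * 10
smoothed = smooth_load(raw, window_s=5.0)

for second, (r, s) in enumerate(zip(raw, smoothed), start=1):
    print(f"t={second:2d}s raw={r} smoothed={s:.2f}")
```

        The smoothed value climbs toward 4 during the burst without
        reaching it, then drifts back toward 0 afterwards, which is why
        a raw run queue reading and a load average can disagree at any
        given instant.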

        My recollection is that there is a single queue of runnable
        processes, which is the basis for the load average numbers.
        Having more processors simply offers more places for those
        to run if a CPU becomes idle. Vastly oversimplified, the
        run queue can be viewed as the number of additional processors
        you could keep busy if you had them. This ignores the reality
        of I/O and other shared resource waits, etc.

From: david.ross@cantire.com

  I could be wrong on this, but I believe that "uptime" shows the run queue
for all processors, so that a four processor system with a load average of
8.0 would have an average of two processes queued up per processor, in
addition to the ones currently running.
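
  The arithmetic behind this reading is a simple division. The sketch below
only restates the interpretation given in this reply (system-wide load
divided by the number of processors); the function name queued_per_cpu is
hypothetical, and whether the result counts running processes is exactly the
point still in question.

```python
def queued_per_cpu(load_average, cpu_count):
    """Average load per processor, assuming the reported load average
    covers all processors (the interpretation given in this reply)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count

# The example from the reply: load 8.0 on a four-processor system.
print(queued_per_cpu(8.0, 4))   # 2.0

# A load of 3.0 on a four-processor system.
print(queued_per_cpu(3.0, 4))   # 0.75
```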

  The three values of load are the averages for the last five, thirty and
sixty seconds.

  At least that's the explanation I give to my manager when he demands to
know why our servers are so busy. It seems to keep him satisfied.

My question: If I have a 4-processor system (ES40, Tru64 5.1, patch 4) and
I see the load average across 5, 30, and 60 seconds in the 1-3 range, am I
seeing an extra one to three processes per CPU that are not being executed,
or am I seeing 1-3 processes queued per CPU, including the ones that are
running on the CPUs? And if during that time I also see high idle time, does
this possibly reflect a bottleneck?

My real problem is that I am attempting to help rectify a possible database
slowness issue. The DBAs believe that the SAN is having problems. Using
monitor and collect I see no I/O bottlenecks, but I do see the run averages
approach 3. I believe that there is no disk bottleneck because under monitor
I see no I/Os listed as queued, but I notice that the throughput is 1/3 that
of a similar box (actually the old production box running 4.0f). As far as
I can see, the kernels are similar and the machine hardware is the same. The
DBAs assure me that the parameters are the same between the machines. I am
pulling my hair out here. If you need additional information, I can provide
it.


