Memory / CPU problems

From: bryan.mills@lynx.co.uk
Date: Wed Aug 27 2003 - 15:12:50 EDT


I'm trying to fathom out why our GS60 is grinding to a halt. I believe
it is the application, but I need to prove it. I'm also not sure whether
it's a CPU or memory problem. I have just spent 2 hours trawling the
archives and am now more confused than ever!

It has 4 processors and 8GB of memory, running Tru64 5.1A. The application
uses the 'Universe' database and has 650 users. Typically a user needs
about 6-10MB, and a few users go a little higher.
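
As a rough sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes all 650 users are active at once, each at the typical 6-10MB, and that nothing is shared between them; both are simplifying assumptions, not measurements. The 8030M total is the "tot" value from the 'top' output quoted in this post.

```python
# Back-of-envelope memory estimate from the figures quoted in the post.
USERS = 650
PER_USER_MB = (6, 10)   # typical per-user range from the post
TOTAL_REAL_MB = 8030    # "tot" real memory as reported by top

def footprint_mb(users, per_user_mb):
    """Naive total: every user at the given size, nothing shared (assumption)."""
    return users * per_user_mb

low, high = (footprint_mb(USERS, mb) for mb in PER_USER_MB)
low_pct = 100.0 * low / TOTAL_REAL_MB
high_pct = 100.0 * high / TOTAL_REAL_MB
print(f"{low}-{high} MB, about {low_pct:.0f}-{high_pct:.0f}% of {TOTAL_REAL_MB} MB")
```

On these assumptions the users alone would account for roughly 3.9 to 6.5GB, so the upper end is already a large share of real memory before the kernel and file cache are counted.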

What we are seeing is that, after a time (2-3 days), the performance grinds
to a halt. We do have some jobs that periodically use 60-70% CPU, but they
have been like that for years. My two main concerns are:

In 'top', I never see any CPU idle time, and I feel that 'system' is
excessively high. At the start of the week, after a reboot, idle time was
typically 3-5% (and users were happy).

load averages: 13.28, 13.37, 13.24                            19:42:38
838 processes: 16 running, 1 waiting, 196 sleeping, 622 idle, 3 zombie
CPU states: 44.0% user, 0.0% nice, 55.9% system, 0.0% idle
Memory: Real: 6741M/8030M act/tot Virtual: 17359M use/tot Free: 32M

Frequently, top crashes with a memory fault.
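
To put numbers on the two concerns, the following sketch parses the two 'top' header lines quoted above and derives the load per processor and the share of busy time spent in 'system'. The function names are illustrative only, and the strings are copied from this post rather than captured live.

```python
import re

NCPU = 4  # processors in the GS60, as stated in the post

# Header lines copied from the top output in the post.
LOAD_LINE = "load averages: 13.28, 13.37, 13.24"
CPU_LINE = "CPU states: 44.0% user, 0.0% nice, 55.9% system, 0.0% idle"

def parse_load(line):
    """Return the three load averages as floats."""
    return [float(x) for x in re.findall(r"\d+\.\d+", line)]

def parse_cpu_states(line):
    """Return a dict such as {'user': 44.0, 'system': 55.9, ...}."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", line)}

loads = parse_load(LOAD_LINE)
states = parse_cpu_states(CPU_LINE)

# Runnable work queued per processor, and the fraction of busy time in the kernel.
per_cpu = [load / NCPU for load in loads]
system_share = states["system"] / (states["user"] + states["system"])

print("load per CPU:", [round(x, 2) for x in per_cpu])
print(f"system share of busy time: {system_share:.0%}")
```

A load of more than 3 per processor with over half the busy time in 'system' is consistent with the concern above, though on its own it does not say whether the kernel time comes from memory management, I/O, or something else.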

'ps aux' shows an oddity on kernel idle, but the archives seem to suggest
that this isn't a problem.

root 0 4.2 3.5 10.2G 290M ?? R < Aug 23 03:29:37 [kernel idle]

The load averages from 'uptime' seem a little higher than most people
experience. These were at around 20 each, but most people have gone home now.

19:46 up 4 days, 7:31, 306 users, load average: 15.52, 14.93, 14.81

Swap seems fine, in that we are not swapping. 'swapon -s' shows:

Swap partition /dev/disk/dsk11c (default swap):
    Allocated space: 2221961 pages (16.95GB)
    In-use space: 1 pages ( 0%)
    Free space: 2221960 pages ( 99%)

Total swap allocation:
    Allocated space: 2221961 pages (16.95GB)
    Reserved space: 291467 pages ( 13%)
    In-use space: 1 pages ( 0%)
    Available space: 1930494 pages ( 86%)
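
The page counts above can be cross-checked with a short sketch. It assumes 8KB pages (the Alpha page size), which is not stated in the output itself, and it assumes the percentages are truncated rather than rounded, since that is what matches the figures shown.

```python
PAGE_BYTES = 8192  # assumption: 8 KB pages, the Alpha page size

# Page counts copied from the "Total swap allocation" block in the post.
allocated, reserved, in_use, available = 2221961, 291467, 1, 1930494

def pages_to_gb(pages, page_bytes=PAGE_BYTES):
    """Convert a page count to gigabytes (2**30 bytes)."""
    return pages * page_bytes / 2**30

def pct(part, whole):
    """Percentage truncated to an integer, to match the output above."""
    return int(100 * part / whole)

print(f"allocated: {pages_to_gb(allocated):.2f} GB")
print(f"reserved:  {pages_to_gb(reserved):.2f} GB ({pct(reserved, allocated)}%)")
print(f"available: {pct(available, allocated)}%")
print("reserved + available == allocated:", reserved + available == allocated)
```

Under the 8KB assumption the allocated figure comes out at 16.95GB, matching the output, and the reserved space works out to a little over 2GB even though only one page is actually in use.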

sysconfigdb has

vm_swap_eager = 1

But I guess that, as we are not swapping, that's not really an issue
anyway?

One other symptom is that after a reboot the backup to an MDR fibre
channel DLT takes about 4 hours, but when the machine gets into this state
it's more like 10+ hours. I don't believe it's disk I/O, as it's a fairly
new HSG80 Fibre SAN. Backup is done by creating and mounting a clone
fileset.

I'm at a loss to know where to look next and would appreciate some input
to try and help me identify this one.

Regards,

Bryan Mills.



