SUMMARY: XP1000, V5.1 at 100% system time

From: Uwe Lienig (Uwe.Lienig@fif.mw.htw-dresden.de)
Date: Thu Feb 26 2004 - 03:57:43 EST


Greetings to all managers,

this list has proved its reliability again. Thanks to all who responded, namely
J Bacher, Alan Rollow and Dr. Blinn.

Problem
-------
The system in question ran at 100% CPU utilization, most of it system time,
leaving only a small share of the CPU to ordinary user programs. vmstat showed
a high context switch rate (about 300k/s), whereas the interrupt rate stayed
low (about 50-80/s). The puzzling part of the vmstat output: only one user
process (a simulation program with a high CPU demand) was running, yet it got
only about 25% of the CPU. The rest (about 75%) was spent in system mode.

> vmstat 1
Virtual Memory Statistics: (pagesize = 8192)
  procs     memory         pages                         intr         cpu
  r   w  u  act free wire fault cow zero react pin pout  in  sy   cs  us sy id
  3 157 28 228K  11K  15K     4  16   18     0  28    0  15  82 242K  24 76  0
  3 157 28 228K  11K  15K     0   0    0     0   0    0  12  45 233K  26 74  0
  3 157 28 228K  11K  15K     0   0    0     0   0    0  15  53 240K  24 76  0
  3 157 28 228K  11K  15K     0   0    0     0   0    0  15  42 218K  31 69  0
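
For anyone who wants to put numbers on a sample like this, here is a minimal
sketch that parses vmstat data lines and averages the interesting columns. It
is not something I ran on the system at the time. The function names and the
"sy_cpu" column label are made up for the illustration, and treating the "K"
suffix as a factor of 1000 is an assumption about how vmstat abbreviates.

```python
# Hypothetical helper for summarizing "vmstat 1" samples.
# Assumption: a "K" suffix means x1000 and "M" means x1000000.

SAMPLE = """\
r w u act free wire fault cow zero react pin pout in sy cs us sy id
3 157 28 228K 11K 15K 4 16 18 0 28 0 15 82 242K 24 76 0
3 157 28 228K 11K 15K 0 0 0 0 0 0 12 45 233K 26 74 0
3 157 28 228K 11K 15K 0 0 0 0 0 0 15 53 240K 24 76 0
3 157 28 228K 11K 15K 0 0 0 0 0 0 15 42 218K 31 69 0
"""

# The header uses "sy" twice (system calls, then system CPU %), so the
# second one is renamed "sy_cpu" here to keep the keys unique.
COLUMNS = ("r w u act free wire fault cow zero react pin pout "
           "in sy cs us sy_cpu id").split()


def parse_count(field):
    """Convert a vmstat field such as '242K' or '15' to an integer."""
    suffixes = {"K": 1_000, "M": 1_000_000}
    if field[-1] in suffixes:
        return int(float(field[:-1]) * suffixes[field[-1]])
    return int(field)


def parse_vmstat(text):
    """Return one dict per data line; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, map(parse_count, fields))))
    return rows


def summarize(rows):
    """Average the context switch, system call and CPU columns."""
    n = len(rows)
    total_cs = sum(r["cs"] for r in rows)
    total_sy = sum(r["sy"] for r in rows)
    return {
        "avg_cs": total_cs / n,
        "avg_syscalls": total_sy / n,
        "avg_user_cpu": sum(r["us"] for r in rows) / n,
        "avg_sys_cpu": sum(r["sy_cpu"] for r in rows) / n,
        # Context switches per system call; None if no system calls were seen.
        "cs_per_syscall": total_cs / total_sy if total_sy else None,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_vmstat(SAMPLE)).items():
        print(f"{name}: {value}")
```

Under that assumption the four samples above average roughly 233k context
switches per second at about 74% system time, with only a few dozen system
calls per second, so there are several thousand context switches per system
call.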

Answers
-------
Alan pointed out that the WEBES director sometimes causes a high workload. Since
WEBES is not installed on this system, that could not be the cause.

Dr. Tom asked me to use truss to look at the processes' system calls, in case
some process that was not obviously visible was making a lot of system calls. I
used a script to check all running processes with truss, which was instructed
to output a summary. This did not identify a guilty process either. Dr. Tom
then had no further ideas about what might impose such a high system load on
the CPU.

Resolution
----------
Since nobody had further answers, the problem could not be cured properly
(that is, by killing the guilty process or by pinning down a previously unknown
but reproducible kernel bug). The reason for such a high cs rate without a
corresponding system call rate remains hidden.
Since I hadn't logged off for about a month, I first tried to exit the CDE
session. This didn't work either: CDE wouldn't let me log off. I su'ed to
root and killed the X server. That logged me off, but the high cs rate stayed
the same. Interestingly, there were a lot of defunct processes after killing
the X server. I got the CDE login screen again. Since the actual problem didn't
go away, I rebooted the system, which returned it to normal operation.

Since there is no test case available to bring the system into this state, the
problem joins the mysteries seen on this list now and then.

Thanks again to all

-- 
Uwe Lienig  | fon: (+49 351) 462 2780 | mailto:uwe.lienig@fif.mw.htw-dresden.de
             | fax: (+49 351) 462 3476 | http://www.fif.mw.htw-dresden.de
HTW Dresden | parcels: Gutzkowstr. 22 | letters: PF 12 07 01
    -FiF-    |          01069 Dresden  |          01008 Dresden

