Summary: kernel idle %CPU high

From: Sdrolias Aristotelis (ASdrolias@cosmote.gr)
Date: Wed Sep 08 2004 - 09:49:37 EDT


Hi all,

Thanks to all of you. This is the information I gathered about "kernel idle" CPU time.
All the replies about the "kernel idle" process are summed up in a mail I got from Dr. Thomas P. Blinn.
See below.

==================================================================================================
There are two questions here if you think about it.

1) What is "[kernel idle]" time anyway (that is, what's counted
as "[kernel idle]"), and

2) Why does this system have a high "[kernel idle]" time, and is
that in and of itself a cause of poor performance?

--
"[kernel idle]" is a catch-all in the Tru64 UNIX kernel.  It gets
all of the internal overhead kernel threads, including things like
sync-ing the disks, environmental monitoring, some aspects of disk
I/O, some memory management overhead, and so on.  Basically, it's
a "catch all" for the things the kernel is doing on behalf of the
system as a whole that can't be blamed on any specific user "job"
or process.  (If the kernel could clearly identify a specific user
process as "responsible" for the kernel work, it would charge the
CPU time to that process.)  Some of this CPU work may be due to
things like SCSI I/O interfaces; things like interrupt handling
are NOT usually attributable to any specific user process, so in
most cases, they are charged off to the "[kernel idle]" bucket.
In addition, after the kernel has done ALL the work that's been
thrown at it by the hardware and the users, if there is any CPU
time left over (real CPU idle time), that gets charged into the
"[kernel idle]" bucket.
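
To illustrate the accounting described above, here is a minimal Python sketch.
It is a toy model, not kernel code: the function name and the three-way split
of CPU time are assumptions made for illustration, since the system reports
only the combined "[kernel idle]" figure.

```python
def kernel_idle_percent(user_attributed, unattributed_kernel, true_idle):
    """Return the share of CPU time that lands in the "[kernel idle]" bucket.

    All three arguments are CPU seconds over the same interval. The split
    is hypothetical: only the combined figure is actually reported.
    """
    total = user_attributed + unattributed_kernel + true_idle
    if total <= 0:
        raise ValueError("interval must contain some CPU time")
    # Unattributed kernel work and leftover idle time share one bucket.
    return 100.0 * (unattributed_kernel + true_idle) / total


# Two made-up intervals: one mostly idle, one mostly kernel overhead.
mostly_idle = kernel_idle_percent(
    user_attributed=35, unattributed_kernel=5, true_idle=60)
mostly_overhead = kernel_idle_percent(
    user_attributed=35, unattributed_kernel=55, true_idle=10)
print(mostly_idle, mostly_overhead)
```

Both made-up intervals report 65.0, which is why the figure on its own does
not say whether the CPUs are resting or busy with unattributed kernel work.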
--
Now, why is this system performing poorly with high "[kernel idle]"?
Most probably, it's due to application design or implementation, or
poor choice of things like I/O hardware.  It sounds like you have a
LOT of "general system overhead" in this system.  Depending on the
hardware configuration, this could be due to things like the choice
of SCSI adapters (some do more of the work in the adapter itself and
have a really simple kernel interface, some require the Alpha CPU to
do a lot of work to service I/Os), the network interfaces, and the
like, and some may be due to the way the applications use some of the
interfaces in the system.  Things like selecting the next thread to
run when application threads block (scheduling) are charged to the
"[kernel idle]" bucket, for example, so in a system where there is
a LOT of process context switching there will be high "[kernel idle]"
reported.
--
Bottom line:  It sure sounds like you have a system performance problem,
and the high "[kernel idle]" time reported is a symptom, but it's not
likely to be the cause; you just have to trust me that if the kernel
could assign the available CPUs to doing application work, it would
do so.  If the CPUs seem idle when there is application work to be
done, there is some bottleneck in the system that's causing this, but
the high "[kernel idle]" is just a symptom, not a cause.
Tom
====================================================================================================
USEFUL HINTS AFTER DISCUSSION ON THE SUBJECT.
(A)
It is useful to look at which threads are taking most of the CPU time.
Running ps -Am -O THREAD -p <kernel_idle_PID> lists the threads of the [kernel idle] process with the relevant information.
(B)
Most of the managers who replied had seen this problem when an I/O bottleneck was occurring.
On finding I/O bottlenecks, I got the following reply from Michael James Bradford:
"My guess is that the problem lies with your disk I/O. If a process is waiting for data from the disks and is therefore unable to do anything, it will sleep and the CPUs will be idle.
To analyze your machine's performance, run "collect". You can either run it "live" or with output to a file (the latter is probably best). For the disks, look for high AVS, AVW, ACTQ and WTQ (explanations of these can be found in the collect man page). Depending on the values of these, try to spread the load over more spindles (disks) or more controllers. Try analyzing what Oracle is doing as well, as you could have inefficient SQL calls."
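
As a rough sketch of the "look for high AVS, AVW, ACTQ and WTQ" step, the
Python below averages those four fields per disk and flags the disks that
exceed a limit. Everything except the four field names is an assumption: the
input is a hand-built list of dicts rather than real collect output, the disk
names are made up, and the thresholds are placeholders to be tuned for your
own hardware.

```python
from statistics import mean

# Hypothetical thresholds; they are not taken from the collect man page.
LIMITS = {"AVS": 20.0, "AVW": 5.0, "ACTQ": 2.0, "WTQ": 1.0}


def flag_busy_disks(samples, limits=LIMITS):
    """Return {disk: {metric: average}} for averages above their limit.

    `samples` is a list of dicts, one per disk per sampling interval,
    each holding a "disk" name and the four numeric metrics.
    """
    # Group the samples by disk name.
    by_disk = {}
    for row in samples:
        by_disk.setdefault(row["disk"], []).append(row)

    flagged = {}
    for disk, rows in by_disk.items():
        over = {}
        for metric, limit in limits.items():
            avg = mean(r[metric] for r in rows)
            if avg > limit:
                over[metric] = avg
        if over:
            flagged[disk] = over
    return flagged


# Made-up samples: dsk0 is quiet, dsk3 is queueing heavily.
samples = [
    {"disk": "dsk0", "AVS": 4.0, "AVW": 0.0, "ACTQ": 0.5, "WTQ": 0.0},
    {"disk": "dsk0", "AVS": 6.0, "AVW": 0.0, "ACTQ": 0.5, "WTQ": 0.0},
    {"disk": "dsk3", "AVS": 30.0, "AVW": 12.0, "ACTQ": 3.0, "WTQ": 4.0},
    {"disk": "dsk3", "AVS": 50.0, "AVW": 8.0, "ACTQ": 5.0, "WTQ": 2.0},
]
print(flag_busy_disks(samples))
```

With these made-up samples only dsk3 is flagged. A disk that stays flagged
over a long run is a candidate for spreading its load over more spindles or
controllers, as suggested in the reply.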
(C)
Also, to see the I/O wait of the CPUs, vmstat -w should be used as a first step.
Kind Regards, 
Aristotle Sdrolias.
>  -----Original Message-----
> From: 	Sdrolias Aristotelis  
> Sent:	Wednesday, July 28, 2004 6:31 PM
> To:	'tru64-unix-managers@ornl.gov'
> Subject:	kernel idle %CPU high
> 
> Hi all, 
> 
> OS: tru64 5.1 
> 
> Main question is the following:
> Is it possible for a high %CPU time of the [kernel idle] process to slow down the execution of other processes and generally result in poor system performance?
> 
> Situation:
> We are experiencing poor performance here on some processes, which spend more time sleeping than executing on the CPUs, even though there are enough CPUs (70% idle) and enough memory. The processes connect to Oracle, retrieve data, compute, and write back to Oracle and the filesystem.
> So it is possible that Oracle or the filesystem is the bottleneck.
> But apart from this, is it possible for a high %CPU time of the [kernel idle] process to slow down the execution of processes?
> The [kernel idle] process has a %CPU time of 65% on average on our system.
> Is there a possibility that this is causing the problem? If so, how can I find the exact cause?
> I have seen many Tru64 systems and this is the only one with such a large CPU time on the [kernel idle] process.
> What might be causing this behaviour? Any ideas?
> 
> 
> Kind regards, 
> Aristotle Sdrolias
> Software & system Engineer.
> Email: asdrolias@cosmote.gr
> 
> 

