netisr kernel thread using 100% CPU, killing system?

From: Bütow, Michael (michael.buetow@comsoft.de)
Date: Thu Jul 27 2006 - 09:57:44 EDT


Dear managers,

I am observing a strange problem where the netisr kernel thread goes from using almost no CPU to using nearly 100%.
At roughly the same time, the number of wired pages on the system increases, first slowly (+100 pages/sec), then faster and faster (maybe 400-500 pages/sec).

Looking with vmstat shows that only the malloc pages are increasing - the rest of the wired pages remain stable.
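
To make the growth figures above easier to compare between runs, the rate can be computed from periodic samples of the wired page count. The following is only a minimal sketch: the function name, the sample format (elapsed seconds, wired pages) and the 8 KB page size are assumptions for illustration, not something taken from vmstat itself.

```python
# Assumption: 8 KB pages, as on Alpha systems; adjust if the page size differs.
PAGE_SIZE = 8192


def growth_rates(samples):
    """Compute wired-page growth between consecutive samples.

    samples: list of (seconds, wired_pages) tuples in time order,
             e.g. noted by hand from repeated vmstat runs (hypothetical format).
    Returns a list of (seconds, pages_per_sec, bytes_per_sec), one per interval.
    """
    rates = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Skip out-of-order or duplicate timestamps.
            continue
        pages_per_sec = (p1 - p0) / dt
        rates.append((t1, pages_per_sec, pages_per_sec * PAGE_SIZE))
    return rates


if __name__ == "__main__":
    # Illustrative numbers only, chosen to resemble the rates described above.
    for t, pps, bps in growth_rates([(0, 1000), (10, 2000), (20, 6000)]):
        print(f"t={t}s  {pps:.0f} pages/sec  {bps / 1024:.0f} KB/sec")
```

Under the 8 KB page assumption, 400-500 pages/sec would correspond to roughly 3-4 MB of additional wired memory per second.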

The collect tool showed the network interface at 5% bandwidth utilisation (it's a 10 Mbit/s half-duplex tu card), so the network load does not appear to be that high.

The CPU was fully utilised, at first roughly 60% user and 40% system, but the system share increased slowly to 100% (netisr).

Does anybody have an explanation for the behaviour of the netisr thread and its apparent correlation with the increase in malloc pages?

I would also appreciate any hints for further diagnosis of the problem. So far we have used ps to identify the kernel thread.
We also forced a crash twice and analysed the kernel core file. In both cases we got:

(dbx) pd vm_page_free_count
0
(dbx) pd vm_perfsum.vpf_freepages
0

We tried the VM tuning recommended in the attachment v40d-tune.html of http://groups.google.de/group/fa.alpha-osf-managers/msg/6fc7d8d1ac927a1d . However, this did not fix the netisr problem.

I can provide /etc/sysconfigtab if it helps - we have increased the TCP and UDP send and receive spaces, among others.

Looking forward to any suggestions,
Michael Bütow


