Re: Phantom processes hogging CPUs

From: mrbean@mira.net
Date: Tue May 03 2005 - 18:54:56 EDT


Greetings,

Thanks for the responses to my previous email regarding some "mysterious"
processes hogging a couple of CPUs on a V880.

mpstat

CPU minf mjf xcal intr ithr csw icsw migr   smtx srw syscl usr sys wt idl
  0    0   0    1   12   10  10    0    0      0   0    17   0   0  0 100
  1    0   0    8    4    1   9    0    0      0   0     3   0   0  0 100
  2    0   0    1    4    1  77    0    0      0   0     6   0   0  0 100
  3    5   0   15    7    4  25    0    0      1   0    16   0   0  0 100
  4    0   0    1    4    1   0    0    0 130290   0     0   0 100  0   0
  5    0   0    1    4    1   2    0    0 130290   0     0   0 100  0   0
  6    0   0    1    4    1  10    0    0      0   0    15   0   0  0 100
  7    0   0    2  222  120   5    0    0      0   0     4   0   0  0 100
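
For anyone wanting to catch this state automatically, here is a minimal Python sketch that scans mpstat output for CPUs that are almost entirely in kernel mode while spinning heavily on mutexes (high sys together with high smtx). The function names and both thresholds are illustrative assumptions, not tuned recommendations; the sample is the output shown above.

```python
def parse_mpstat(text):
    """Parse mpstat output into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "CPU":
            # mpstat repeats the header for every interval; just remember it.
            header = tokens
        elif header and len(tokens) == len(header):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def spinning_cpus(rows, sys_pct=90, smtx_min=1000):
    """Return ids of CPUs with high kernel time and many mutex spins.

    sys_pct and smtx_min are arbitrary illustrative thresholds.
    """
    return [r["CPU"] for r in rows
            if r["sys"] >= sys_pct and r["smtx"] >= smtx_min]


SAMPLE = """
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 1 12 10 10 0 0 0 0 17 0 0 0 100
  1 0 0 8 4 1 9 0 0 0 0 3 0 0 0 100
  2 0 0 1 4 1 77 0 0 0 0 6 0 0 0 100
  3 5 0 15 7 4 25 0 0 1 0 16 0 0 0 100
  4 0 0 1 4 1 0 0 0 130290 0 0 0 100 0 0
  5 0 0 1 4 1 2 0 0 130290 0 0 0 100 0 0
  6 0 0 1 4 1 10 0 0 0 0 15 0 0 0 100
  7 0 0 2 222 120 5 0 0 0 0 4 0 0 0 100
"""

if __name__ == "__main__":
    print(spinning_cpus(parse_mpstat(SAMPLE)))
```

On the sample above this picks out CPUs 4 and 5, the two that show 100% sys and 130290 in the smtx column.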

Unfortunately, I had bounced the server before I received the helpful
suggestions to run lockstat to try to identify the kernel process that may
be causing the problem. Note that the server is in a very secure environment
and was built from a corporate image, so the possibility of hacked binaries
can reasonably be discounted.

A week later the problem has returned... yay... I think :-)

Running 'lockstat -kgIW sleep 5' produced...

Profiling interrupt: 3879 events in 4.998 seconds (776 events/sec)

Count genr cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 4244 109% ---- 1.00  819 cpu[2]+11       disp_getwork
 3879 100% ---- 1.00  745 cpu[4]          lockstat_intr
 3879 100% ---- 1.00  745 cpu[4]          cyclic_fire
 3879 100% ---- 1.00  745 cpu[4]          cbe_level14
 3879 100% ---- 1.00  745 cpu[4]          current_thread
 3340  86% ---- 1.00  816 cpu[2]+11       idle
 1042  27% ---- 1.00  499 cpu[5]          ce_start
  971  25% ---- 1.00  529 cpu[4]          runservice
  968  25% ---- 1.00  529 cpu[4]          taskq_d_thread
  968  25% ---- 1.00  529 cpu[4]          stream_service
  948  24% ---- 1.00  526 cpu[4]          ce_wsrv
  387  10% ---- 1.00  606 cpu[4]          mutex_vector_enter
...

and 'lockstat sleep 5' produced...

Adaptive mutex spin: 1214655 events in 4.995 seconds (243197 events/sec)

  Count indv cuml rcnt spin Lock          Caller
-------------------------------------------------------------------------------
1214655 100% 100% 1.00   47 0x30000467818 ce_start+0x294
-------------------------------------------------------------------------------

Adaptive mutex block: 4 events in 4.995 seconds (1 events/sec)

Count indv cuml rcnt  nsec Lock          Caller
-------------------------------------------------------------------------------
    4 100% 100% 1.00 14825 0x30000467818 ce_start+0x294
-------------------------------------------------------------------------------

Spin lock spin: 239 events in 4.995 seconds (48 events/sec)

Count indv cuml rcnt spin Lock                  Caller
-------------------------------------------------------------------------------
  137  57%  57% 1.00   19 cpu[2]+0x90           disp+0xa4
   61  26%  83% 1.00   43 cpu[3]+0x90           disp+0xa4
   11   5%  87% 1.00 1146 cpu[3]+0x90           disp_getbest+0x4
    9   4%  91% 1.00   19 cpu[0]+0x90           disp+0xa4
    8   3%  95% 1.00 1661 cpu[6]+0x90           disp_getbest+0x4
    4   2%  96% 1.00   85 cpu[5]+0x90           disp+0xa4
    4   2%  98% 1.00  752 cpu[6]+0x90           disp+0xa4
    3   1%  99% 1.00  924 cpu[2]+0x90           disp_getbest+0x4
    1   0% 100% 1.00   72 turnstile_table+0xc28 turnstile_lookup+0x4c
    1   0% 100% 1.00    1 cpu[7]+0x90           disp+0xa4

All these references to ce_start suggest (I think) that this is related to the
ce network interface, which is currently in a FAILED state in a multipathed
configuration.
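
To put a rough number on how dominant ce_start is, here is a small Python sketch that tallies the spin events from the lockstat output by calling function, ignoring the +offset part. The regular expression and the function name are my own assumptions about the row layout shown above, not part of lockstat, and only the two spin sections are included.

```python
import re
from collections import Counter

# Row layout assumed: Count indv cuml rcnt spin Lock Caller
ROW = re.compile(r"^\s*(\d+)\s+\d+%\s+\S+\s+[\d.]+\s+\d+\s+(\S+)\s+(\S+)\s*$")


def events_by_caller(text):
    """Sum lockstat event counts per calling function (offset stripped)."""
    totals = Counter()
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            count, _lock, caller = m.groups()
            totals[caller.split("+")[0]] += int(count)
    return totals


SAMPLE = """
Adaptive mutex spin: 1214655 events in 4.995 seconds (243197 events/sec)
Count indv cuml rcnt spin Lock Caller
1214655 100% 100% 1.00 47 0x30000467818 ce_start+0x294

Spin lock spin: 239 events in 4.995 seconds (48 events/sec)
Count indv cuml rcnt spin Lock Caller
137 57% 57% 1.00 19 cpu[2]+0x90 disp+0xa4
61 26% 83% 1.00 43 cpu[3]+0x90 disp+0xa4
11 5% 87% 1.00 1146 cpu[3]+0x90 disp_getbest+0x4
9 4% 91% 1.00 19 cpu[0]+0x90 disp+0xa4
8 3% 95% 1.00 1661 cpu[6]+0x90 disp_getbest+0x4
4 2% 96% 1.00 85 cpu[5]+0x90 disp+0xa4
4 2% 98% 1.00 752 cpu[6]+0x90 disp+0xa4
3 1% 99% 1.00 924 cpu[2]+0x90 disp_getbest+0x4
1 0% 100% 1.00 72 turnstile_table+0xc28 turnstile_lookup+0x4c
1 0% 100% 1.00 1 cpu[7]+0x90 disp+0xa4
"""

if __name__ == "__main__":
    totals = events_by_caller(SAMPLE)
    grand_total = sum(totals.values())
    for caller, count in totals.most_common():
        print(f"{caller:20s} {count:8d} {100.0 * count / grand_total:6.2f}%")
```

On these numbers ce_start accounts for all but 239 of the spin events, which is what points me at the ce driver rather than the dispatcher.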

I note that the installed ce driver dates from around February 2004 and that
there have been a number of revisions since then, so it may be prudent to look
at updates in this area.

Any other thoughts on this would be much appreciated.

Cheers

Neill Griffin
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


