High cpu on main 15K system controller

From: De Smaele Kim (Kim.DeSmaele@bpo.be)
Date: Tue Sep 18 2007 - 08:38:53 EDT


Dear all,

I'm having serious issues with the main system controller of a 15K. For
some reason, a tip process is taking nearly all of the CPU.
This morning, and on a few days before that, I also noticed the hwad
polling daemon using 50 to 80% CPU for long periods. Now it is down to
only 2%.
The tip process is owned by sms-svc (at this time already running in
kernel space); see the ps output further down.
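To follow how the CPU share of tip and hwad changes over time, a small filter over prstat output could help. This is only a sketch: the function name cpu_of is made up for illustration, and it assumes an awk that supports -v (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# Read prstat output on stdin and print "PID CPU" for each process whose
# name (the part of PROCESS/NLWP before the slash) equals the argument.
# cpu_of is a hypothetical helper, not part of SMS or Solaris.
cpu_of() {
    awk -v name="$1" '
        {
            split($NF, proc, "/")      # "tip/1" -> proc[1] = "tip"
            if (proc[1] == name) {
                cpu = $(NF - 1)        # CPU is the second-to-last column
                sub(/%$/, "", cpu)     # drop the trailing percent sign
                print $1, cpu
            }
        }'
}
```

It could be fed from something like `prstat -c 5 12 | cpu_of hwad`, which only reads prstat's output and does not touch the SMS processes themselves.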

Regarding tip hardwire: there is no smsconnectsc session to the spare.
On the spare, nobody is logged in on the serial console (who shows no
console tty). There are currently no domain consoles open.

I already reset the spare system controller from the main using
resetsc, but that made no difference. For now, I think it's safe to
keep failover enabled.

I found a document on SunSolve about high CPU usage, but it is related
to fomd. fomd seems to be running fine.

There are no platform error messages, apart from those for the
components that are powered off.

Sun is also trying to figure out what's going on, but there is no answer
yet. I also asked Sun whether it would be harmless to run truss on an SC
SMS process (the tip process). I certainly want to avoid dstops on my
production domains.
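For reference, this is the kind of inspection I have in mind, written as a dry run that only prints the commands. It is a sketch under assumptions: inspect_plan is a made-up name, and pfiles, pstack and truss are the standard Solaris proc tools. All of them stop or take control of the target while they work, so whether that is safe on an SC is exactly the open question.

```shell
# Print (do not run) inspection commands for one PID.
# inspect_plan is a hypothetical helper; nothing here touches the process.
# Assumption: pfiles, pstack and truss stop or control their target, so
# confirm with Sun before running any of them against an SMS process.
inspect_plan() {
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            echo "usage: inspect_plan PID" >&2
            return 1
            ;;
    esac
    echo "pfiles $pid"        # open files, including any tty or device
    echo "pstack $pid"        # current user-level stack
    echo "truss -c -p $pid"   # count system calls until interrupted
}
```

Running `inspect_plan 12495` just lists the three commands for the busy tip process. If the process is spinning without making system calls, truss may show nothing at all.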

Advice or help on this (urgent) issue would be very welcome.

Thanks in advance,
Kim DS.

Some output for you:

---------------------------------------------------------------------
sc1a:sms-svc:14> ps -ef|grep tip
 sms-svc 12496 12495 0 Sep 16 ? 0:00 tip hardwire
 sms-svc 12495 1 92 Sep 16 ? 2183:32 tip hardwire
 sms-svc 11749 20827 0 14:21:27 pts/4 0:00 grep tip
sc1a:sms-svc:15>

---------------------------------------------------------------------

   PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
 12495 sms-svc 1064K 944K run 10 0 36:24.18 90% tip/1
   564 root 16M 11M sleep 59 -10 0:00.01 3.1% hwad/62
   627 root 13M 9488K sleep 59 -5 0:00.01 0.8% fomd/46
 11892 sms-svc 1696K 1416K cpu0 59 0 0:00.00 0.3% prstat/1
 26176 sms-esmd 10M 6872K sleep 59 -5 0:00.00 0.3% esmd/44
  7893 root 1712K 1432K sleep 59 0 0:00.02 0.1% prstat/1
   555 root 1104K 928K sleep 59 0 0:03.54 0.1% sh/1
     1 root 880K 664K sleep 59 0 0:01.20 0.1% init/1

---------------------------------------------------------------------

sc1a:sms-svc:16> showfailover -v
SC Failover Status: ACTIVE
Status of Shared Memory:
    HASRAM (CSB at CS0): .......................................Good
    HASRAM (CSB at CS1): .......................................Good

Status of sc1a:
    Role: .......................................MAIN
    SMS Daemons: .......................................Good
    System Clock: .......................................Good
    Private I2 Network: .......................................Good
    Private HASRAM Network: .......................................Good
    Public Network:
            Group "C1": .........................................Up
          hme0: .........................................Up
          eri1: .........................................Up
            Logical IP Addr. - C1: .........................................Up
    System Memory: .......................................11.1%
    Disk Status:
        /: ........................................2.6%
    Console Bus Status:
        EXB at EX0: .......................................Good
        EXB at EX1: .......................................Good
        EXB at EX2: .......................................Good
        EXB at EX3: .......................................Good
        EXB at EX4: .......................................Good
        EXB at EX5: .......................................Good
        EXB at EX6: .......................................Good
        EXB at EX9: .......................................Good
        EXB at EX10: .......................................Good
        EXB at EX11: .......................................Good
        EXB at EX12: .......................................Good
        EXB at EX14: .......................................Good

Status of sc0a:
    Role: ......................................SPARE
    SMS Daemons: .......................................Good
    System Clock: .......................................Good
    Private I2 Network: .......................................Good
    Private HASRAM Network: .......................................Good
    Public Network:
            Group "C1": .........................................Up
          hme0: .........................................Up
          eri1: .........................................Up
            Logical IP Addr. - C1: ...................................Inactive
    System Memory: .......................................10.1%
    Disk Status:
        /: ........................................4.1%
    Console Bus Status:
        EXB at EX0: .......................................Good
        EXB at EX1: .......................................Good
        EXB at EX2: .......................................Good
        EXB at EX3: .......................................Good
        EXB at EX4: .......................................Good
        EXB at EX5: .......................................Good
        EXB at EX6: .......................................Good
        EXB at EX9: .......................................Good
        EXB at EX10: .......................................Good
        EXB at EX11: .......................................Good
        EXB at EX12: .......................................Good
        EXB at EX14: .......................................Good

sc1a:sms-svc:17>
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


