Sun Fire V880 panics

From: Vladimir Terziev (vlady@gbservices.biz)
Date: Fri Jan 12 2007 - 12:02:03 EST


        Hi Managers,

        We have a Sun Fire V880 running Solaris 8.

        Two months ago we experienced several kernel panics and reboots after severe power spikes.
        At that time I managed to record the following error:

        ERROR: System "FATAL RESET" from DAR/DCS/MDR

        System State (CPU7 reporting)

        CPU0 Config/Control/Status registers:

        CPUVersion: 003e.0015.b100.0507
        SafConfig: 0caa.01bc.2000.0002
        ...

        Finally the server stopped rebooting: POST reported failed tests for the Motherboard/Centerplane and advised replacing it.
        We then decided to order a new server base for the V880, and today we replaced the old Motherboard/Centerplane board and the old I/O board with new ones.

        When we powered on the server today, POST stopped with the following errors:

        3>Halting all other processors.
        3>ERROR: Unexpected Trap!
        3>H/W under test = Safari bus CPU 3, Motherboard/Centerplane
        3>END_ERROR

        and after a soft reset:

        3>Start selftest...
        3>Waiting for slave CPU=0, timeout in 3 seconds...
        3>Waiting for slave CPU=0, timeout in 2 seconds...
        3>Waiting for slave CPU=0, timeout in 1 seconds...
        3>ERROR: TEST = Power on Reset Initialization
        3>H/W under test = CPU, Motherboard/Centerplan, I/O board, (system init)
        3>MSG = ERROR: Slave timeout waiting for 0 to finish, offlining cpu!
        3>END_ERROR

        3>INFO: Reset Module with CPUs 0 2, both have been offlined.
        3>ERROR: TEST = Power on Reset Initialization
        3>H/W under test = CPU, Motherboard/Centerplan, I/O board, (system init)
        3>MSG = ERROR: Slave timeout waiting for 1 to finish, offlining cpu!
        3>END_ERROR

        3>INFO: Reset Module with CPUs 1 3, both have been offlined.

        and the server froze completely.

        I had to reset it with the power switch and, to my relief, it passed all tests and booted the OS.

        After 5-6 minutes of uptime the two-month-old problem came back: the kernel panicked and the machine rebooted ...

        Could someone point me to the possible root causes of the real problem?

        Thanks in advance!

                Vladimir
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


