system panic with message "Processor Machine Check"

From: milazzo (milazzo@michigan.cad.cea.fr)
Date: Wed Oct 01 2003 - 12:27:32 EDT


Hi,

Last night an ES40 system went down with this _panic_string:
0xfffffc0000655560 = "Processor Machine Check"
Looking in "/var/adm/crash/crash-data.x", I found the output below.
It looks like a problem with one of the CPUs, but after rebooting, the server
has run without any trouble.
I suspect corrupted memory, but how can I decode the EV6.8AL (21264B) error
codes?

Thanks in advance !

Jérôme

Machine Check Processor Fatal Abort
Machine check code = 0x100000098
        Ibox Status = 0000000000000000
        Dcache Status = 0000000000000000
        Cbox Address = 0000000001cb90c0
        Fill Syndrome 1 = 0000000000000000
        Fill Syndrome 0 = 0000000000000000
        Cbox Status = 0000000000000001
        EV6 captured status of Bcache mode = 0000000000000000
        EV6 Exception Address = fffffc00002aae90
        EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
        EV6 Interrupt Summary Register = 0000000000000000
        EV6 TBmiss or Fault status = 0000000000000280
        EV6 PAL Base Address = 0000000000018000
        EV6 Ibox control = fffffffc1e304396
        EV6 Ibox Process_context = 00001c8000000004
        O/S Summary flag = 0000000000000004
        Cchip Base Address (phys) = 00000801a0000000
        Cchip Device Raw Interrupt Request = 0000000000000000
            DRIR Register Decode:
                PCI Device Interrupt Mask = 0000000000000000
        Cchip Miscellaneous Register = 0000000000000000
            Misc Register Decode:
                Cchip Revision: 00
                ID of CPU performing read: 00
        Pchip 0 Base Address (phys) = 0000080180000000
        Pchip 0 Error Register = 0000000000000000
            Pchip Error Register Decode:
                PCI Xaction Start Address = 0000000000000000
                PCI Command: Interrupt Acknowledge
        Pchip 1 Base Address (phys) = 0000080380000000
        Pchip 1 Error Register = 0000000000000000
            Pchip Error Register Decode:
                PCI Xaction Start Address = 0000000000000000
                PCI Command: Interrupt Acknowledge
CPU 2 is prevented from being rebooted.
The system must be reset or power cycled to clear this state.
panic (cpu 2): Processor Machine Check
syncing disks... device string for dump = SCSI 0 3 0 0 0 0 0.
DUMP.prom: dev SCSI 0 3 0 0 0 0 0, block 786432
device string for dump = SCSI 0 3 0 0 0 0 0.
DUMP.prom: dev SCSI 0 3 0 0 0 0 0, block 786432
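The meaning of each bit in these EV6 registers is defined in the Alpha 21264 hardware reference manual and is not reproduced here. As a first triage step, the small script below lists which registers in a crash-data excerpt are non-zero and which bit positions are set, so that only those bits need to be looked up. It is a sketch, not part of any Tru64 tool: the helper names `parse_registers`, `set_bits` and `nonzero_report` are hypothetical, it assumes every register appears on one line as `Name = hexvalue` (a wrapped value must be joined onto its name line first), and the `SAMPLE` text is just a few lines copied from the log above.

```python
import re

# A few lines copied from the crash-data excerpt above (illustrative input;
# paste the whole "Machine Check" block in practice).
SAMPLE = """\
Machine check code = 0x100000098
        Ibox Status = 0000000000000000
        Cbox Address = 0000000001cb90c0
        Cbox Status = 0000000000000001
        EV6 TBmiss or Fault status = 0000000000000280
        O/S Summary flag = 0000000000000004
"""

# Matches "Name = hexvalue" (optional 0x prefix). Decode lines such as
# "Cchip Revision: 00" or "PCI Command: ..." contain no "=" and are skipped.
LINE_RE = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?:0x)?(?P<value>[0-9a-fA-F]+)\s*$")


def parse_registers(text):
    """Return {register name: integer value} for every 'Name = hex' line."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regs[m.group("name")] = int(m.group("value"), 16)
    return regs


def set_bits(value):
    """List the bit positions (0 = least significant) that are set in value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def nonzero_report(regs):
    """Yield (name, value, set bit positions) for each non-zero register."""
    for name, value in regs.items():
        if value:
            yield name, value, set_bits(value)


if __name__ == "__main__":
    for name, value, bits in nonzero_report(parse_registers(SAMPLE)):
        print(f"{name:30s} 0x{value:016x} bits {bits}")
```

Registers that read as all zeros (here the Ibox and Dcache status, the fill syndromes and both Pchip error registers) drop out of the report, which narrows the manual lookup to a handful of values such as the Cbox Status and the machine check code.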


