Is this OS or Hardware?

From: Michael W (mikew@pvbb.net)
Date: Mon Aug 23 2004 - 12:15:39 EDT


We put this ES40 into production on Saturday night, and it has shut itself
down three times since then. Does this look like a software or a hardware
problem?

WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.
Machine Check Processor Fatal Abort
Machine check code = 0x100000098
        Ibox Status = 0000000000000000
        Dcache Status = 000000000000001c
        Cbox Address = 000000002112b580
        Fill Syndrome 1 = 0000000000000000
        Fill Syndrome 0 = 00000000000000d3
        Cbox Status = 0000000000000003
        EV6 captured status of Bcache mode = 000000000000000d
        EV6 Exception Address = fffffc000066a298
        EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
        EV6 Interrupt Summary Register = 0000000080000000
        EV6 TBmiss or Fault status = 0000000000000290
        EV6 PAL Base Address = 0000000000018000
        EV6 Ibox control = fffffe0007304396
        EV6 Ibox Process_context = 0000748000000004
        O/S Summary flag = 0000000000000004
        Cchip Base Address (phys) = 00000f01a0000000
        Cchip Device Raw Interrupt Request = 0000000000000000
            DRIR Register Decode:
Machine Check SYSTEM Fatal Abort
Machine check code = 0x100000202
        Ibox Status = 0000000000000000
        Dcache Status = 0000000000000000
        Cbox Address = 0000000000000000
        Fill Syndrome 1 = 0000000000000000
        Fill Syndrome 0 = 0000000000000000
        Cbox Status = 0000000000000000
        EV6 captured status of Bcache mode = 0000000000000000
        EV6 Exception Address = fffffc00008cd140
        EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
        EV6 Interrupt Summary Register = 0000000200000000
        EV6 TBmiss or Fault status = 0000000000000000
        EV6 PAL Base Address = 0000000000018000
        EV6 Ibox control = fffffe000f304396
        EV6 Ibox Process_context = 0000000000000000
        O/S Summary flag = 0000000000000006
        Cchip Base Address (phys) = 00000f01a0000000
        Cchip Device Raw Interrupt Request = 2000000000000000
            DRIR Register Decode:
                Bit 61: Error from Pchip 1
                PCI Device Interrupt Mask = 0000000000000000
        Cchip Miscellaneous Register = 0000000800000030
            Misc Register Decode:
                Bit 4: Interval Timer Intr Pending to CPU 0
                Bit 5: Interval Timer Intr Pending to CPU 1
                Bit 35: CChip Rev (Bit<35>)
                Cchip Revision: 08
                ID of CPU performing read: 00
        Pchip 0 Base Address (phys) = 00000f0180000000
        Pchip 0 Error Register = 0000000000000000
            Pchip Error Register Decode:
                PCI Xaction Start Address = 0000000000000000
                PCI Command: Interrupt Acknowledge
        Pchip 1 Base Address (phys) = 00000f0380000000
        Pchip 1 Error Register = d300bd54f6200801
            Pchip Error Register Decode:
                Bit 0: Lost Error
                Bit 11: Correctable ECC Error
                System Address = 00000000bd54f620
                Command: DMA Read
                ECC Syndrome: d3
panic (cpu 0): System Uncorrectable Machine Check
Machine Check SYSTEM Fatal Abort
Machine check code = 0x100000202
        Ibox Status = 0000000000000000
        Dcache Status = 0000000000000000
        Cbox Address = 0000000000000000
        Fill Syndrome 1 = 0000000000000000
        Fill Syndrome 0 = 0000000000000000
        Cbox Status = 0000000000000000
        EV6 captured status of Bcache mode = 0000000000000000
        EV6 Exception Address = fffffc00006ae004
        EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
        EV6 Interrupt Summary Register = 0000000200000000
        EV6 TBmiss or Fault status = 0000000000000000
        EV6 PAL Base Address = 0000000000018000
        EV6 Ibox control = fffffe000f304396
        EV6 Ibox Process_context = 0000000000000000
        O/S Summary flag = 0000000000000006
        Cchip Base Address (phys) = 00000f01a0000000
        Cchip Device Raw Interrupt Request = 2000000000000000
            DRIR Register Decode:
                Bit 61: Error from Pchip 1
                PCI Device Interrupt Mask = 0000000000000000
        Cchip Miscellaneous Register = 0000000800000ff0
            Misc Register Decode:
                Bit 4: Interval Timer Intr Pending to CPU 0
                Bit 5: Interval Timer Intr Pending to CPU 1
                Bit 6: Interval Timer Intr Pending to CPU 2
                Bit 7: Interval Timer Intr Pending to CPU 3
                Bit 8: Interprocessor Intr Pending to CPU 0
                Bit 9: Interprocessor Intr Pending to CPU 1
                Bit 10: Interprocessor Intr Pending to CPU 2
                Bit 11: Interprocessor Intr Pending to CPU 3
                Bit 35: CChip Rev (Bit<35>)
                Cchip Revision: 08
                ID of CPU performing read: 00
        Pchip 0 Base Address (phys) = 00000f0180000000
        Pchip 0 Error Register = 0000000000000000
            Pchip Error Register Decode:
                PCI Xaction Start Address = 0000000000000000
                PCI Command: Interrupt Acknowledge
        Pchip 1 Base Address (phys) = 00000f0380000000
        Pchip 1 Error Register = d300bd54fd200801
            Pchip Error Register Decode:
                Bit 0: Lost Error
                Bit 11: Correctable ECC Error
                System Address = 00000000bd54fd20
                Command: DMA Read
                ECC Syndrome: d3

DUMP: blocks available: 1983962
DUMP: blocks wanted: 930642 (partial compressed dump) [OKAY]
DUMP: Device Disk Blocks Available
DUMP: ------ ---------------------
DUMP: 0x1300013 122678 - 1983959 (of 1983960) [primary swap]
DUMP.prom: Open: dev 0x5100001, block 786432: SCSI 1 3 0 3 300 0 0
DUMP: Writing header... [1024 bytes at dev 0x1300013, block 1983960]
esMP: Writing data..Machine Check Proc
  soErV F6 atCoalrr Aecbortt
lMea chDicneac chehe EckCC c Eodrre or= 0 x1on00 C00PU00 198

ta Ibox S
  tEusV6 C or= re00c0t00ab00le00 M00em00or00y 0
l Dlca chECe C StEarturos r on = C00PU00 100
Fi
000000001Cc_
cD DCR:bo x A dd re ss 00 00=
00000000000000000740e8057
 80
        FiCll_S SYNynDRdrOomMEe _11 : =
00000000000000000000000000000000

        Fill SCyn_SdrYNomDRe OM0 E_ 0: = 0
00000000000000000000000000d30
Cb
D
usox Stat
                        EV =6 0Co00r00r0e00c0t00ab00l0e03
ac EcVh6 e caECptC urEedr rostr atonus C oPUf B3c D
= he mode
  0E00V600 C00or00re00ct00a0b00le
        MEVe6m Eorxcy epFitillon EAdCCdr Eesrsr o r=
ffofnff Cc0PU00 306
abf8c
Pr CE_V6AD IDRnt:e rr up t En ab0l0em0en00t 0a0nd00 C00ur0r7en48t
0
  ocessor Cmo_deS =YN 0DR00OM00E0_621:e0 0 00 000000
u 00EV006 00In00te00rr 00
 pt SummaCry_S RYeNgiDRstOMerE_ 0=: 0 00 000000000080000000000000
0 EVD6 3TB 00
auss or F
  Elt Vst6 atCousrr e=c 0t0a00bl00e 00Dc00ac00h0e28 E0
C EVE6 rPArLo Bra seo nAd CdrPUes 2s C
0 = 000000
 00EV0061 80Co00rr
ec tEVa6 blIbe oxMe cmoonrytr oFl il l = ECffCf
ffEre0ro00r f3on04 C39PU6

2
        EV6 Ibox CPr_ocADesDRs_:c on te xt =
0000000000000000000000000074008

0
O/S SummCar_yS fYNlaDRg OM E_= 10:00 0 00 0000000000000000004
Ba C0ch00ip0 00
  se AddreCs_sSY (NDphROysME) _ 0= :00 0 00 0f0010a0000000000000
 D C0ch0Dip3 00
  evice Raw Interrupt Request = 0000000000000000
: DRIR Register Decode

E V P6C I CoDerrvicee ctInabtelerr uDptc aMchase k E=C 0C
00Er00ro00r 00o0n00 C00PU00 2
        C
e chip Misc
 llEVan6 eoCousrr Recegtiastbelr e =M e00mo00ry00 F00il000l00
E00C0
 D E r r Moris oc n ReCgPisU te2r
C
  ecode:
C _ CADchDRip: R ev i si on : 000000
r 0 I00D 00of1 CCPC0U C0pe
 forming Cre_SadY:N 0DR0
) EP_1ch: ip 00 0Ba00se0 0Ad00dr0e0s0s 0(0ph00ys0
                = 00000C_f0SY18ND00RO00ME00_00
r Pc0h0ip00 000 E00rr00or00 R00egDi3ste
                        = 0000000000000000
            Pchip Error Register Decode:
                PCI Xaction Start Address = 0000000000000000
                PCI Command: Interrupt Acknowledge
        Pchip 1 Base Address (phys) = 00000f0380000000
00 Pchip 1 Error Register = 000000
  E00V600 C00or00re
c t a b lPceh ipDc Earcroher EReCCgi Estrerr orDe ocon deCP:
3 U
ioI Xact
  En V6St Carort reAdctdrabeslse = M 0em00o0r00y 00F0i00ll00 E00CC0
E rPCroI r Coomnma CndPU: I3nt
errupt ACck_AnoDDwlR:ed ge

D UM P:0 0fi00rs00t 0c0ra00sh00 d76um8p0 f
00led: atCt_emSYptNDinROg MmEem_1or: y du00mp00..00.
00000000
C_SYNDROME_0: 00000000000000D3

EV6 Correctable Dcache ECC Error on CPU 2

EV6 CorDrUMeP:ct caobmplere Msseminorg y9 30Fi64ll0K BE iCCnt Eo r76ro30r
73on5K CB PUme 2mo
ry...
CDU_AMPDD: R S: ta r ti ng A d dr00es00s 00 00 00 E00nd7in4g80 A
Edress C S_SizYNe(DRMBOM)
D1:UMP : --00--00--00--00--00--00--0-0--00-
  -------C--_S--YN--DR--O-M--E_ -0:-- - -- 0--00
D00UM00P:0 00x00ff00ffD3fc
00081f1c0
o - E0xV6ff Cffofrc0re03ctffabfflfeef D 8ca94c.h0 e (iECndC icEratroorr )
D UCMPP:U 0 3xf
f5ffc01f
  cE00V600 C- o0rxfreffctffabc0le1f Mffeem3foerf y10 F.1il (li ndECicaCto
Er)rr
owc om0n: LCPinU k 3d
  n
C_ADDR: 00000000000070C0
C_SYNDROME_1: 0000000000000000
C_SYNDROME_0
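
For anyone comparing the two panic records by hand, here is a rough Python sketch that pulls the same fields out of the raw Pchip 1 Error Register values. The bit names (bit 0, bit 11) come straight from the decode lines in the log. The field positions for the system address and the ECC syndrome are only an assumption, inferred by lining the raw values up against the decoded "System Address" and "ECC Syndrome" lines, not taken from chipset documentation. The name `decode_pchip_error` is made up for this example.

```python
from dataclasses import dataclass
from typing import List

# Bit names as printed by the kernel's own decode in the log above.
ERROR_BITS = {0: "Lost Error", 11: "Correctable ECC Error"}


@dataclass
class PchipError:
    flags: List[str]
    system_address: int
    syndrome: int


def decode_pchip_error(reg: int) -> PchipError:
    """Split a raw 64-bit Pchip Error Register value into fields.

    ASSUMPTION: the address is taken from bits 47:16 and the syndrome
    from bits 63:56. This layout is inferred from the two samples in
    the log, so treat it as a guess rather than the documented layout.
    """
    flags = [name for bit, name in ERROR_BITS.items() if (reg >> bit) & 1]
    return PchipError(
        flags=flags,
        system_address=(reg >> 16) & 0xFFFFFFFF,
        syndrome=(reg >> 56) & 0xFF,
    )


if __name__ == "__main__":
    # The two raw register values reported in the panic records.
    for raw in (0xD300BD54F6200801, 0xD300BD54FD200801):
        err = decode_pchip_error(raw)
        print(f"{raw:016x}: addr={err.system_address:08x} "
              f"syndrome={err.syndrome:02x} flags={err.flags}")
```

Under that assumed layout, both records give syndrome d3, the same value reported as Fill Syndrome 0 in the processor machine check, and the two system addresses differ only in their low bits (bd54f620 and bd54fd20).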


