Checkstop analysis

From: Mark Malins (mark.malins@SCOTTISH-SOUTHERN.CO.UK)
Date: Wed Aug 21 2002 - 06:26:21 EDT


Hi all,
We have had a hardware crash on one of our systems. Below is the output from a
checkstop analysis. It looks like a memory or planar issue, but does
anyone know how to be more specific with the diagnosis? Any input appreciated.
The box in question is a J50 running AIX 4.3.3 ML 8.
Best Regards,
Mark

SMP Checkstop Interpretation. Version 8.5
-------------------------------------------------------------------------------
File checkstop.00059487A100.A
-------------------------------------------------------------------------------

Action Plans

   There is no checker on the system claiming to have pulled the checkstop line.

   Your support center may be needed to help you with this checkstop.
-------------------------------------------------------------------------------

SRN/FRU Information

As a part of diagnostics, smpcheck would not have made a FRU callout in this
case.
-------------------------------------------------------------------------------

Checkstop Details

CPU Information
     604E_ Chip 0 is on CPU Card 0    604E_ Chip 1 is on CPU Card 0
     604E_ Chip 2 is on CPU Card 1    604E_ Chip 3 is on CPU Card 1
     604E_ Chip 4 is on CPU Card 2    604E_ Chip 5 is on CPU Card 2

CPU chip SRR1 register information
      o No checkstop bits set.

System Memory Controller
      o No checkstop bits set.

CPU Cache Address Controller CCA2 status Reg
      o No checkstop bits set.

Data Cross Bar Information
      o No checkstop bits set.

Microchannel Controller Status Registers
      o No checkstop bits set.
------------------------------------------------------------------------------

Complete Register Information
            MSB LSB
             |... .... .... .... .... .... .... ...|
SMC GSR 0000 0000 0000 0000 0000 0000 0100 0010
              + ---------------------------- Uncorrectable ECC error
               + --------------------------- SMC refresh lost
                + ++++ ++++ +++ ++++ - Other SMC checkstop flags

SMC Single-bit error address = 0x00000000001e3190
SMC Multi-bit error address = 0x00000000001e3190

            MSB LSB
             |--GSR--| |SB_SYND| |MB_SYND| |CHIP_ID|
             |... .... .... .... .... .... .... ...|
DCB#0 0000 0000 0000 0000 0000 0000 0000 0010
DCB#1 0000 0000 0000 0000 0000 0000 0000 0010
DCB#2 0000 0000 0000 0000 0000 0000 0000 0010
DCB#3 0000 0000 0000 0000 0000 0000 0000 0010
             |... .... .... .... .... .... .... ...|
             + Single Bit Memory Error
              + Multi Bit Memory Error
               + Internal Checkstop
                  + Bus between DCB and CPU card 1
                   + Bus between DCB and CPU card 2
                    + Bus between DCB and CPU card 3
                     + Bus between DCB and IOD
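The DCB header line splits each 32-bit word into four labelled fields (GSR, SB_SYND, MB_SYND, CHIP_ID), written MSB first. Below is a minimal Python sketch that pulls those fields apart. It assumes each field is one byte wide (as the header's spacing suggests) and that the first three GSR flags sit at bits 0-2 in IBM numbering (bit 0 = MSB); the helper name `split_dcb` is hypothetical and not part of smpcheck.

```python
# Hypothetical helper: split a DCB status word into the byte fields named in
# the smpcheck header (|--GSR--| |SB_SYND| |MB_SYND| |CHIP_ID|), MSB first.
# Assumption: each field is exactly one byte wide.

GSR_FLAGS = {
    0: "Single Bit Memory Error",   # IBM bit 0 = most significant bit of GSR
    1: "Multi Bit Memory Error",
    2: "Internal Checkstop",
}


def split_dcb(word: int) -> dict:
    """Return the four byte fields and any set GSR flags (assumed layout)."""
    gsr = (word >> 24) & 0xFF
    fields = {
        "GSR": gsr,
        "SB_SYND": (word >> 16) & 0xFF,
        "MB_SYND": (word >> 8) & 0xFF,
        "CHIP_ID": word & 0xFF,
    }
    # Bit n of the GSR byte (IBM numbering) corresponds to mask 1 << (7 - n).
    fields["flags"] = [name for n, name in GSR_FLAGS.items() if gsr & (1 << (7 - n))]
    return fields


# DCB#0..#3 all read 0000 0000 0000 0000 0000 0000 0000 0010 = 0x00000002.
print(split_dcb(0x00000002))
```

Under this reading all four DCBs report a zero GSR byte and zero syndromes, with only the CHIP_ID byte non-zero, which agrees with the "No checkstop bits set" line for the Data Cross Bar.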

               MSB LSB
                |... .... .... .... .... .... .... ...|
CPU Crd#0 CCA2 0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#1 CCA2 0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#2 CCA2 0000 0000 0000 0000 0000 0000 0000 0110
                +------------------------------ CCA2 Global Checkstop Flag
                 + ---------------------------- CCA2 Sys Parity Checkstop
                  + --------------------------- CCA2 Arb Paradox Checkstop
                   + -------------------------- CCA2 Resp Paradox
                     + ------------------------ CCA2 M0 Dcache Paradox
                      + ----------------------- CCA2 M1 Dcache Paradox
                       + ---------------------- M0 CCA2 Timeout
                        + --------------------- M1 CCA2 Timeout
                          + ------------------- CCA2 DIR Parity 0 Err
                           + ------------------ CCA2 DIR Parity 1 Err
                            + ----------------- CCA2 M0 Address Parity
                             + ---------------- CCA2 M1 Address Parity
                                + ------------- CCA2 M1 Paradox Error
                                    + --------- Refuse timeout
                                     + -------- AACK missing timeout
                                      + ------- tag0 busy timeout
                                       + ------ tag1 busy timeout
                                         + ---- ILL L2 Paradox
                                          + --- Proc Paradox
                                           + -- SBW Paradox
                                            + - Cache Paradox
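Each CCA2 legend line marks one bit of the 32-bit status word, counted MSB first. The sketch below lists which bits are set and whether any of them carry a legend label. The bit-to-label table is an assumption read off the column of each "+" marker against the ruler; the archive collapsed some spacing, so positions from bit 12 onward are the least certain. `set_bits_ibm` and `CCA2_LABELS` are hypothetical names.

```python
# Assumed mapping of CCA2 status bits (IBM numbering, bit 0 = MSB) to the
# legend text, inferred from the "+" marker columns. Approximate only.
CCA2_LABELS = {
    0: "CCA2 Global Checkstop Flag", 1: "CCA2 Sys Parity Checkstop",
    2: "CCA2 Arb Paradox Checkstop", 3: "CCA2 Resp Paradox",
    4: "CCA2 M0 Dcache Paradox", 5: "CCA2 M1 Dcache Paradox",
    6: "M0 CCA2 Timeout", 7: "M1 CCA2 Timeout",
    8: "CCA2 DIR Parity 0 Err", 9: "CCA2 DIR Parity 1 Err",
    10: "CCA2 M0 Address Parity", 11: "CCA2 M1 Address Parity",
    13: "CCA2 M1 Paradox Error", 16: "Refuse timeout",
    17: "AACK missing timeout", 18: "tag0 busy timeout",
    19: "tag1 busy timeout", 20: "ILL L2 Paradox",
    21: "Proc Paradox", 22: "SBW Paradox", 23: "Cache Paradox",
}


def set_bits_ibm(word: int, width: int = 32) -> list:
    """Return the IBM-style bit numbers (0 = MSB) of every bit set in word."""
    return [n for n in range(width) if word & (1 << (width - 1 - n))]


# CPU Crd#0..#2 CCA2 all read ...0000 0110 = 0x00000006.
for card, value in enumerate([0x00000006, 0x00000006, 0x00000006]):
    bits = set_bits_ibm(value)
    hits = [CCA2_LABELS[b] for b in bits if b in CCA2_LABELS]
    print(f"CPU Crd#{card}: set bits {bits}, labelled flags {hits}")
```

The only set bits (29 and 30) fall in the last byte, past every labelled marker, which fits the summary's "No checkstop bits set" for the CCA2 status register.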

CPU Crd#0 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#1 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#2 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000

************************ Microchannel Controllers ********************
                             MSB LSB
                              |... .... .... ...|
IO Controller#0 Status 0000 0000 0000 0000
IO Controller#1 Status 0000 0000 0000 0000
                                             ||||
                                             0xxx Chkstp not asserted
                                             100x Chkstp due to HW
                                             110x Chkstp due to SW
                                             111x Not Present
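The Microchannel legend is a small pattern table over four status bits, where `x` means "don't care". A minimal Python sketch of that table follows. It assumes the `||||` marker sits under the last four bits of the 16-bit status; with both controllers reading 0000 0000 0000 0000, every nibble gives the same answer anyway. The function name `classify_chkstp` is hypothetical.

```python
def classify_chkstp(nibble: int) -> str:
    """Classify a 4-bit field per the legend: 0xxx, 100x, 110x, 111x."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("expected a 4-bit value")
    if (nibble & 0b1000) == 0:
        return "Chkstp not asserted"          # 0xxx
    top3 = nibble >> 1                         # drop the trailing don't-care bit
    if top3 == 0b100:
        return "Chkstp due to HW"              # 100x
    if top3 == 0b110:
        return "Chkstp due to SW"              # 110x
    if top3 == 0b111:
        return "Not Present"                   # 111x
    return "Undefined (101x is not listed in the legend)"


# IO Controller#0 and #1 Status are both 0x0000; assume the low nibble is the field.
for ctrl, status in enumerate([0x0000, 0x0000]):
    print(f"IO Controller#{ctrl}: {classify_chkstp(status & 0xF)}")
```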

******************** 604E_ CPU chips HID0 Registers ********************
            MSB LSB
              |... .... .... .... .... .... .... ...|
Chip #0 HID0 1111 0000 0000 0001 1100 0000 1000 0100
Chip #1 HID0 1111 0000 0000 0001 1100 0000 1000 0100
Chip #2 HID0 1111 0000 0000 0001 1100 0000 1000 0100
Chip #3 HID0 1111 0000 0000 0001 1100 0000 1000 0100
Chip #4 HID0 1111 0000 0000 0001 1100 0000 1000 0100
Chip #5 HID0 1111 0000 0000 0001 1100 0000 1000 0100
              + ------------------ Enable machine check input pin
               + ----------------- Enable cache parity checking
                + ---------------- Enable m.c. on addr. bus parity err
                 + --------------- Enable m.c. on data bus parity err
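The HID0 rows and the per-chip register dumps show the same value (hid0 = f001c084). The sketch below checks the four legend enables, assuming they occupy bits 0-3 in IBM numbering (the "+" markers line up with the first four ruler positions). `HID0_ENABLES` and `hid0_enables` are hypothetical names.

```python
# Assumption: the four legend lines map to HID0 bits 0-3 (bit 0 = MSB).
HID0_ENABLES = {
    0: "Enable machine check input pin",
    1: "Enable cache parity checking",
    2: "Enable m.c. on addr. bus parity err",
    3: "Enable m.c. on data bus parity err",
}


def hid0_enables(hid0: int) -> dict:
    """Map each legend label to True/False for a 32-bit HID0 value."""
    return {label: bool(hid0 & (1 << (31 - bit))) for bit, label in HID0_ENABLES.items()}


# Every 604E_ chip reports hid0 = f001c084 (binary 1111 0000 0000 0001 ...).
for label, on in hid0_enables(0xF001C084).items():
    print(f"{label:40s} {'on' if on else 'off'}")
```

All four enables come back on, so machine checks from the pin, cache parity and both bus parity sources were armed on every CPU at the time of the checkstop.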

************************** CPU chip's SRR1 **************************
            MSB LSB
              |... .... .... .... .... .... .... ...|
Chip #0 srr1 0000 0000 0000 0000 1101 0000 0011 0000
Chip #1 srr1 0000 0000 0000 0000 1001 0000 0011 0010
Chip #2 srr1 0000 0000 0000 0000 1001 0000 0011 0010
Chip #3 srr1 0000 0000 0000 0000 1111 0000 0011 0000
Chip #4 srr1 0000 0000 0000 0000 1101 0000 0011 0010
Chip #5 srr1 0000 0000 0000 0000 1101 0000 0011 0000
              |... .... .... .... .... .... .... ...|
                          + ------ Data Cache Parity Error
                           + ----- Instruction Cache Parity Error
                             + --- Machine Check (/MCP) asserted
                              + -- /TEA pin asserted
                               + - Data Bus Parity Error
                                + Address Bus Parity Error
                                   16-29=MSR(16-29)
                                   31=MSR(31)
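The SRR1 legend marks six machine-check cause bits (10-15, IBM numbering) ahead of the copied MSR bits. The sketch below takes the hex srr1 values from the per-chip dumps, renders them in the tool's grouped-binary form, and tests bits 10-15. The names `SRR1_MC_BITS`, `as_nibbles` and `srr1_causes` are hypothetical; the bit positions follow the legend's marker columns.

```python
# Machine-check cause bits per the legend (IBM numbering, bit 0 = MSB).
SRR1_MC_BITS = {
    10: "Data Cache Parity Error",
    11: "Instruction Cache Parity Error",
    12: "Machine Check (/MCP) asserted",
    13: "/TEA pin asserted",
    14: "Data Bus Parity Error",
    15: "Address Bus Parity Error",
}


def as_nibbles(value: int) -> str:
    """Render a 32-bit value the way the tool does: MSB first, groups of four."""
    bits = format(value, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


def srr1_causes(srr1: int) -> list:
    """Return the legend labels whose bit is set in srr1."""
    return [name for bit, name in SRR1_MC_BITS.items() if srr1 & (1 << (31 - bit))]


# srr1 values copied from the per-chip register dumps (604E_ #0 .. #5).
srr1_by_chip = {0: 0x0000D030, 1: 0x00009032, 2: 0x00009032,
                3: 0x0000F030, 4: 0x0000D032, 5: 0x0000D030}

for chip, srr1 in srr1_by_chip.items():
    print(f"Chip #{chip} srr1 {as_nibbles(srr1)}  causes={srr1_causes(srr1)}")
```

Every cause list comes back empty because all six values fit in the low 16 bits (the copied MSR half), which is why the summary reports "No checkstop bits set" for SRR1.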

604E_ #0

        r0 = 00009032 , r1 = 00437c48 , r2 = 0026ae98 , r3 = d0000000
        r4 = 00000020 , r5 = 00000400 , r6 = 00000020 , r7 = d0000fe0
        r8 = 00000020 , r9 = 00000005 , r10 = 0001b76f , r11 = 00004002
        r12 = 00000400 , r13 = 00000000 , r14 = 2ff3b400 , r15 = 000e5500
        r16 = 00000000 , r17 = 00000000 , r18 = c01ef400 , r19 = 00000000
        r20 = b99d0000 , r21 = 00827711 , r22 = 00027711 , r23 = 0001b76f
        r28 = 40000000 , r29 = 00000002 , r30 = 00000000 , r31 = 00005711
        cr = 82422080 , xer = ???????? , cia = 00143fd0
        lr = 0005e640 , ctr = 00000000 , dec = f375cd15
        msr = 00009032 , dar = f0000820 , dsisr = 40000000
        srr0 = d015f868 , srr1 = 0000d030
        tbu = 00002801 , tbl = 16ae457d , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e5500 , sprg1 = 00000000
        sprg2 = e6005600 , sprg3 = 00437eb0
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 0001bd0c , sr3 = 00000000
        sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
        sr8 = 0001400a , sr9 = 0001600b , sr10 = 007fffff , sr11 = 00002001
        sr12 = 0001b76f , sr13 = 00827711 , sr14 = 00008004 , sr15 = 007fffff

604E_ #1

        r0 = 00000006 , r1 = 2ff3b2e8 , r2 = 0026ae98 , r3 = 00114a84
        r4 = 00000000 , r5 = 00000008 , r6 = 0036f680 , r7 = 00000000
        r8 = 00000003 , r9 = deadbeef , r10 = deadbeef , r11 = 00000000
        r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
        r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
        r28 = 00000000 , r29 = 00000001 , r30 = 0036f900 , r31 = 000e5b14
        cr = 24004000 , xer = ???????? , cia = 000254c4
        lr = 00025918 , ctr = 00025878 , dec = ce4b0544
        msr = 00009032 , dar = f0060fa8 , dsisr = 40000000
        srr0 = 000258c8 , srr1 = 00009032
        tbu = 00002801 , tbl = 3bd98cfd , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e5880 , sprg1 = deadbeef
        sprg2 = e6000300 , sprg3 = 2ff3b400
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 00000000 , sr3 = 00022011
        sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 007fffff
        sr8 = 00000000 , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 00000000 , sr14 = 00008004 , sr15 = 007fffff

604E_ #2

        r4 = 0037a194 , r5 = 00000008 , r6 = 0036f680 , r7 = 00000000
        r8 = 00000003 , r9 = deadbeef , r10 = deadbeef , r11 = 00000000
        r12 = 00009032 , r13 = deadbeef , r14 = deadbeef , r15 = deadbeef
        r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
        r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
        r28 = 0037969e , r29 = 00000002 , r30 = 0036fb80 , r31 = 000e5e94
        cr = 22002000 , xer = ???????? , cia = 00025900
        lr = 00025918 , ctr = 00025878 , dec = a91f2233
        msr = 00009032 , dar = f005e634 , dsisr = 40000000
        srr0 = 000254a0 , srr1 = 00009032
        tbu = 00002801 , tbl = 6105f1e0 , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e5c00 , sprg1 = deadbeef
        sprg2 = e6000400 , sprg3 = 2ff3b400
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 00000000 , sr3 = 00022011
        sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 007fffff
        sr8 = 007fffff , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 007fffff , sr14 = 00008004 , sr15 = 007fffff

604E_ #3

        r0 = 00009032 , r1 = 00458c48 , r2 = 0026ae98 , r3 = d000c000
        r4 = 00000020 , r5 = 00000000 , r6 = 00000020 , r7 = d000cfc0
        r8 = 00000020 , r9 = 00000005 , r10 = 00037999 , r11 = 00000000
        r12 = 00000400 , r13 = 00458eb0 , r14 = 2ff3b400 , r15 = 000e5f80
        r16 = 00000000 , r17 = b0ba9e28 , r18 = c02f2830 , r19 = 00005999
        r20 = b99d0000 , r21 = 00828cf4 , r22 = 00028cf4 , r23 = 00037999
        r24 = 00000002 , r25 = 40ec6f78 , r26 = 00ec6f78 , r27 = b0a14010
        r28 = 40000000 , r29 = 00000002 , r30 = 0000000c , r31 = 00000cf4
        cr = 82422080 , xer = ???????? , cia = 00143fc4
        lr = 0005e640 , ctr = 00000001 , dec = 83ee9aac
        msr = 00009032 , dar = 2000c63c , dsisr = 40000000
        srr0 = 10010ba8 , srr1 = 0000f030
        tbu = 00002801 , tbl = 8633eedc , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e5f80 , sprg1 = 00000000
        sprg2 = e600c200 , sprg3 = 00458eb0
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 00028cf4 , sr3 = 00022011
        sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
        sr12 = 00037999 , sr13 = 00828cf4 , sr14 = 00008004 , sr15 = 007fffff

604E_ #4

        r0 = 20000000 , r1 = 2ff3b2f0 , r2 = 0026ae98 , r3 = 2001209f
        r4 = 00000000 , r5 = e6005a00 , r6 = 00005a65 , r7 = 00000000
        r8 = 6003a01d , r9 = 00000001 , r10 = 00000000 , r11 = 001e2bb4
        r12 = 001e30c4 , r13 = deadbeef , r14 = 00000003 , r15 = 2ff22cdc
        r16 = 2ff22cec , r17 = 00000000 , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = ffffffff , r23 = 2ff22d6a
        r24 = 2ff3b6e0 , r25 = 2ff3b400 , r26 = d016aed4 , r27 = 22222422
        r28 = 00002000 , r29 = 00000001 , r30 = 200110a0 , r31 = 2ff3b400
        cr = 2a222422 , xer = ???????? , cia = 001e3194
        lr = 001e30d4 , ctr = 001e2bb4 , dec = 5ebf97a4
        msr = 00009032 , dar = f0004358 , dsisr = 40000000
        srr0 = 000037d8 , srr1 = 0000d032
        tbu = 00002801 , tbl = ab637455 , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e6300 , sprg1 = 2ff22cdc
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 000379b9 , sr3 = 00022011
        sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 0001a00d
        sr8 = 007fffff , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 60000020 , sr14 = 00008004 , sr15 = 007fffff

604E_ #5

        r0 = 00009032 , r1 = 0046ec48 , r2 = 0026ae98 , r3 = dff22000
        r4 = 00000020 , r5 = 00000400 , r6 = 00000020 , r7 = dff22bc0
        r8 = 00000020 , r9 = 00000005 , r10 = 000155e8 , r11 = 00004002
        r12 = 00000400 , r13 = 0046eeb0 , r14 = 2ff3b400 , r15 = 000e6680
        r16 = 00000000 , r17 = b0b96814 , r18 = c039b488 , r19 = 000055e8
        r20 = b99d0000 , r21 = 00821e91 , r22 = 00021e91 , r23 = 000155e8
        r24 = 00000002 , r25 = 40ec7410 , r26 = 00ec7410 , r27 = b0b18794
        r28 = 40000000 , r29 = 00000002 , r30 = 0000ff22 , r31 = 00003e91
        cr = 82422080 , xer = ???????? , cia = 00143fc4
        lr = 0005e640 , ctr = 00000009 , dec = 398e23dd
        msr = 00009032 , dar = 2ff22ff8 , dsisr = 40000000
        srr0 = d0177b8c , srr1 = 0000d030
        tbu = 00002801 , tbl = d09569c3 , ear = 00000000
        pvr = ???????? , pid = ???????? , sdr1 = 010000ff
        hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
        hid5 = ???????? , mq = ????????
        sprg0 = 000e6680 , sprg1 = 00000000
        sprg2 = e6006900 , sprg3 = 0046eeb0
        sr0 = 00000000 , sr1 = 00020010 , sr2 = 00021e91 , sr3 = 00022011
        sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
        sr8 = 0001400a , sr9 = 0001600b , sr10 = 007fffff , sr11 = 00000000
        sr12 = 000155e8 , sr13 = 00821e91 , sr14 = 00008004 , sr15 = 007fffff
-------------------------------------------------------------------------------

IO Controller Scan information

IO Controller 0
csr[000]=0x10000000 csr[001]=0x10000000 csr[002]=0x10000000
csr[003]=0x10000000 csr[004]=0x10000000 csr[005]=0x00000000
csr[006]=0x10000000 csr[007]=0x10000000 csr[008]=0x10000000
csr[009]=0x10000000 csr[010]=0x10000000 csr[011]=0x10000000
csr[012]=0x10000000 csr[013]=0x10000000 csr[014]=0x10000000
csr[015]=0x00000000
dsc[000]=0x00000000 dsc[001]=0x00000000 dsc[002]=0x00000000
dsc[003]=0x00000000 dsc[004]=0x00000000 dsc[005]=0x00000000
dsc[006]=0x00000000 dsc[007]=0x00000000 dsc[008]=0x00000000
dsc[012]=0x00000000 dsc[013]=0x00000000 dsc[014]=0x00000000
dsc[015]=0x00000000
m_enable=0x1 personalization=0x30030000 bus_stat=0x00 tce_addr_high=0x5fd00
tce_addr_low=0x4 bus_mapping=0xffffffff crr=0xffff0007 mode0=0xf60029f4
mode1=0xf0000000 j_pio_s_m=0x0 k_s_m=0x0 l_s_m=0x0 read_s_m=0x0
write_s_m=0x0 d_s_m=0x0 e_s_m=0x0 f_s_m=0x0 pio_inprogress=0x0
pio_read_nw=0x0 pio_byte_count=0x1f pio_completion=0x0 pio_error=0x0
pio_addr_high=0x50060000 cl_addr_high=0x2 cl_addr_low=0x439007e0
dsier_add_l=0x2 sh_addr_h=0x90000000 sh_addr_l=0x439fef20

IO Controller 1
csr[000]=0xcb8e8a80 csr[001]=0x9f4d8444 csr[002]=0x0454dc76
csr[003]=0x24d04400 csr[004]=0x2a8a4c99 csr[005]=0x90c88088
csr[006]=0x54c4cd04 csr[007]=0x04400c82 csr[008]=0xda414380
csr[009]=0x79c8828f csr[010]=0xd0c30100 csr[011]=0xb0c28644
csr[012]=0x2e38a000 csr[013]=0x843c3004 csr[014]=0x365800dd
csr[015]=0x1c182000
dsc[000]=0x2848e425 dsc[001]=0x38ede1a5 dsc[002]=0x08cdc104
dsc[003]=0x394ce484 dsc[004]=0x048c8858 dsc[005]=0x02c6c632
dsc[006]=0x1684c808 dsc[007]=0x848cd01e dsc[008]=0x1cccc008
dsc[009]=0x4ccc4000 dsc[010]=0x048cc841 dsc[011]=0x0ccc4000
dsc[012]=0xeccc8804 dsc[013]=0x44cc8800 dsc[014]=0x64cc8804
m_enable=0x1 personalization=0x304b2a00 bus_stat=0x21 tce_addr_high=0x5fc00
tce_addr_low=0x4 bus_mapping=0xffffffff crr=0xffff0007 mode0=0xf6002bf4
mode1=0xf6000000 j_pio_s_m=0x0 k_s_m=0x0 l_s_m=0x0 read_s_m=0x0
write_s_m=0x0 d_s_m=0x0 e_s_m=0x0 f_s_m=0x0 pio_inprogress=0x0
pio_read_nw=0x0 pio_byte_count=0x1d pio_completion=0x0 pio_error=0x0
pio_addr_high=0xd977f0f4 cl_addr_high=0xb cl_addr_low=0x07400000
dsier_add_l=0x4 sh_addr_h=0x909002e8 sh_addr_l=0x7b9fef20

          =====================================




This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:16:09 EDT