Re: Checkstop analysis .

From: Mark Malins (mark.malins@SCOTTISH-SOUTHERN.CO.UK)
Date: Wed Aug 21 2002 - 10:36:32 EDT


Hi,
Thanks to all who replied. I forgot to mention that this is the second or third
time it has happened, about once or twice a year. We have run diags but nothing
shows up, and we already replaced the planar a while back. The box in question
should be consolidated soon anyway.
Regards,
Mark.

"Jolet, John" <john.jolet@misyshealthcare.com> on 21/08/2002 13:57:46

Please respond to IBM AIX Discussion List <aix-l@princeton.edu>

To: aix-l@princeton.edu
cc: (bcc: Mark Malins/HAV/SSE)
Subject: Re: Checkstop analysis .

I'm no expert, but I'd take the machine to single-user and run diags on the
CPU, memory and planar.

-----Original Message-----
From: Mark Malins [mailto:mark.malins@SCOTTISH-SOUTHERN.CO.UK]
Sent: Wednesday, August 21, 2002 5:26 AM
To: aix-l@Princeton.EDU
Subject: Checkstop analysis .

Hi all,
We have had a hardware crash on one of our systems. Below is the output from a
checkstop analysis. It looks like a memory or planar issue, but does anyone
know how to be more specific with a diagnosis? Any input appreciated.
The box in question is a J50 running AIX 4.3.3 ML 8.
Best Regards,
Mark.

SMP Checkstop Interpretation. Version 8.5
-------------------------------------------------------------------------------
File checkstop.00059487A100.A
-------------------------------------------------------------------------------
Action Plans
   There is no checker on the system claiming to have pulled the checkstop line.
   Your support center may be needed to help you with this checkstop.
-------------------------------------------------------------------------------
SRN/FRU Information
As a part of diagnostics, smpcheck would not have made a FRU callout in this case.
-------------------------------------------------------------------------------
Checkstop Details
CPU Information
     604E_ Chip 0 is on CPU Card 0    604E_ Chip 1 is on CPU Card 0
     604E_ Chip 2 is on CPU Card 1    604E_ Chip 3 is on CPU Card 1
     604E_ Chip 4 is on CPU Card 2    604E_ Chip 5 is on CPU Card 2
CPU chip SRR1 register information
      o No checkstop bits set.
System Memory Controller
      o No checkstop bits set.
CPU Cache Address Controller CCA2 status Reg
      o No checkstop bits set.
Data Cross Bar Information
      o No checkstop bits set.
Microchannel Controller Status Registers
      o No checkstop bits set.
-------------------------------------------------------------------------------
Complete Register Information
            MSB                                  LSB
             |... .... .... .... .... .... .... ...|
SMC GSR      0000 0000 0000 0000 0000 0000 0100 0010
              + ---------------------------- Uncorrectable ECC error
               + --------------------------- SMC refresh lost
                + ++++ ++++ +++       ++++ - Other SMC checkstop flags
SMC Single-bit error address = 0x00000000001e3190
SMC Multi-bit error address  = 0x00000000001e3190
            MSB                                  LSB
             |--GSR--| |SB_SYND| |MB_SYND| |CHIP_ID|
             |... .... .... .... .... .... .... ...|
DCB#0        0000 0000 0000 0000 0000 0000 0000 0010
DCB#1        0000 0000 0000 0000 0000 0000 0000 0010
DCB#2        0000 0000 0000 0000 0000 0000 0000 0010
DCB#3        0000 0000 0000 0000 0000 0000 0000 0010
             |... .... .... .... .... .... .... ...|
             +                Single Bit Memory Error
              +               Multi Bit Memory Error
               +              Internal Checkstop
                  +           Bus between DCB and CPU card 1
                   +          Bus between DCB and CPU card 2
                    +         Bus between DCB and CPU card 3
                     +        Bus between DCB and IOD
               MSB                                  LSB
                |... .... .... .... .... .... .... ...|
CPU Crd#0 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#1 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#2 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
                +------------------------------ CCA2 Global Checkstop Flag
                 + ---------------------------- CCA2 Sys Parity Checkstop
                  + --------------------------- CCA2 Arb Paradox Checkstop
                   + -------------------------- CCA2 Resp Paradox
                     + ------------------------ CCA2 M0 Dcache Paradox
                      + ----------------------- CCA2 M1 Dcache Paradox
                       + ---------------------- M0 CCA2 Timeout
                        + --------------------- M1 CCA2 Timeout
                          + ------------------- CCA2 DIR Parity 0 Err
                           + ------------------ CCA2 DIR Parity 1 Err
                            + ----------------- CCA2 M0 Address Parity
                             + ---------------- CCA2 M1 Address Parity
                                + ------------- CCA2 M1 Paradox Error
                                    + --------- Refuse timeout
                                     + -------- AACK missing timeout
                                      + ------- tag0 busy timeout
                                       + ------ tag1 busy timeout
                                         + ---- ILL L2 Paradox
                                          + --- Proc Paradox
                                           + -- SBW Paradox
                                            + - Cache Paradox
CPU Crd#0 CCDs (0)=0000 0000  (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#1 CCDs (0)=0000 0000  (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#2 CCDs (0)=0000 0000  (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
************************ Microchannel Controllers ********************
                             MSB              LSB
                              |... .... .... ...|
IO Controller#0  Status       0000 0000 0000 0000
IO Controller#1  Status       0000 0000 0000 0000
                                             ||||
                                             0xxx Chkstp not asserted
                                             100x Chkstp due to HW
                                             110x Chkstp due to SW
                                             111x Not Present
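
The four-bit patterns in the legend above can be applied mechanically. Below is
a minimal Python sketch, assuming the checkstop state sits in the low four bits
of each 16-bit status word, as the |||| markers suggest. The function name is
made up for illustration and is not part of smpcheck.

```python
def io_checkstop_state(status: int) -> str:
    """Classify a 16-bit IO controller status word by its low four bits."""
    low = status & 0xF
    if (low & 0b1000) == 0:
        return "Chkstp not asserted"      # pattern 0xxx
    top3 = low >> 1                       # drop the don't-care bit (x)
    return {
        0b100: "Chkstp due to HW",        # pattern 100x
        0b110: "Chkstp due to SW",        # pattern 110x
        0b111: "Not Present",             # pattern 111x
    }.get(top3, "pattern not listed in legend")


# Status rows exactly as printed in the report.
rows = {
    "IO Controller#0": "0000 0000 0000 0000",
    "IO Controller#1": "0000 0000 0000 0000",
}
states = {
    name: io_checkstop_state(int(bits.replace(" ", ""), 2))
    for name, bits in rows.items()
}
for name, state in states.items():
    print(name, "->", state)
```

With the two rows from this report, both controllers fall in the "not
asserted" case, which agrees with the summary line "No checkstop bits set."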
********************  604E_ CPU chips HID0 Registers ********************
            MSB                                    LSB
              |... .... .... .... .... .... .... ...|
Chip #0 HID0  1111 0000 0000 0001 1100 0000 1000 0100
Chip #1 HID0  1111 0000 0000 0001 1100 0000 1000 0100
Chip #2 HID0  1111 0000 0000 0001 1100 0000 1000 0100
Chip #3 HID0  1111 0000 0000 0001 1100 0000 1000 0100
Chip #4 HID0  1111 0000 0000 0001 1100 0000 1000 0100
Chip #5 HID0  1111 0000 0000 0001 1100 0000 1000 0100
              + ------------------ Enable machine check input pin
               + ----------------- Enable cache parity checking
                + ---------------- Enable m.c. on addr. bus parity err
                 + --------------- Enable m.c. on data bus parity err
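
To check which of the four enables in the legend are on, the HID0 row can be
turned into an integer and tested bit by bit. Below is a minimal Python sketch,
assuming the legend's + markers line up with bits 0 to 3 in IBM numbering
(bit 0 is the MSB). The helper name is hypothetical.

```python
# Legend entries, keyed by IBM bit number (bit 0 = most significant bit).
HID0_ENABLES = {
    0: "Enable machine check input pin",
    1: "Enable cache parity checking",
    2: "Enable m.c. on addr. bus parity err",
    3: "Enable m.c. on data bus parity err",
}


def ibm_bit_set(value: int, bit: int, width: int = 32) -> bool:
    """Return True if the given IBM-numbered bit is set in value."""
    return bool((value >> (width - 1 - bit)) & 1)


# HID0 row as printed for every chip in the report.
hid0 = int("1111 0000 0000 0001 1100 0000 1000 0100".replace(" ", ""), 2)

enabled = [name for bit, name in HID0_ENABLES.items() if ibm_bit_set(hid0, bit)]
print(f"hid0 = {hid0:08x}")
for name in enabled:
    print("  set:", name)
```

The integer form matches the "hid0 = f001c084" value in the per-chip register
dumps, and under the stated assumption all four enables are set on every chip.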
**************************  CPU chip's SRR1  **************************
            MSB                                    LSB
              |... .... .... .... .... .... .... ...|
Chip #0 srr1  0000 0000 0000 0000 1101 0000 0011 0000
Chip #1 srr1  0000 0000 0000 0000 1001 0000 0011 0010
Chip #3 srr1  0000 0000 0000 0000 1111 0000 0011 0000
Chip #4 srr1  0000 0000 0000 0000 1101 0000 0011 0010
Chip #5 srr1  0000 0000 0000 0000 1101 0000 0011 0000
              |... .... .... .... .... .... .... ...|
                          + ------ Data Cache Parity Error
                           + ----- Instruction Cache Parity Error
                             + --- Machine Check (/MCP) asserted
                              + -- /TEA pin asserted
                               + - Data Bus Parity Error
                                +  Address Bus Parity Error
                                   16-29=MSR(16-29)
                                   31=MSR(31)
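
The same kind of check works for the SRR1 cause bits. Below is a minimal Python
sketch, assuming the legend's + markers correspond to bits 10 to 15 in IBM
numbering (bit 0 is the MSB), which is how the columns line up against the
bit rows. The names below are illustrative only.

```python
# Cause bits from the legend, keyed by IBM bit number (bit 0 = MSB).
SRR1_CAUSES = {
    10: "Data Cache Parity Error",
    11: "Instruction Cache Parity Error",
    12: "Machine Check (/MCP) asserted",
    13: "/TEA pin asserted",
    14: "Data Bus Parity Error",
    15: "Address Bus Parity Error",
}


def srr1_causes(srr1: int) -> list[str]:
    """List the legend entries whose bit is set in a 32-bit SRR1 value."""
    return [
        name
        for bit, name in SRR1_CAUSES.items()
        if (srr1 >> (31 - bit)) & 1
    ]


# SRR1 values of the chips listed in the report.
srr1_by_chip = {
    0: 0x0000D030,
    1: 0x00009032,
    3: 0x0000F030,
    4: 0x0000D032,
    5: 0x0000D030,
}
causes_by_chip = {chip: srr1_causes(v) for chip, v in srr1_by_chip.items()}
for chip, causes in causes_by_chip.items():
    print(f"Chip #{chip}: {causes or 'no cause bits set'}")
```

Under that assumption none of the listed chips has a cause bit set. The set
bits all fall in the low half, which the legend describes as a copy of MSR
bits. This agrees with "No checkstop bits set" in the summary.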
604E_ #0
        r0  = 00009032 , r1  = 00437c48 , r2  = 0026ae98 , r3  = d0000000
        r4  = 00000020 , r5  = 00000400 , r6  = 00000020 , r7  = d0000fe0
        r8  = 00000020 , r9  = 00000005 , r10 = 0001b76f , r11 = 00004002
        r12 = 00000400 , r13 = 00000000 , r14 = 2ff3b400 , r15 = 000e5500
        r16 = 00000000 , r17 = 00000000 , r18 = c01ef400 , r19 = 00000000
        r20 = b99d0000 , r21 = 00827711 , r22 = 00027711 , r23 = 0001b76f
        r28 = 40000000 , r29 = 00000002 , r30 = 00000000 , r31 = 00005711
        cr       = 82422080 , xer       = ???????? , cia       = 00143fd0
        lr       = 0005e640 , ctr       = 00000000 , dec       = f375cd15
        msr      = 00009032 , dar       = f0000820 , dsisr     = 40000000
        srr0     = d015f868 , srr1      = 0000d030
        tbu      = 00002801 , tbl       = 16ae457d , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid0     = f001c084 , hid1      = ???????? , hid2      = ????????
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e5500 , sprg1     = 00000000
        sprg2    = e6005600 , sprg3     = 00437eb0
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 0001bd0c , sr3  = 00000000
        sr4  = 0000c006 , sr5  = 0000e007 , sr6  = 00010008 , sr7  = 00012009
        sr8  = 0001400a , sr9  = 0001600b , sr10 = 007fffff , sr11 = 00002001
        sr12 = 0001b76f , sr13 = 00827711 , sr14 = 00008004 , sr15 = 007fffff
604E_ #1
        r0  = 00000006 , r1  = 2ff3b2e8 , r2  = 0026ae98 , r3  = 00114a84
        r4  = 00000000 , r5  = 00000008 , r6  = 0036f680 , r7  = 00000000
        r8  = 00000003 , r9  = deadbeef , r10 = deadbeef , r11 = 00000000
        r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
        r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
        r28 = 00000000 , r29 = 00000001 , r30 = 0036f900 , r31 = 000e5b14
        cr       = 24004000 , xer       = ???????? , cia       = 000254c4
        lr       = 00025918 , ctr       = 00025878 , dec       = ce4b0544
        msr      = 00009032 , dar       = f0060fa8 , dsisr     = 40000000
        srr0     = 000258c8 , srr1      = 00009032
        tbu      = 00002801 , tbl       = 3bd98cfd , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid0     = f001c084 , hid1      = ???????? , hid2      = ????????
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e5880 , sprg1     = deadbeef
        sprg2    = e6000300 , sprg3     = 2ff3b400
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 00000000 , sr3  = 00022011
        sr4  = 007fffff , sr5  = 007fffff , sr6  = 007fffff , sr7  = 007fffff
        sr8  = 00000000 , sr9  = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 00000000 , sr14 = 00008004 , sr15 = 007fffff
604E_ #2
        r4  = 0037a194 , r5  = 00000008 , r6  = 0036f680 , r7  = 00000000
        r8  = 00000003 , r9  = deadbeef , r10 = deadbeef , r11 = 00000000
        r12 = 00009032 , r13 = deadbeef , r14 = deadbeef , r15 = deadbeef
        r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
        r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
        r28 = 0037969e , r29 = 00000002 , r30 = 0036fb80 , r31 = 000e5e94
        cr       = 22002000 , xer       = ???????? , cia       = 00025900
        lr       = 00025918 , ctr       = 00025878 , dec       = a91f2233
        msr      = 00009032 , dar       = f005e634 , dsisr     = 40000000
        srr0     = 000254a0 , srr1      = 00009032
        tbu      = 00002801 , tbl       = 6105f1e0 , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid0     = f001c084 , hid1      = ???????? , hid2      = ????????
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e5c00 , sprg1     = deadbeef
        sprg2    = e6000400 , sprg3     = 2ff3b400
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 00000000 , sr3  = 00022011
        sr4  = 007fffff , sr5  = 007fffff , sr6  = 007fffff , sr7  = 007fffff
        sr8  = 007fffff , sr9  = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 007fffff , sr14 = 00008004 , sr15 = 007fffff
604E_ #3
        r0  = 00009032 , r1  = 00458c48 , r2  = 0026ae98 , r3  = d000c000
        r4  = 00000020 , r5  = 00000000 , r6  = 00000020 , r7  = d000cfc0
        r8  = 00000020 , r9  = 00000005 , r10 = 00037999 , r11 = 00000000
        r12 = 00000400 , r13 = 00458eb0 , r14 = 2ff3b400 , r15 = 000e5f80
        r16 = 00000000 , r17 = b0ba9e28 , r18 = c02f2830 , r19 = 00005999
        r20 = b99d0000 , r21 = 00828cf4 , r22 = 00028cf4 , r23 = 00037999
        r24 = 00000002 , r25 = 40ec6f78 , r26 = 00ec6f78 , r27 = b0a14010
        r28 = 40000000 , r29 = 00000002 , r30 = 0000000c , r31 = 00000cf4
        cr       = 82422080 , xer       = ???????? , cia       = 00143fc4
        lr       = 0005e640 , ctr       = 00000001 , dec       = 83ee9aac
        msr      = 00009032 , dar       = 2000c63c , dsisr     = 40000000
        srr0     = 10010ba8 , srr1      = 0000f030
        tbu      = 00002801 , tbl       = 8633eedc , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid0     = f001c084 , hid1      = ???????? , hid2      = ????????
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e5f80 , sprg1     = 00000000
        sprg2    = e600c200 , sprg3     = 00458eb0
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 00028cf4 , sr3  = 00022011
        sr4  = 0000c006 , sr5  = 0000e007 , sr6  = 00010008 , sr7  = 00012009
        sr12 = 00037999 , sr13 = 00828cf4 , sr14 = 00008004 , sr15 = 007fffff
604E_ #4
        r0  = 20000000 , r1  = 2ff3b2f0 , r2  = 0026ae98 , r3  = 2001209f
        r4  = 00000000 , r5  = e6005a00 , r6  = 00005a65 , r7  = 00000000
        r8  = 6003a01d , r9  = 00000001 , r10 = 00000000 , r11 = 001e2bb4
        r12 = 001e30c4 , r13 = deadbeef , r14 = 00000003 , r15 = 2ff22cdc
        r16 = 2ff22cec , r17 = 00000000 , r18 = deadbeef , r19 = deadbeef
        r20 = deadbeef , r21 = deadbeef , r22 = ffffffff , r23 = 2ff22d6a
        r24 = 2ff3b6e0 , r25 = 2ff3b400 , r26 = d016aed4 , r27 = 22222422
        r28 = 00002000 , r29 = 00000001 , r30 = 200110a0 , r31 = 2ff3b400
        cr       = 2a222422 , xer       = ???????? , cia       = 001e3194
        lr       = 001e30d4 , ctr       = 001e2bb4 , dec       = 5ebf97a4
        msr      = 00009032 , dar       = f0004358 , dsisr     = 40000000
        srr0     = 000037d8 , srr1      = 0000d032
        tbu      = 00002801 , tbl       = ab637455 , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid0     = f001c084 , hid1      = ???????? , hid2      = ????????
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e6300 , sprg1     = 2ff22cdc
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 000379b9 , sr3  = 00022011
        sr4  = 007fffff , sr5  = 007fffff , sr6  = 007fffff , sr7  = 0001a00d
        sr8  = 007fffff , sr9  = 007fffff , sr10 = 007fffff , sr11 = 007fffff
        sr12 = 007fffff , sr13 = 60000020 , sr14 = 00008004 , sr15 = 007fffff
604E_ #5
        r0  = 00009032 , r1  = 0046ec48 , r2  = 0026ae98 , r3  = dff22000
        r4  = 00000020 , r5  = 00000400 , r6  = 00000020 , r7  = dff22bc0
        r8  = 00000020 , r9  = 00000005 , r10 = 000155e8 , r11 = 00004002
        r12 = 00000400 , r13 = 0046eeb0 , r14 = 2ff3b400 , r15 = 000e6680
        r16 = 00000000 , r17 = b0b96814 , r18 = c039b488 , r19 = 000055e8
        r20 = b99d0000 , r21 = 00821e91 , r22 = 00021e91 , r23 = 000155e8
        r24 = 00000002 , r25 = 40ec7410 , r26 = 00ec7410 , r27 = b0b18794
        r28 = 40000000 , r29 = 00000002 , r30 = 0000ff22 , r31 = 00003e91
        cr       = 82422080 , xer       = ???????? , cia       = 00143fc4
        lr       = 0005e640 , ctr       = 00000009 , dec       = 398e23dd
        msr      = 00009032 , dar       = 2ff22ff8 , dsisr     = 40000000
        srr0     = d0177b8c , srr1      = 0000d030
        tbu      = 00002801 , tbl       = d09569c3 , ear       = 00000000
        pvr      = ???????? , pid       = ???????? , sdr1      = 010000ff
        hid5     = ???????? , mq        = ????????
        sprg0    = 000e6680 , sprg1     = 00000000
        sprg2    = e6006900 , sprg3     = 0046eeb0
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 00021e91 , sr3  = 00022011
        sr4  = 0000c006 , sr5  = 0000e007 , sr6  = 00010008 , sr7  = 00012009
        sr8  = 0001400a , sr9  = 0001600b , sr10 = 007fffff , sr11 = 00000000
        sr12 = 000155e8 , sr13 = 00821e91 , sr14 = 00008004 , sr15 = 007fffff
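
For anyone who wants to compare the per-chip dumps side by side, the
"name = value" pairs are regular enough to parse. Below is a minimal Python
sketch, assuming every value is eight hex digits or eight question marks
(which the report seems to use for registers it could not read). It also
tolerates a value that a mail client has wrapped onto the next line. The
function name is hypothetical.

```python
import re

# A register name (letters plus optional digits), "=", then 8 hex digits
# or 8 question marks. \s* also spans a line break after the "=".
PAIR = re.compile(r"([a-z]+\d*)\s*=\s*([0-9a-f]{8}|\?{8})")


def parse_registers(text: str) -> dict:
    """Map register names to integers, or None where the dump shows ????????."""
    return {
        name: None if "?" in value else int(value, 16)
        for name, value in PAIR.findall(text)
    }


# A few lines in the same layout as the chip #0 dump, one of them wrapped.
sample = """
        cr       = 82422080 , xer       = ???????? , cia       = 00143fd0
        srr0     = d015f868 , srr1      = 0000d030
        sr0  = 00000000 , sr1  = 00020010 , sr2  = 0001bd0c , sr3  =
00000000
"""
regs = parse_registers(sample)
for name, value in regs.items():
    print(name, "unreadable" if value is None else f"{value:08x}")
```

Parsing one block per chip this way makes it easy to diff fields such as cia,
lr and srr0 across CPUs.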
-------------------------------------------------------------------------------
IO Controller Scan information
IO Controller 0
csr[000]=0x10000000  csr[001]=0x10000000  csr[002]=0x10000000
csr[003]=0x10000000  csr[004]=0x10000000  csr[005]=0x00000000
csr[006]=0x10000000  csr[007]=0x10000000  csr[008]=0x10000000
csr[009]=0x10000000  csr[010]=0x10000000  csr[011]=0x10000000
csr[012]=0x10000000  csr[013]=0x10000000  csr[014]=0x10000000
csr[015]=0x00000000
dsc[000]=0x00000000  dsc[001]=0x00000000  dsc[002]=0x00000000
dsc[003]=0x00000000  dsc[004]=0x00000000  dsc[005]=0x00000000
dsc[006]=0x00000000  dsc[007]=0x00000000  dsc[008]=0x00000000
dsc[012]=0x00000000  dsc[013]=0x00000000  dsc[014]=0x00000000
dsc[015]=0x00000000
m_enable=0x1  personalization=0x30030000  bus_stat=0x00
tce_addr_high=0x5fd00
tce_addr_low=0x4  bus_mapping=0xffffffff  crr=0xffff0007  mode0=0xf60029f4
mode1=0xf0000000  j_pio_s_m=0x0  k_s_m=0x0  l_s_m=0x0  read_s_m=0x0
write_s_m=0x0  d_s_m=0x0  e_s_m=0x0  f_s_m=0x0  pio_inprogress=0x0
pio_read_nw=0x0  pio_byte_count=0x1f  pio_completion=0x0  pio_error=0x0
pio_addr_high=0x50060000  cl_addr_high=0x2  cl_addr_low=0x439007e0
dsier_add_l=0x2  sh_addr_h=0x90000000  sh_addr_l=0x439fef20
IO Controller 1
csr[000]=0xcb8e8a80  csr[001]=0x9f4d8444  csr[002]=0x0454dc76
csr[003]=0x24d04400  csr[004]=0x2a8a4c99  csr[005]=0x90c88088
csr[006]=0x54c4cd04  csr[007]=0x04400c82  csr[008]=0xda414380
csr[009]=0x79c8828f  csr[010]=0xd0c30100  csr[011]=0xb0c28644
csr[012]=0x2e38a000  csr[013]=0x843c3004  csr[014]=0x365800dd
csr[015]=0x1c182000
dsc[000]=0x2848e425  dsc[001]=0x38ede1a5  dsc[002]=0x08cdc104
dsc[003]=0x394ce484  dsc[004]=0x048c8858  dsc[005]=0x02c6c632
dsc[006]=0x1684c808  dsc[007]=0x848cd01e  dsc[008]=0x1cccc008
dsc[009]=0x4ccc4000  dsc[010]=0x048cc841  dsc[011]=0x0ccc4000
dsc[012]=0xeccc8804  dsc[013]=0x44cc8800  dsc[014]=0x64cc8804
m_enable=0x1  personalization=0x304b2a00  bus_stat=0x21
tce_addr_high=0x5fc00
tce_addr_low=0x4  bus_mapping=0xffffffff  crr=0xffff0007  mode0=0xf6002bf4
mode1=0xf6000000  j_pio_s_m=0x0  k_s_m=0x0  l_s_m=0x0  read_s_m=0x0
write_s_m=0x0  d_s_m=0x0  e_s_m=0x0  f_s_m=0x0  pio_inprogress=0x0
pio_read_nw=0x0  pio_byte_count=0x1d  pio_completion=0x0  pio_error=0x0
pio_addr_high=0xd977f0f4  cl_addr_high=0xb  cl_addr_low=0x07400000
dsier_add_l=0x4  sh_addr_h=0x909002e8  sh_addr_l=0x7b9fef20
          =====================================