From: Mark Malins (mark.malins@SCOTTISH-SOUTHERN.CO.UK)
Date: Wed Aug 21 2002 - 10:36:32 EDT
Hi,
Thanks to all who replied. I forgot to mention that this is the second or third
time it has happened, about once or twice a year. We have run diags but nothing
shows up, and we already replaced the planar a while back. The box in question
should be consolidated soon anyway.
Regards,
Mark.
"Jolet, John" <john.jolet@misyshealthcare.com> on 21/08/2002 13:57:46
Please respond to IBM AIX Discussion List <aix-l@princeton.edu>
To: aix-l@princeton.edu
cc: (bcc: Mark Malins/HAV/SSE)
Subject: Re: Checkstop analysis .
I'm no expert, but I'd take the machine to single-user mode and run diags on the
CPU, memory and planar.
-----Original Message-----
From: Mark Malins [mailto:mark.malins@SCOTTISH-SOUTHERN.CO.UK]
Sent: Wednesday, August 21, 2002 5:26 AM
To: aix-l@Princeton.EDU
Subject: Checkstop analysis .
Hi all,
We have had a hardware crash on one of our systems. Below is the output from a
checkstop analysis. It looks like a memory or planar issue, but does anyone know
how to be more specific with a diagnosis? Any input appreciated. The box in
question is a J50 running AIX 4.3.3 ML 8.
Best Regards,
Mark.
SMP Checkstop Interpretation. Version 8.5
----------------------------------------------------------------------------
--- File checkstop.00059487A100.A
----------------------------------------------------------------------------
--- Action Plans
There is no checker on the system claiming to have pulled the checkstop line.
Your support center may be needed to help you with this checkstop.
----------------------------------------------------------------------------
--- SRN/FRU Information
As a part of diagnostics, smpcheck would not have made a FRU callout in this case.
----------------------------------------------------------------------------
--- Checkstop Details
CPU Information
  604E_ Chip 0 is on CPU Card 0
  604E_ Chip 1 is on CPU Card 0
  604E_ Chip 2 is on CPU Card 1
  604E_ Chip 3 is on CPU Card 1
  604E_ Chip 4 is on CPU Card 2
  604E_ Chip 5 is on CPU Card 2
CPU chip SRR1 register information
  o No checkstop bits set.
System Memory Controller
  o No checkstop bits set.
CPU Cache Address Controller CCA2 status Reg
  o No checkstop bits set.
Data Cross Bar Information
  o No checkstop bits set.
Microchannel Controller Status Registers
  o No checkstop bits set.
----------------------------------------------------------------------------
-- Complete Register Information

                MSB                                 LSB
                |... .... .... .... .... .... .... ...|
SMC GSR         0000 0000 0000 0000 0000 0000 0100 0010
  + ---------------------------- Uncorrectable ECC error
  + --------------------------- SMC refresh lost
  + ++++ ++++ +++ ++++ - Other SMC checkstop flags
SMC Single-bit error address = 0x00000000001e3190
SMC Multi-bit error address = 0x00000000001e3190

                MSB                                 LSB
                |--GSR--| |SB_SYND| |MB_SYND| |CHIP_ID|
                |... .... .... .... .... .... .... ...|
DCB#0           0000 0000 0000 0000 0000 0000 0000 0010
DCB#1           0000 0000 0000 0000 0000 0000 0000 0010
DCB#2           0000 0000 0000 0000 0000 0000 0000 0010
DCB#3           0000 0000 0000 0000 0000 0000 0000 0010
                |... .... .... .... .... .... .... ...|
  + Single Bit Memory Error
  + Multi Bit Memory Error
  + Internal Checkstop
  + Bus between DCB and CPU card 1
  + Bus between DCB and CPU card 2
  + Bus between DCB and CPU card 3
  + Bus between DCB and IOD

                MSB                                 LSB
                |... .... .... .... .... .... .... ...|
CPU Crd#0 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#1 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
CPU Crd#2 CCA2  0000 0000 0000 0000 0000 0000 0000 0110
  +------------------------------ CCA2 Global Checkstop Flag
  + ---------------------------- CCA2 Sys Parity Checkstop
  + --------------------------- CCA2 Arb Paradox Checkstop
  + -------------------------- CCA2 Resp Paradox
  + ------------------------ CCA2 M0 Dcache Paradox
  + ----------------------- CCA2 M1 Dcache Paradox
  + ---------------------- M0 CCA2 Timeout
  + --------------------- M1 CCA2 Timeout
  + ------------------- CCA2 DIR Parity 0 Err
  + ------------------ CCA2 DIR Parity 1 Err
  + ----------------- CCA2 M0 Address Parity
  + ---------------- CCA2 M1 Address Parity
  + ------------- CCA2 M1 Paradox Error
  + --------- Refuse timeout
  + -------- AACK missing timeout
  + ------- tag0 busy timeout
  + ------ tag1 busy timeout
  + ---- ILL L2 Paradox
  + --- Proc Paradox
  + -- SBW Paradox
  + - Cache Paradox
CPU Crd#0 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#1 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000
CPU Crd#2 CCDs (0)=0000 0000 (1)=0000 0000 (2)=0000 0000 (3)=0000 0000

************************ Microchannel Controllers ********************
                        MSB             LSB
                        |... .... .... ...|
IO Controller#0 Status  0000 0000 0000 0000
IO Controller#1 Status  0000 0000 0000 0000
                        ||||
                        0xxx Chkstp not asserted
                        100x Chkstp due to HW
                        110x Chkstp due to SW
                        111x Not Present

******************** 604E_ CPU chips HID0 Registers ********************
                MSB                                 LSB
                |... .... .... .... .... .... .... ...|
Chip #0 HID0    1111 0000 0000 0001 1100 0000 1000 0100
Chip #1 HID0    1111 0000 0000 0001 1100 0000 1000 0100
Chip #2 HID0    1111 0000 0000 0001 1100 0000 1000 0100
Chip #3 HID0    1111 0000 0000 0001 1100 0000 1000 0100
Chip #4 HID0    1111 0000 0000 0001 1100 0000 1000 0100
Chip #5 HID0    1111 0000 0000 0001 1100 0000 1000 0100
  + ------------------ Enable machine check input pin
  + ----------------- Enable cache parity checking
  + ---------------- Enable m.c. on addr. bus parity err
  + --------------- Enable m.c. on data bus parity err

************************** CPU chip's SRR1 **************************
                MSB                                 LSB
                |... .... .... .... .... .... .... ...|
Chip #0 srr1    0000 0000 0000 0000 1101 0000 0011 0000
Chip #1 srr1    0000 0000 0000 0000 1001 0000 0011 0010
Chip #3 srr1    0000 0000 0000 0000 1111 0000 0011 0000
Chip #4 srr1    0000 0000 0000 0000 1101 0000 0011 0010
Chip #5 srr1    0000 0000 0000 0000 1101 0000 0011 0000
                |... .... .... .... .... .... .... ...|
  + ------ Data Cache Parity Error
  + ----- Instruction Cache Parity Error
  + --- Machine Check (/MCP) asserted
  + -- /TEA pin asserted
  + - Data Bus Parity Error
  + Address Bus Parity Error
  16-29=MSR(16-29) 31=MSR(31)

604E_ #0
r0 = 00009032 , r1 = 00437c48 , r2 = 0026ae98 , r3 = d0000000
r4 = 00000020 , r5 = 00000400 , r6 = 00000020 , r7 = d0000fe0
r8 = 00000020 , r9 = 00000005 , r10 = 0001b76f , r11 = 00004002
r12 = 00000400 , r13 = 00000000 , r14 = 2ff3b400 , r15 = 000e5500
r16 = 00000000 , r17 = 00000000 , r18 = c01ef400 , r19 = 00000000
r20 = b99d0000 , r21 = 00827711 , r22 = 00027711 , r23 = 0001b76f
r28 = 40000000 , r29 = 00000002 , r30 = 00000000 , r31 = 00005711
cr = 82422080 , xer = ???????? , cia = 00143fd0
lr = 0005e640 , ctr = 00000000 , dec = f375cd15
msr = 00009032 , dar = f0000820 , dsisr = 40000000
srr0 = d015f868 , srr1 = 0000d030
tbu = 00002801 , tbl = 16ae457d , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
hid5 = ???????? , mq = ????????
sprg0 = 000e5500 , sprg1 = 00000000
sprg2 = e6005600 , sprg3 = 00437eb0
sr0 = 00000000 , sr1 = 00020010 , sr2 = 0001bd0c , sr3 = 00000000
sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
sr8 = 0001400a , sr9 = 0001600b , sr10 = 007fffff , sr11 = 00002001
sr12 = 0001b76f , sr13 = 00827711 , sr14 = 00008004 , sr15 = 007fffff

604E_ #1
r0 = 00000006 , r1 = 2ff3b2e8 , r2 = 0026ae98 , r3 = 00114a84
r4 = 00000000 , r5 = 00000008 , r6 = 0036f680 , r7 = 00000000
r8 = 00000003 , r9 = deadbeef , r10 = deadbeef , r11 = 00000000
r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
r28 = 00000000 , r29 = 00000001 , r30 = 0036f900 , r31 = 000e5b14
cr = 24004000 , xer = ???????? , cia = 000254c4
lr = 00025918 , ctr = 00025878 , dec = ce4b0544
msr = 00009032 , dar = f0060fa8 , dsisr = 40000000
srr0 = 000258c8 , srr1 = 00009032
tbu = 00002801 , tbl = 3bd98cfd , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
hid5 = ???????? , mq = ????????
sprg0 = 000e5880 , sprg1 = deadbeef
sprg2 = e6000300 , sprg3 = 2ff3b400
sr0 = 00000000 , sr1 = 00020010 , sr2 = 00000000 , sr3 = 00022011
sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 007fffff
sr8 = 00000000 , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
sr12 = 007fffff , sr13 = 00000000 , sr14 = 00008004 , sr15 = 007fffff

604E_ #2
r4 = 0037a194 , r5 = 00000008 , r6 = 0036f680 , r7 = 00000000
r8 = 00000003 , r9 = deadbeef , r10 = deadbeef , r11 = 00000000
r12 = 00009032 , r13 = deadbeef , r14 = deadbeef , r15 = deadbeef
r16 = deadbeef , r17 = deadbeef , r18 = deadbeef , r19 = deadbeef
r20 = deadbeef , r21 = deadbeef , r22 = deadbeef , r23 = deadbeef
r24 = deadbeef , r25 = deadbeef , r26 = 0010a050 , r27 = 000034e0
r28 = 0037969e , r29 = 00000002 , r30 = 0036fb80 , r31 = 000e5e94
cr = 22002000 , xer = ???????? , cia = 00025900
lr = 00025918 , ctr = 00025878 , dec = a91f2233
msr = 00009032 , dar = f005e634 , dsisr = 40000000
srr0 = 000254a0 , srr1 = 00009032
tbu = 00002801 , tbl = 6105f1e0 , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
hid5 = ???????? , mq = ????????
sprg0 = 000e5c00 , sprg1 = deadbeef
sprg2 = e6000400 , sprg3 = 2ff3b400
sr0 = 00000000 , sr1 = 00020010 , sr2 = 00000000 , sr3 = 00022011
sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 007fffff
sr8 = 007fffff , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
sr12 = 007fffff , sr13 = 007fffff , sr14 = 00008004 , sr15 = 007fffff

604E_ #3
r0 = 00009032 , r1 = 00458c48 , r2 = 0026ae98 , r3 = d000c000
r4 = 00000020 , r5 = 00000000 , r6 = 00000020 , r7 = d000cfc0
r8 = 00000020 , r9 = 00000005 , r10 = 00037999 , r11 = 00000000
r12 = 00000400 , r13 = 00458eb0 , r14 = 2ff3b400 , r15 = 000e5f80
r16 = 00000000 , r17 = b0ba9e28 , r18 = c02f2830 , r19 = 00005999
r20 = b99d0000 , r21 = 00828cf4 , r22 = 00028cf4 , r23 = 00037999
r24 = 00000002 , r25 = 40ec6f78 , r26 = 00ec6f78 , r27 = b0a14010
r28 = 40000000 , r29 = 00000002 , r30 = 0000000c , r31 = 00000cf4
cr = 82422080 , xer = ???????? , cia = 00143fc4
lr = 0005e640 , ctr = 00000001 , dec = 83ee9aac
msr = 00009032 , dar = 2000c63c , dsisr = 40000000
srr0 = 10010ba8 , srr1 = 0000f030
tbu = 00002801 , tbl = 8633eedc , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
hid5 = ???????? , mq = ????????
sprg0 = 000e5f80 , sprg1 = 00000000
sprg2 = e600c200 , sprg3 = 00458eb0
sr0 = 00000000 , sr1 = 00020010 , sr2 = 00028cf4 , sr3 = 00022011
sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
sr12 = 00037999 , sr13 = 00828cf4 , sr14 = 00008004 , sr15 = 007fffff

604E_ #4
r0 = 20000000 , r1 = 2ff3b2f0 , r2 = 0026ae98 , r3 = 2001209f
r4 = 00000000 , r5 = e6005a00 , r6 = 00005a65 , r7 = 00000000
r8 = 6003a01d , r9 = 00000001 , r10 = 00000000 , r11 = 001e2bb4
r12 = 001e30c4 , r13 = deadbeef , r14 = 00000003 , r15 = 2ff22cdc
r16 = 2ff22cec , r17 = 00000000 , r18 = deadbeef , r19 = deadbeef
r20 = deadbeef , r21 = deadbeef , r22 = ffffffff , r23 = 2ff22d6a
r24 = 2ff3b6e0 , r25 = 2ff3b400 , r26 = d016aed4 , r27 = 22222422
r28 = 00002000 , r29 = 00000001 , r30 = 200110a0 , r31 = 2ff3b400
cr = 2a222422 , xer = ???????? , cia = 001e3194
lr = 001e30d4 , ctr = 001e2bb4 , dec = 5ebf97a4
msr = 00009032 , dar = f0004358 , dsisr = 40000000
srr0 = 000037d8 , srr1 = 0000d032
tbu = 00002801 , tbl = ab637455 , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid0 = f001c084 , hid1 = ???????? , hid2 = ????????
hid5 = ???????? , mq = ????????
sprg0 = 000e6300 , sprg1 = 2ff22cdc
sr0 = 00000000 , sr1 = 00020010 , sr2 = 000379b9 , sr3 = 00022011
sr4 = 007fffff , sr5 = 007fffff , sr6 = 007fffff , sr7 = 0001a00d
sr8 = 007fffff , sr9 = 007fffff , sr10 = 007fffff , sr11 = 007fffff
sr12 = 007fffff , sr13 = 60000020 , sr14 = 00008004 , sr15 = 007fffff

604E_ #5
r0 = 00009032 , r1 = 0046ec48 , r2 = 0026ae98 , r3 = dff22000
r4 = 00000020 , r5 = 00000400 , r6 = 00000020 , r7 = dff22bc0
r8 = 00000020 , r9 = 00000005 , r10 = 000155e8 , r11 = 00004002
r12 = 00000400 , r13 = 0046eeb0 , r14 = 2ff3b400 , r15 = 000e6680
r16 = 00000000 , r17 = b0b96814 , r18 = c039b488 , r19 = 000055e8
r20 = b99d0000 , r21 = 00821e91 , r22 = 00021e91 , r23 = 000155e8
r24 = 00000002 , r25 = 40ec7410 , r26 = 00ec7410 , r27 = b0b18794
r28 = 40000000 , r29 = 00000002 , r30 = 0000ff22 , r31 = 00003e91
cr = 82422080 , xer = ???????? , cia = 00143fc4
lr = 0005e640 , ctr = 00000009 , dec = 398e23dd
msr = 00009032 , dar = 2ff22ff8 , dsisr = 40000000
srr0 = d0177b8c , srr1 = 0000d030
tbu = 00002801 , tbl = d09569c3 , ear = 00000000
pvr = ???????? , pid = ???????? , sdr1 = 010000ff
hid5 = ???????? , mq = ????????
sprg0 = 000e6680 , sprg1 = 00000000
sprg2 = e6006900 , sprg3 = 0046eeb0
sr0 = 00000000 , sr1 = 00020010 , sr2 = 00021e91 , sr3 = 00022011
sr4 = 0000c006 , sr5 = 0000e007 , sr6 = 00010008 , sr7 = 00012009
sr8 = 0001400a , sr9 = 0001600b , sr10 = 007fffff , sr11 = 00000000
sr12 = 000155e8 , sr13 = 00821e91 , sr14 = 00008004 , sr15 = 007fffff
----------------------------------------------------------------------------
--- IO Controller Scan information

IO Controller 0
csr[000]=0x10000000 csr[001]=0x10000000 csr[002]=0x10000000 csr[003]=0x10000000
csr[004]=0x10000000 csr[005]=0x00000000 csr[006]=0x10000000 csr[007]=0x10000000
csr[008]=0x10000000 csr[009]=0x10000000 csr[010]=0x10000000 csr[011]=0x10000000
csr[012]=0x10000000 csr[013]=0x10000000 csr[014]=0x10000000 csr[015]=0x00000000
dsc[000]=0x00000000 dsc[001]=0x00000000 dsc[002]=0x00000000 dsc[003]=0x00000000
dsc[004]=0x00000000 dsc[005]=0x00000000 dsc[006]=0x00000000 dsc[007]=0x00000000
dsc[008]=0x00000000
dsc[012]=0x00000000 dsc[013]=0x00000000 dsc[014]=0x00000000 dsc[015]=0x00000000
m_enable=0x1 personalization=0x30030000 bus_stat=0x00
tce_addr_high=0x5fd00 tce_addr_low=0x4 bus_mapping=0xffffffff
crr=0xffff0007 mode0=0xf60029f4 mode1=0xf0000000
j_pio_s_m=0x0 k_s_m=0x0 l_s_m=0x0 read_s_m=0x0 write_s_m=0x0
d_s_m=0x0 e_s_m=0x0 f_s_m=0x0
pio_inprogress=0x0 pio_read_nw=0x0 pio_byte_count=0x1f
pio_completion=0x0 pio_error=0x0 pio_addr_high=0x50060000
cl_addr_high=0x2 cl_addr_low=0x439007e0 dsier_add_l=0x2
sh_addr_h=0x90000000 sh_addr_l=0x439fef20

IO Controller 1
csr[000]=0xcb8e8a80 csr[001]=0x9f4d8444 csr[002]=0x0454dc76 csr[003]=0x24d04400
csr[004]=0x2a8a4c99 csr[005]=0x90c88088 csr[006]=0x54c4cd04 csr[007]=0x04400c82
csr[008]=0xda414380 csr[009]=0x79c8828f csr[010]=0xd0c30100 csr[011]=0xb0c28644
csr[012]=0x2e38a000 csr[013]=0x843c3004 csr[014]=0x365800dd csr[015]=0x1c182000
dsc[000]=0x2848e425 dsc[001]=0x38ede1a5 dsc[002]=0x08cdc104 dsc[003]=0x394ce484
dsc[004]=0x048c8858 dsc[005]=0x02c6c632 dsc[006]=0x1684c808 dsc[007]=0x848cd01e
dsc[008]=0x1cccc008 dsc[009]=0x4ccc4000 dsc[010]=0x048cc841 dsc[011]=0x0ccc4000
dsc[012]=0xeccc8804 dsc[013]=0x44cc8800 dsc[014]=0x64cc8804
m_enable=0x1 personalization=0x304b2a00 bus_stat=0x21
tce_addr_high=0x5fc00 tce_addr_low=0x4 bus_mapping=0xffffffff
crr=0xffff0007 mode0=0xf6002bf4 mode1=0xf6000000
j_pio_s_m=0x0 k_s_m=0x0 l_s_m=0x0 read_s_m=0x0 write_s_m=0x0
d_s_m=0x0 e_s_m=0x0 f_s_m=0x0
pio_inprogress=0x0 pio_read_nw=0x0 pio_byte_count=0x1d
pio_completion=0x0 pio_error=0x0 pio_addr_high=0xd977f0f4
cl_addr_high=0xb cl_addr_low=0x07400000 dsier_add_l=0x4
sh_addr_h=0x909002e8 sh_addr_l=0x7b9fef20
=====================================

**********************************************************************
The information in this E-Mail is confidential and may be legally
privileged. It may not represent the views of Scottish and Southern
Energy plc. It is intended solely for the addressees. Access to this
E-Mail by anyone else is unauthorised. If you are not the intended
recipient, any disclosure, copying, distribution or any action taken
or omitted to be taken in reliance on it, is prohibited and may be
unlawful. Any unauthorised recipient should advise the sender
immediately of the error in transmission.
Scottish Hydro-Electric, Southern Electric, SWALEC and S+S are trading
names of the Scottish and Southern Energy Group.
**********************************************************************
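
For anyone reading the register tables in the checkstop output by hand, here is a rough sketch (Python, not part of smpcheck) that turns the binary rows into hex and lists which bit positions are set. The helper names `parse_bits` and `set_bits` are made up for illustration, and counting positions from 0 at the LSB is only a convention assumed here; mapping a position to a named flag still needs the legend from the original report. As a sanity check, the decoded HID0 and SRR1 rows for chip 0 match the hex `hid0` and `srr1` values in the 604E_ #0 register dump.

```python
# Rows copied from the register tables in the checkstop output.
REGISTER_ROWS = {
    "SMC GSR":        "0000 0000 0000 0000 0000 0000 0100 0010",
    "DCB#0":          "0000 0000 0000 0000 0000 0000 0000 0010",
    "CPU Crd#0 CCA2": "0000 0000 0000 0000 0000 0000 0000 0110",
    "Chip #0 HID0":   "1111 0000 0000 0001 1100 0000 1000 0100",
    "Chip #0 srr1":   "0000 0000 0000 0000 1101 0000 0011 0000",
}


def parse_bits(row):
    """Convert a space-separated binary row (MSB first) to an integer."""
    return int(row.replace(" ", ""), 2)


def set_bits(value):
    """Return the positions of set bits, counting from 0 at the LSB (assumed convention)."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


decoded = {name: parse_bits(row) for name, row in REGISTER_ROWS.items()}

for name, value in decoded.items():
    print(f"{name:<16} 0x{value:08x}  set bits (LSB=0): {set_bits(value)}")

# Cross-check against the hex values printed in the 604E_ #0 register dump.
assert decoded["Chip #0 HID0"] == int("f001c084", 16)
assert decoded["Chip #0 srr1"] == int("0000d030", 16)
```

The same two helpers can be pointed at the other chips' rows to see where their SRR1 values differ from chip 0.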