Ultra 2 uninitiated reboots

From: Christopher Ferry (chris@ampira.com)
Date: Tue Mar 04 2003 - 12:11:19 EST


All,

I have a SPARC Ultra 2 that is rebooting randomly due to uncorrectable
memory errors reported against the CPUs. I replaced one of the CPUs
2 weeks ago, but we continue to see spontaneous reboots blamed on the
CPUs/memory modules. Since the complaints are flip-flopping (technical
term) between CPU0 and CPU1 and among memory module pairs 601/701,
602/702 and 604/704, I was wondering whether the motherboard might be
on its way out. The other thing I noticed is that every reboot is
attributed to ultimatebb.cgi (a commercial Perl-based bulletin board
that a customer runs). Could Perl code cause spontaneous reboots on an
Ultra 2? (I don't think so.)
Any suggestions will be appreciated, and a summary will follow.

Here is the pertinent info from the messages file:

Reboot 1:
Feb 23 09:08:20 web-1 last message repeated 1 time
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 738973 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0,
errID 0x00008ca9.7e91b06f
Feb 23 09:08:20 web-1 AFSR 0x00000000.00200000<UE> AFAR
0x00000000.35c874f8
Feb 23 09:08:20 web-1 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00
Fault_PC 0xff2e3b38
Feb 23 09:08:20 web-1 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE>
UDBL.ESYND 0x03
Feb 23 09:08:20 web-1 UDBL Syndrome 0x3 Memory Module U0702 U0602
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 760758 kern.warning]
WARNING: [AFT1] errID 0x00008ca9.7e91b06f Syndrome 0x3 indicates that
this may not be a memory module problem
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 488341 kern.info] [AFT2]
errID 0x00008ca9.7e91b06f PA=0x00000000.35c874f8
Feb 23 09:08:20 web-1 E$tag 0x00000000.1cc006b9 E$State: Exclusive
E$parity 0x0e
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x00): 0x00000021.00000000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x08): 0x002ef4f0.00000000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x10): 0xff2e3b38.00000000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x18): 0x00870000.b6020000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x20): 0x002ef4a0.002ef478
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x28): 0x00000021.00000000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x30): 0x002f0a38.002f0a38
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
E$Data (0x38): 0xbf2dfdfc.00000000 *Bad* PSYND=0x00ff
Feb 23 09:08:20 web-1 unix: [ID 321153 kern.notice] NOTICE: Scheduling
clearing of error on page 0x00000000.35c86000
Feb 23 09:08:20 web-1 SUNW,UltraSPARC-II: [ID 612998 kern.info] [AFT3]
errID 0x00008ca9.7e91b06f Above Error is in User Mode
Feb 23 09:08:20 web-1 and is fatal: will reboot
Feb 23 09:08:20 web-1 unix: [ID 855177 kern.warning] WARNING: [AFT1]
initiating reboot due to above error in pid 26427 (ultimatebb.cgi)
Feb 23 09:08:24 web-1 unix: [ID 221039 kern.notice] NOTICE: Previously
reported error on page 0x00000000.35c86000 cleared
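In reboot 1, the page in the "Scheduling clearing of error on page" NOTICE is simply the AFAR (0x00000000.35c874f8) rounded down to a page boundary. The minimal Python sketch below reproduces that arithmetic, assuming an 8 KB (0x2000) base page; with that assumption the result matches the NOTICE lines in all three reboots. The helper names are hypothetical and not part of any Solaris tool.

```python
# Assumption: 8 KB (0x2000) base pages, as on UltraSPARC under Solaris.
PAGE_SIZE = 0x2000


def parse_sparc_hex(text: str) -> int:
    """Turn a 'hi.lo' value such as 0x00000000.35c874f8 into one integer."""
    hi, lo = text.lower().removeprefix("0x").split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_sparc_hex(value: int) -> str:
    """Print an integer back in the same 'hi.lo' style the kernel uses."""
    return f"0x{value >> 32:08x}.{value & 0xFFFFFFFF:08x}"


def page_of(address: int, page_size: int = PAGE_SIZE) -> int:
    """Round a physical address down to the start of its page."""
    return address & ~(page_size - 1)


if __name__ == "__main__":
    afar = parse_sparc_hex("0x00000000.35c874f8")  # AFAR from reboot 1
    print(format_sparc_hex(page_of(afar)))  # 0x00000000.35c86000, as in the NOTICE
```

The same function applied to the AFARs of reboots 2 and 3 gives 0x00000000.62730000 and 0x00000000.0032e000, the pages named in their NOTICE lines.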

Reboot 2:
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 263651 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0,
errID 0x000001fc.42827777
Feb 23 09:45:15 web-1 AFSR 0x00000000.00200000<UE> AFAR
0x00000000.627305b8
Feb 23 09:45:15 web-1 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00
Fault_PC 0xff2e1a20
Feb 23 09:45:15 web-1 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE>
UDBL.ESYND 0x03
Feb 23 09:45:15 web-1 UDBL Syndrome 0x3 Memory Module U0704 U0604
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 805099 kern.warning]
WARNING: [AFT1] errID 0x000001fc.42827777 Syndrome 0x3 indicates that
this may not be a memory module problem
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 886959 kern.info] [AFT2]
errID 0x000001fc.42827777 PA=0x00000000.627305b8
Feb 23 09:45:15 web-1 E$tag 0x00000000.1ec00c4e E$State: Exclusive
E$parity 0x0f
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x00): 0x00000021.00000000
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x08): 0x002b9388.002b9388
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x10): 0xff2e1a20.00000007
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x18): 0x002574c3.45400000
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x20): 0x002b9320.002b91b0
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x28): 0x00000021.00000000
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x30): 0x002b7ea8.002b7ea8
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
E$Data (0x38): 0xbf2cc55c.00000008 *Bad* PSYND=0x00ff
Feb 23 09:45:15 web-1 unix: [ID 321153 kern.notice] NOTICE: Scheduling
clearing of error on page 0x00000000.62730000
Feb 23 09:45:15 web-1 SUNW,UltraSPARC-II: [ID 804769 kern.info] [AFT3]
errID 0x000001fc.42827777 Above Error is in User Mode
Feb 23 09:45:15 web-1 and is fatal: will reboot
Feb 23 09:45:15 web-1 unix: [ID 855177 kern.warning] WARNING: [AFT1]
initiating reboot due to above error in pid 2641 (ultimatebb.cgi)
Feb 23 09:45:18 web-1 unix: [ID 221039 kern.notice] NOTICE: Previously
reported error on page 0x00000000.62730000 cleared
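The register values in reboot 2 (the same values appear in reboots 1 and 3) can be decoded by hand. This rough Python sketch takes its bit positions from the log's own annotations rather than from a hardware manual: AFSR 0x00200000 is tagged <UE>, so UE is assumed to be bit 21, and UDBL 0x0203 is tagged <UE> with UDBL.ESYND 0x03, so the low 8 bits are assumed to hold the ECC syndrome and 0x200 (bit 9) the UE flag. The constant and function names are hypothetical.

```python
# Bit positions are inferred from the log's annotations, not from a manual:
# AFSR 0x00000000.00200000 is tagged <UE>, and UDBL 0x0203 is tagged <UE>
# with UDBL.ESYND 0x03.
AFSR_UE = 1 << 21        # 0x200000
UDB_UE = 1 << 9          # 0x200
UDB_ESYND_MASK = 0xFF    # low 8 bits hold the ECC syndrome


def decode_afsr(afsr: int) -> dict:
    """Report whether the AFSR uncorrectable-error flag is set."""
    return {"UE": bool(afsr & AFSR_UE)}


def decode_udb(udb: int) -> dict:
    """Split a UDB error register value into its UE flag and ECC syndrome."""
    return {"UE": bool(udb & UDB_UE), "ESYND": udb & UDB_ESYND_MASK}


if __name__ == "__main__":
    # Register values from reboot 2.
    print("AFSR", decode_afsr(0x0000000000200000))
    print("UDBH", decode_udb(0x0000))
    print("UDBL", decode_udb(0x0203))
```

Run against reboot 2's values, it reports UE set in AFSR and UDBL with syndrome 0x03, and nothing latched in UDBH, consistent with the syndrome being reported from UDBL only.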

Reboot 3:
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 637986 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0,
errID 0x0000a33c.b85029a1
Feb 25 11:37:11 web-1 AFSR 0x00000000.00200000<UE> AFAR
0x00000000.0032f4f8
Feb 25 11:37:11 web-1 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00
Fault_PC 0xff2e3b38
Feb 25 11:37:11 web-1 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE>
UDBL.ESYND 0x03
Feb 25 11:37:11 web-1 UDBL Syndrome 0x3 Memory Module U0701 U0601
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 807870 kern.warning]
WARNING: [AFT1] errID 0x0000a33c.b85029a1 Syndrome 0x3 indicates that
this may not be a memory module problem
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 987189 kern.info] [AFT2]
errID 0x0000a33c.b85029a1 PA=0x00000000.0032f4f8
Feb 25 11:37:11 web-1 E$tag 0x00000000.1ec00006 E$State: Exclusive
E$parity 0x0f
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x00): 0x00000021.00000000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x08): 0x002ef4f0.00000000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x10): 0xff2e3b38.00000000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x18): 0x00870000.b6020000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x20): 0x002ef4a0.002ef478
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x28): 0x00000021.00000000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
E$Data (0x30): 0x002f0a38.002f0a38
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
E$Data (0x38): 0xbf2dfdfc.00000000 *Bad* PSYND=0x00ff
Feb 25 11:37:11 web-1 unix: [ID 321153 kern.notice] NOTICE: Scheduling
clearing of error on page 0x00000000.0032e000
Feb 25 11:37:11 web-1 SUNW,UltraSPARC-II: [ID 776193 kern.info] [AFT3]
errID 0x0000a33c.b85029a1 Above Error is in User Mode
Feb 25 11:37:11 web-1 and is fatal: will reboot
Feb 25 11:37:11 web-1 unix: [ID 855177 kern.warning] WARNING: [AFT1]
initiating reboot due to above error in pid 12941 (ultimatebb.cgi)
Feb 25 11:37:14 web-1 unix: [ID 221039 kern.notice] NOTICE: Previously
reported error on page 0x00000000.0032e000 cleared
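To see the flip-flopping pattern at a glance, the relevant fields from the three reboots can be tallied. This is a rough Python sketch, assuming each event logs exactly one CPU, Fault_PC, memory module pair and pid line, in that order; SAMPLE holds only those lines, copied from the excerpts above, and the function names are hypothetical. Against a full messages file with unrelated entries, the zip-based pairing could misalign.

```python
import re
from collections import Counter

# Only the relevant lines from reboots 1-3 above, copied verbatim.
SAMPLE = """\
WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0,
Fault_PC 0xff2e3b38
Feb 23 09:08:20 web-1 UDBL Syndrome 0x3 Memory Module U0702 U0602
initiating reboot due to above error in pid 26427 (ultimatebb.cgi)
WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0,
Fault_PC 0xff2e1a20
Feb 23 09:45:15 web-1 UDBL Syndrome 0x3 Memory Module U0704 U0604
initiating reboot due to above error in pid 2641 (ultimatebb.cgi)
WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0,
Fault_PC 0xff2e3b38
Feb 25 11:37:11 web-1 UDBL Syndrome 0x3 Memory Module U0701 U0601
initiating reboot due to above error in pid 12941 (ultimatebb.cgi)
"""

CPU_RE = re.compile(r"Uncorrectable Memory Error on (CPU\d+)")
PC_RE = re.compile(r"Fault_PC (0x[0-9a-f]+)")
MODULE_RE = re.compile(r"Memory Module (U\d+) (U\d+)")
PROC_RE = re.compile(r"error in pid (\d+) \(([^)]+)\)")


def parse_events(log: str) -> list[tuple]:
    """Pair up the nth CPU, Fault_PC, module pair and process match.

    Assumes every event logs each field exactly once and in order.
    """
    return list(zip(
        CPU_RE.findall(log),
        PC_RE.findall(log),
        MODULE_RE.findall(log),
        PROC_RE.findall(log),
    ))


def tally(events: list[tuple]) -> tuple[Counter, Counter, Counter, Counter]:
    """Count how often each CPU, module pair, Fault_PC and process appears."""
    cpus = Counter(cpu for cpu, _pc, _mods, _proc in events)
    # Sort each pair so "U0702 U0602" and "U0602 U0702" count as one key.
    modules = Counter(" ".join(sorted(mods)) for _cpu, _pc, mods, _proc in events)
    pcs = Counter(pc for _cpu, pc, _mods, _proc in events)
    procs = Counter(proc[1] for _cpu, _pc, _mods, proc in events)
    return cpus, modules, pcs, procs


if __name__ == "__main__":
    labels = ("CPU", "Modules", "Fault_PC", "Process")
    for label, counts in zip(labels, tally(parse_events(SAMPLE))):
        print(f"{label}: {dict(counts)}")
```

On this excerpt it counts CPU0 twice and CPU1 once, three different module pairs, ultimatebb.cgi in all three events, and the same Fault_PC (0xff2e3b38) in reboots 1 and 3.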

Christopher Ferry
Sr Systems Administrator
FortuneCity.com Inc.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers