E4500 Memory Errors?

From: Tim Evans (tkevans@tkevans.com)
Date: Sat Aug 07 2004 - 11:18:08 EDT


Our E-4500 has crashed several times in the past week. The messages file contains:

Aug 4 08:32:03 mail SUNW,UltraSPARC-II: [ID 939439 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU4 at TL=0, errID 0x00000e1c.518c9f30
Aug 4 08:32:03 mail AFSR 0x00000001<ME>.80300000<PRIV,UE,CE> AFAR 0x00000000.00157410
Aug 4 08:32:03 mail AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x11744b8
Aug 4 08:32:03 mail UDBH 0x0359<UE,CE> UDBH.ESYND 0x59 UDBL 0x02a6<UE> UDBL.ESYND 0xa6
Aug 4 08:32:03 mail UDBH Syndrome 0x59 Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800

There is a series of these message sets, but the CPU number referenced
varies. In addition, there are loads of these:

Aug 4 08:32:03 mail SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x40f72484.00000000
Aug 4 08:32:03 mail SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x403e2eac.00000000

And:

Aug 4 08:32:03 mail SUNW,UltraSPARC-II: [ID 526020 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU9 (caused access error on CPU4), errID 0x00000e1c.518c9f30

And:

Aug 4 08:32:04 mail UDBL Syndrome 0xa6 Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800

These seem to suggest one or more bad memory boards or DIMMs. Does the fact
that the errors refer to multiple CPUs bear this out? Can anyone help me
identify what's involved before I pay for a Sun Field Engineer to come out?

(I do also have a crash dump.)
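
To check whether the errors keep pointing at the same board while the CPU number varies, the messages could be tallied with a short script. What follows is only a rough sketch: the regular expressions are assumptions based solely on the message shapes quoted above, the names (`UE_RE`, `BANK_RE`, `summarize`) are made up for illustration, and the sample lines are the quoted ones with the mail line-wrapping undone. It counts occurrences; it does not identify a failing DIMM.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log excerpts quoted in this post.
UE_RE = re.compile(r"\[AFT1\]\s+Uncorrectable Memory Error on CPU(\d+)")
BANK_RE = re.compile(
    r"UDB[HL]\s+Syndrome\s+0x[0-9a-fA-F]+\s+Memory Module Board\s+(\d+)((?:\s+J\d+)+)"
)

def summarize(lines):
    """Count uncorrectable-error reports per CPU and per (board, sockets)."""
    cpus = Counter()
    banks = Counter()
    for line in lines:
        m = UE_RE.search(line)
        if m:
            cpus["CPU" + m.group(1)] += 1
        m = BANK_RE.search(line)
        if m:
            board = int(m.group(1))
            sockets = tuple(m.group(2).split())  # e.g. ("J3100", "J3200", ...)
            banks[(board, sockets)] += 1
    return cpus, banks

# Sample input: three of the quoted messages, unwrapped.
SAMPLE = [
    "Aug 4 08:32:03 mail SUNW,UltraSPARC-II: [ID 939439 kern.warning] WARNING: "
    "[AFT1] Uncorrectable Memory Error on CPU4 at TL=0, errID 0x00000e1c.518c9f30",
    "Aug 4 08:32:03 mail UDBH Syndrome 0x59 Memory Module Board 0 "
    "J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800",
    "Aug 4 08:32:04 mail UDBL Syndrome 0xa6 Memory Module Board 0 "
    "J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800",
]

cpus, banks = summarize(SAMPLE)
for cpu, count in sorted(cpus.items()):
    print(cpu, count)
for (board, sockets), count in sorted(banks.items()):
    print("Board", board, " ".join(sockets), count)
```

To run it against the real log, pass an open file instead of `SAMPLE`, for example `summarize(open(path))` with `path` set to wherever the messages file lives. If every entry names the same board and socket group regardless of CPU, that would be consistent with a memory problem rather than a CPU problem, though it would not prove it.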

--
Tim Evans, TKEvans.com, Inc.	|    5 Chestnut Court
tkevans@tkevans.com		|    Owings Mills, MD 21117
http://www.tkevans.com/		|    443-394-3864
http://www.come-here.com/News/	|    
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

