Corrected Memory Error

From: Nick Pettefar (pettefar@gmail.com)
Date: Tue Dec 06 2005 - 05:45:07 EST


Hi Unix experts, etc.,

I get the following entries in /var/adm/messages, or a close variant, every
month or so on one of our critical servers. Sun have replaced the memory and
the CPU in this machine, but the errors persist. Should I be worried? Do you
have any idea what might cause them (other than the obvious, which Sun have
already addressed)? This server needs to survive for a few more months before
being replaced.

Dec 4 10:13:08 bs4 unix: [AFT0] check_ecc: Dumping captured error states ...
Dec 4 10:13:08 bs4 unix: AFSR 0x00000000.00100000<CE> AFAR 0x00000000.501bacd0
Dec 4 10:13:08 bs4 unix: UDBH 0x0131<CE> UDBH.ESYND 0x31 UDBL 0x002c UDBL.ESYND 0x2c
Dec 4 10:13:08 bs4 unix: [AFT0] Corrected Memory Error on CPU1, errID 0x002e4b44.a6e62688
Dec 4 10:13:08 bs4 unix: AFSR 0x00000000.00100000<CE> AFAR 0x00000000.501bacd0
Dec 4 10:13:08 bs4 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10021050
Dec 4 10:13:08 bs4 unix: UDBH Syndrome 0x31 Memory Module U0503
Dec 4 10:13:08 bs4 unix: [AFT0] errID 0x002e4b44.a6e62688 Corrected Memory Error on U0503 is Persistent
Dec 4 10:13:08 bs4 unix: [AFT0] errID 0x002e4b44.a6e62688 ECC Data Bit 0 was in error and corrected

Here is another one:

Nov 19 00:55:23 bs4 unix: [AFT0] Corrected Memory Error on CPU1, errID 0x0029921e.adec8f6d
Nov 19 00:55:23 bs4 unix: AFSR 0x00000000.00100000<CE> AFAR 0x00000000.75567f70
Nov 19 00:55:23 bs4 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1000bf9c
Nov 19 00:55:23 bs4 unix: UDBH Syndrome 0x62 Memory Module U0604
Nov 19 00:55:23 bs4 unix: [AFT0] errID 0x0029921e.adec8f6d Corrected Memory Error on U0604 is Intermittent
Nov 19 00:55:23 bs4 unix: [AFT0] errID 0x0029921e.adec8f6d ECC Data Bit 36 was in error and corrected
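
Since the two excerpts name different modules (U0503 and U0604), it may help to tally how often each module is reported over time. The following is only a rough sketch in Python, not something run on bs4: the function name tally_corrected_errors and the regular expression are illustrative assumptions, and it assumes each syslog entry sits on a single line, as it would in /var/adm/messages rather than in a wrapped email.

```python
import re
from collections import Counter

# Assumed pattern for the summary line quoted above, e.g.
# "... Corrected Memory Error on U0503 is Persistent".
# The "on CPU1, errID ..." line is deliberately not matched.
CE_LINE = re.compile(r"Corrected Memory Error on (U\d+) is (\w+)")


def tally_corrected_errors(lines):
    """Count corrected-memory-error reports per (module, disposition)."""
    counts = Counter()
    for line in lines:
        match = CE_LINE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Summary lines taken from the two excerpts above, unwrapped,
# plus one "on CPU1" line that should be ignored.
sample = [
    "Dec 4 10:13:08 bs4 unix: [AFT0] errID 0x002e4b44.a6e62688 "
    "Corrected Memory Error on U0503 is Persistent",
    "Nov 19 00:55:23 bs4 unix: [AFT0] errID 0x0029921e.adec8f6d "
    "Corrected Memory Error on U0604 is Intermittent",
    "Dec 4 10:13:08 bs4 unix: [AFT0] Corrected Memory Error on CPU1, "
    "errID 0x002e4b44.a6e62688",
]

for (module, disposition), n in sorted(tally_corrected_errors(sample).items()):
    print(module, disposition, n)
```

To run it against real logs, an open file object for /var/adm/messages (or a rotated copy) can be passed in place of sample, since the function only iterates over lines.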

$ prtconf|more
System Configuration: Sun Microsystems sun4u
Memory size: 2048 Megabytes

$ prtdiag|more
System Configuration: Sun Microsystems sun4u Sun Ultra 2 UPA/SBus (UltraSPARC 200MHz)
System clock frequency: 100 MHz
Memory size: 2048 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     1     1      200     1.0   US-I    4.0

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 0   SBus   25    14   SUNW,fas/sd (block)
 0   SBus   25    14   SUNW,hme
 0   SBus   25    14   SUNW,bpp
 0   UPA   100    30   FFB, Single Buffered              SUNW,501-2634

No failures found in System
===========================

$ uname -a
SunOS bs4 5.6 Generic_105181-39 sun4u sparc SUNW,Ultra-2

--
Nick@Pettefar.com   DoD 1069  MAG 73516  Bros 650  ZZR1100D  R90s  Z88s
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

