Panics - Memory Errors - UDBH Syndromes

From: Steve (sun@sbtt.net)
Date: Thu Aug 07 2003 - 11:57:56 EDT


Ultra 10
Solaris 8 (Cluster Patch 9/18/2002)
OpenBoot 3.25

1 GB Dataram memory
1 CPU @ 440 MHz

For about the last month the system has been panicking and rebooting.

These errors show up almost every other day or so (on different DIMMs):

Aug 7 10:44:08 njuxwp02 SUNW,UltraSPARC-IIi: [ID 235719 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x000000e8.74deda91
Aug 7 10:44:08 njuxwp02 AFSR 0x00000000.00100000<CE> AFAR 0x00000000.3122a890
Aug 7 10:44:08 njuxwp02 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff3bfb70
Aug 7 10:44:08 njuxwp02 UDBH Syndrome 0x80 Memory Module DIMM3
Aug 7 10:44:08 njuxwp02 SUNW,UltraSPARC-IIi: [ID 632111 kern.info] [AFT0] errID 0x000000e8.74deda91 Corrected Memory Error on DIMM3 is Intermittent
Aug 7 10:44:08 njuxwp02 SUNW,UltraSPARC-IIi: [ID 142208 kern.info] [AFT0] errID 0x000000e8.74deda91 ECC Check Bit 7 was in error and corrected
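
Since the errors land on different DIMMs, it may help to tally them per module and syndrome over time. The following is only a rough sketch, not a Sun tool: the function name tally_corrected_errors is made up for illustration, the sample text is the syndrome line from the excerpt above, and matching "UDBL" lines in the same layout as "UDBH" is an assumption.

```python
import re
from collections import Counter

# Sample taken from the log excerpt in this post.
SAMPLE = """\
Aug 7 10:44:08 njuxwp02 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff3bfb70
Aug 7 10:44:08 njuxwp02 UDBH Syndrome 0x80 Memory Module DIMM3
"""

# Matches lines such as "UDBH Syndrome 0x80 Memory Module DIMM3".
# Assumption: UDBL lines, if present, use the same layout.
SYNDROME_RE = re.compile(
    r"(UDB[HL]) Syndrome (0x[0-9a-fA-F]+) Memory Module (\S+)"
)


def tally_corrected_errors(text):
    """Count occurrences of each (module, UDB half, syndrome) triple."""
    counts = Counter()
    for match in SYNDROME_RE.finditer(text):
        udb, syndrome, module = match.groups()
        counts[(module, udb, int(syndrome, 16))] += 1
    return counts


if __name__ == "__main__":
    for (module, udb, syndrome), n in sorted(tally_corrected_errors(SAMPLE).items()):
        print(f"{module} {udb} syndrome 0x{syndrome:02x}: {n}")
```

To run it against real data, the contents of the system log (for example /var/adm/messages) could be read into a string and passed in place of SAMPLE. If the same syndrome keeps appearing no matter which DIMM is reported, that would point away from the memory modules themselves.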

RMA'd the memory - no difference, same errors
Swapped in the CPU from another U10 - no difference, same errors
Moved the drives to another U10 - no difference

The only things I have left are the hard drives themselves or something in
the OS or software. We are running BEA WebLogic.

I've seen a few posts in the archives about the same problem but without any
answers. We have no contract with Sun on this old hardware, so I'm not sure
which way to go with this.

I've been looking at the dumps with mdb, but I'm not sure what to look for.

Thanks in advance for any tips.

-s
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


