V880 memory error, how to react

From: Stoyan Genov (stoyan.genov@sun-fish.com)
Date: Sun May 30 2004 - 11:04:34 EDT


Hi,

I have one Sun Fire V880 with 4 CPU/memory boards, 8 CPUs x 900 MHz,
16 GB RAM (4 boards x 16 memory slots x 256 MB DIMMs).

The system runs an Oracle database server under a decent load
(it uses all of the memory).

For the past couple of days I have been seeing this in /var/adm/messages,
reported approximately twice per minute:

May 30 16:38:56 v880server SUNW,UltraSPARC-III: [ID 354446 kern.info] [AFT0] errID 0x001a85ab.36a898c4 Corrected Memory Error on Slot D: J8201 is Sticky
May 30 16:38:56 v880server SUNW,UltraSPARC-III: [ID 220268 kern.info] [AFT0] errID 0x001a85ab.36a898c4 Data Bit 122 was in error and corrected
May 30 16:38:56 v880server unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot D: J8201

(Each report is actually longer, but I think the part
above shows the problem.)
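
To check whether every report points at the same module and the same data bit, the messages could be tallied with a short script. What follows is only a minimal sketch in Python. It assumes the log layout matches the three sample lines above exactly; the name summarize and the two regular expressions are illustrative and not part of any Sun tool.

```python
import re
from collections import Counter

# Patterns derived only from the sample lines quoted above; other
# Solaris releases or platforms may format these messages differently.
CE_RE = re.compile(
    r"errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+) "
    r"Corrected Memory Error on (?P<module>Slot \w+: J\d+)"
)
BIT_RE = re.compile(
    r"errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+) "
    r"Data Bit (?P<bit>\d+) was in error"
)


def summarize(lines):
    """Count corrected errors per module and per (module, data bit)."""
    module_by_errid = {}
    per_module = Counter()
    per_bit = Counter()
    for line in lines:
        m = CE_RE.search(line)
        if m:
            # Remember which module this errID belongs to, so the
            # follow-up "Data Bit" line can be attributed to it.
            module_by_errid[m.group("errid")] = m.group("module")
            per_module[m.group("module")] += 1
            continue
        m = BIT_RE.search(line)
        if m:
            module = module_by_errid.get(m.group("errid"), "unknown")
            per_bit[(module, int(m.group("bit")))] += 1
    return per_module, per_bit


# The three lines from the report above, used as sample input.
SAMPLE = [
    "May 30 16:38:56 v880server SUNW,UltraSPARC-III: [ID 354446 kern.info] "
    "[AFT0] errID 0x001a85ab.36a898c4 Corrected Memory Error on "
    "Slot D: J8201 is Sticky",
    "May 30 16:38:56 v880server SUNW,UltraSPARC-III: [ID 220268 kern.info] "
    "[AFT0] errID 0x001a85ab.36a898c4 Data Bit 122 was in error and corrected",
    "May 30 16:38:56 v880server unix: [ID 752700 kern.warning] WARNING: "
    "[AFT0] Sticky Softerror encountered on Memory Module Slot D: J8201",
]

if __name__ == "__main__":
    per_module, per_bit = summarize(SAMPLE)
    for module, count in per_module.items():
        print(module, count)
    for (module, bit), count in per_bit.items():
        print(module, "bit", bit, count)
```

To run it against the real log, the lines of /var/adm/messages could be passed in place of SAMPLE, for example summarize(open("/var/adm/messages")). If all counts land on one module and one bit, that would be consistent with a single failing DIMM rather than a wider problem.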

As far as I understand (and as far as I could find through Google),
this is a soft (ECC-correctable) memory error in memory bank J8201 in
slot D.

I have the following questions:

Part 1: Diagnosis.
1. Am I right? Is this really an ECC-correctable memory error?
2. Is slot D the last (topmost) CPU/memory slot?
3. If I have someone (I'm thousands of miles away from the machine)
   open the machine, then remove and open the memory board,
   will he find J8201 printed somewhere (so he can identify exactly
   which memory chip is faulty)?

Part 2: How to react?
1. Is it correct that I can remove the group of four DIMMs
    that contains the offending chip and plug the CPU/memory board back in,
    so that the CPUs and the rest of the memory are used again?
2. Is it possible to somehow disable this group of DIMMs,
    or the entire CPU/memory board, from the OpenBoot environment,
    so that no physical intervention is required until we get the replacement
    DIMMs?
3. What would you do in this situation, considering that downtime is possible
    but highly undesirable?

Thank you in advance!

Best Regards,
Stoyan Genov

P.S. I can post more lines from /var/adm/messages, if this is required.
Thank you once again.

--sdg


