Memory group (A0) failed on V880 CPU/mem board

From: Stoyan Genov (stoyan.genov@gbservices.biz)
Date: Mon Feb 20 2006 - 16:00:49 EST


Good day,

A fully equipped V880 (8 x CPU @ 1.2GHz, 4 boards, 64GB RAM) has
restarted spontaneously, at irregular intervals, a couple of times.
Logs from two days ago showed a soft memory error on Slot D, J8101.

After the restarts, it no longer showed errors in that bank, but reported
all DIMMs in the required group A0 (J3000, J3001, J2900, J2901) with
hard errors. The machine is configured to restart on hardware errors
(error-reset-recovery=boot in eeprom), so I believe the restarts are
expected given the errors and the configuration.

I have asr-disable'd cpu5 and cpu7, thus cutting off access to this
board and its memory.

I have the chance to swap the DIMMs reported as faulty within the next hour.

What bothers me:

Am I too paranoid in thinking that a simultaneous fault of all DIMMs in
one group is not actually a problem with the DIMMs themselves?

Is it possible that the board is faulty?

Is it possible that the other DIMM that reported errors (J8101) is
actually causing the trouble?

Any comments and advice are welcome. I will summarize.

Best Regards,
Stoyan Genov


