memory problems (Ultra Enterprise 4000)

From: Grigory Nikonov (ecat@land.ru)
Date: Fri Jan 24 2003 - 03:11:54 EST


Hi everybody,

We've been having problems with our E4000.
First, on Dec 18, we got the following lines in /var/adm/messages:

Dec 18 02:54:45 Effa SUNW,UltraSPARC-II: [ID 336084 kern.notice] [AFT0] Corrected Memory Error on CPU1, errID 0x0005b3a8.62e7e018
Dec 18 02:54:45 Effa AFSR 0x00000000.00100000<CE> AFAR 0x00000000.6af287e8
Dec 18 02:54:45 Effa AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10023d1c
Dec 18 02:54:45 Effa UDBL Syndrome 0xa4 Memory Module Board 2 J3201
Dec 18 02:54:45 Effa SUNW,UltraSPARC-II: [ID 386124 kern.notice] [AFT0] errID 0x0005b3a8.62e7e018 Corrected Memory Error on Board 2 J3201 is Persistent
Dec 18 02:54:45 Effa SUNW,UltraSPARC-II: [ID 392052 kern.notice] [AFT0] errID 0x0005b3a8.62e7e018 ECC Data Bit 19 was in error and corrected

After a reboot, the server disabled the entire bank containing the faulty
DIMM, leaving only 11 GB of RAM instead of 12 GB.
I moved the memory module from J3201 to J3500 and got error messages
reporting that J3500 was now faulty. So I got a replacement and everything
was fine. What I got today is:

Jan 21 13:28:42 Effa SUNW,UltraSPARC-II: [ID 103143 kern.notice] [AFT0] Corrected Memory Error on CPU10, errID 0x000535f0.a150c558
Jan 21 13:28:42 Effa AFSR 0x00000000.00100000<CE> AFAR 0x00000002.cdb2cf78
Jan 21 13:28:42 Effa AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10023d2c
Jan 21 13:28:42 Effa UDBL Syndrome 0x2 Memory Module Board 5 J3501
Jan 21 13:28:42 Effa SUNW,UltraSPARC-II: [ID 295699 kern.notice] [AFT0] errID 0x000535f0.a150c558 Corrected Memory Error on Board 5 J3501 is Persistent
Jan 21 13:28:42 Effa SUNW,UltraSPARC-II: [ID 277393 kern.notice] [AFT0] errID 0x000535f0.a150c558 ECC Check Bit 1 was in error and corrected

[skipped some more lines of the same]

Jan 22 13:58:35 Effa AFSR 0x00000000.00100000<CE> AFAR 0x00000002.edb2fd78
Jan 22 13:58:35 Effa AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10023d2c
Jan 22 13:58:35 Effa UDBL Syndrome 0x2 Memory Module Board 5 J3501
Jan 22 13:58:35 Effa SUNW,UltraSPARC-II: [ID 124781 kern.notice] [AFT0] errID 0x00058626.d37a519e Corrected Memory Error on Board 5 J3501 is Persistent
Jan 22 13:58:35 Effa SUNW,UltraSPARC-II: [ID 857093 kern.notice] [AFT0] errID 0x00058626.d37a519e ECC Check Bit 1 was in error and corrected

Note that it is now Board 5, not Board 2 as in the previous log.
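
To see at a glance which modules are being reported, the "is Persistent"
summary lines can be tallied per board and slot. The following is only a
minimal sketch in Python: the regular expression is derived from the three
log excerpts quoted in this message, and the names PATTERN, SAMPLE and
tally_corrected_errors are hypothetical helpers, not part of any Solaris
tool. Other message variants may need a different pattern.

```python
import re
from collections import Counter

# Assumption: this pattern matches only the "[AFT0] errID ... Corrected
# Memory Error on Board N Jxxxx is Persistent" lines quoted in this message.
PATTERN = re.compile(
    r"errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+) "
    r"Corrected Memory Error on Board (?P<board>\d+) (?P<slot>J\d+)"
)


def tally_corrected_errors(lines):
    """Count corrected memory error reports per (board, slot)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (int(match.group("board")), match.group("slot"))
            counts[key] += 1
    return counts


# Sample input: the three summary lines copied from the logs above.
SAMPLE = [
    "Dec 18 02:54:45 Effa SUNW,UltraSPARC-II: [ID 386124 kern.notice] [AFT0] "
    "errID 0x0005b3a8.62e7e018 Corrected Memory Error on Board 2 J3201 is Persistent",
    "Jan 21 13:28:42 Effa SUNW,UltraSPARC-II: [ID 295699 kern.notice] [AFT0] "
    "errID 0x000535f0.a150c558 Corrected Memory Error on Board 5 J3501 is Persistent",
    "Jan 22 13:58:35 Effa SUNW,UltraSPARC-II: [ID 124781 kern.notice] [AFT0] "
    "errID 0x00058626.d37a519e Corrected Memory Error on Board 5 J3501 is Persistent",
]

if __name__ == "__main__":
    for (board, slot), count in sorted(tally_corrected_errors(SAMPLE).items()):
        print(f"Board {board} {slot}: {count} corrected error report(s)")
```

To run it against the real log instead of the sample, pass an open file
object for /var/adm/messages to tally_corrected_errors(), since it accepts
any iterable of lines.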

Any idea what might cause such behavior, and whether I should replace the
DIMM or keep it?

Thanks in advance.

-- 
Grigory Nikonov, systems administrator,
Renaissance Insurance, Moscow, Russia.