Summary: Uncorrectable Memory Error on CPU0 Data access

From: rajdeep (rajdeep@noida.atrenta.com)
Date: Tue May 06 2003 - 09:29:18 EDT


Hi Managers,

Thanks to all who responded. The problem was correctly pointed out by
Willie, so special thanks to him.
What I did was swap the bank 0 memory modules with the bank 3 modules.
After that, the error reported "Memory Module U1404 U0404 U1403 U0403"
instead of "Memory Module U1302 U0302 U1301 U0301", so the fault moved
with the modules. I then removed those four modules, and although the
machine now has only 3GB, the reboot problem is gone.
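
The swap test above can be sketched in Python. This is a minimal sketch, assuming each kernel message sits on one line of /var/adm/messages; the function names are hypothetical, and the "after" line in the example is made up from the module names reported after the swap. If the reported DIMM location changes after the modules are swapped, the fault followed the modules; if it stays the same, the slot or board is the more likely suspect.

```python
import re
from collections import Counter

# Matches the module list the kernel appends to an ECC error report, e.g.
# "UDBH Syndrome 0xa0 Memory Module U1302 U0302 U1301 U0301"
MODULE_RE = re.compile(r"Memory Module((?:\s+U\d{4})+)")

def faulty_module_groups(lines):
    """Count how often each reported DIMM group appears in the given log lines."""
    counts = Counter()
    for line in lines:
        m = MODULE_RE.search(line)
        if m:
            counts[tuple(m.group(1).split())] += 1
    return counts

def fault_followed_modules(before, after):
    """True if the most often reported DIMM location changed after the swap,
    i.e. the fault moved with the modules instead of staying with the slot."""
    top_before = faulty_module_groups(before).most_common(1)
    top_after = faulty_module_groups(after).most_common(1)
    if not top_before or not top_after:
        return False  # not enough evidence either way
    return top_before[0][0] != top_after[0][0]

if __name__ == "__main__":
    before = ["May 4 03:16:28 ace UDBH Syndrome 0xa0 Memory Module U1302 U0302 U1301 U0301"]
    # Hypothetical line: only the module names come from the report above.
    after = ["UDBH Syndrome 0xa0 Memory Module U1404 U0404 U1403 U0403"]
    print(faulty_module_groups(before))
    print(fault_followed_modules(before, after))
```

Comparing only the most frequent group keeps the check simple; with a noisy log you would want to look at the full counts before pulling a module.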

Just out of interest, can someone explain why memory modules develop
this type of fault?
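
One hint about what kind of fault this was is in the E$Data dump of the original report quoted below: the word flagged *Bad* reads 0xba7dcafe, while the rest of the cache line holds 0xbaddcafe. Assuming 0xbaddcafe was the intended value, the short Python check below (the helper name is hypothetical) shows that two bits differ in that word. A single-error-correcting, double-error-detecting ECC can fix one flipped bit but can only detect two, which would fit the kernel treating this as an uncorrectable (UE) error.

```python
def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if diff >> bit & 1]

# Assumption: the word was meant to hold the 0xbaddcafe pattern that fills
# the neighbouring words of the same cache line in the E$Data dump.
expected = 0xBADDCAFE
observed = 0xBA7DCAFE  # upper half of the word flagged *Bad* at E$Data (0x00)

bits = flipped_bits(expected, observed)
print(f"xor = {expected ^ observed:#010x}, flipped bits = {bits}")
```

Running it prints `xor = 0x00a00000, flipped bits = [21, 23]`: two bits in the same word, one more than the ECC can correct.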

Thanks

rajdeep wrote:

> Hi managers,
> I have an Ultra 80 machine with 4GB of memory. Recently it has been
> rebooting 3-4 times a day, leaving error messages in /var/adm/messages.
> I am not able to diagnose whether the problem is in the memory or the
> CPU. Could you please help me with some suggestions as to what exactly
> went wrong in my machine?
>
> I will summarise once it is rectified.
>
> Thanks in advance
>
> The error message is as follows:
> May 4 03:16:28 ace SUNW,UltraSPARC-II: [ID 447316 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x00001290.fa75454c
> May 4 03:16:28 ace AFSR 0x00000000.80b0ff00<PRIV,WP,UE,CE> AFAR 0x00000000.c391c200
> May 4 03:16:28 ace AFSR.PSYND 0xff00(Score 05) AFSR.ETS 0x00 Fault_PC 0x10143394
> May 4 03:16:28 ace UDBH 0x03a0<UE,CE> UDBH.ESYND 0xa0 UDBL 0x0000 UDBL.ESYND 0x00
> May 4 03:16:28 ace UDBH Syndrome 0xa0 Memory Module U1302 U0302 U1301 U0301
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 607763 kern.info] [AFT2] errID 0x00001290.fa75454c PA=0x00000000.c391c200
> May 4 03:16:29 ace E$tag 0x00000000.1ec01872 E$State: Exclusive E$parity 0x0f
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x00): 0xba7dcafe.baddcafe *Bad* PSYND=0xff00
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0xbaddcafe.baddcafe
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xbaddcafe.00000000
> May 4 03:16:29 ace SUNW,UltraSPARC-II: [ID 564090 kern.warning] WARNING: [AFT1] WP event on CPU0, errID 0x00001290.fa75454c
> May 4 03:16:29 ace AFSR 0x00000000.80b0ff00<PRIV,WP,UE,CE> AFAR 0x00000000.c391c200
> May 4 03:16:29 ace AFSR.PSYND 0xff00(Score 05) AFSR.ETS 0x00 Fault_PC 0x10143394
> May 4 03:16:29 ace UDBH 0x03a0<UE,CE> UDBH.ESYND 0xa0 UDBL 0x0000 UDBL.ESYND 0x00
> May 4 03:16:29 ace unix: [ID 836849 kern.notice]
> May 4 03:16:29 ace ^Mpanic[cpu0]/thread=30001e94880:
> May 4 03:16:29 ace unix: [ID 880381 kern.notice] [AFT1] errID 0x00001290.fa75454c UE Error(s)
> May 4 03:16:29 ace See previous message(s) for details
> May 4 03:16:29 ace unix: [ID 100000 kern.notice]
> May 4 03:16:29 ace genunix: [ID 723222 kern.notice] 000002a1003a4840 SUNW,UltraSPARC-II:cpu_aflt_log+4e0 (2a1003a48fe, 1, 101484e0, 2a1003a4a88, 2a1003a494b, 10148508)
> May 4 03:16:30 ace genunix: [ID 179002 kern.notice] %l0-3: 0000030000e3de10 000002a1003a4b50 0000000000000003 0000000000000010
> May 4 03:16:30 ace %l4-7: 0000030005cc0000 00000310048ab540 0000031001b54b18 00000310048ab540
> May 4 03:16:30 ace genunix: [ID 723222 kern.notice] 000002a1003a4a90 SUNW,UltraSPARC-II:cpu_async_error+868 (104598b0, 2a1003a4b50, 80b0ff00, 0, 640074080b0ff00, 2a1003a4d10)
> May 4 03:16:30 ace genunix: [ID 179002 kern.notice] %l0-3: 000000001040dae4 0000000000000032 0000000000000000 00000000000003a0
> May 4 03:16:30 ace %l4-7: 00000000c391c200 0000000000400000 0000000000400000 0000000000000001
> May 4 03:16:30 ace genunix: [ID 723222 kern.notice] 000002a1003a4c60 unix:prom_rtt+0 (c391c200, 51c200, 400000, 0, 16, 14)
> May 4 03:16:30 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000001400 0000000080001400 000000001013fc94
> May 4 03:16:30 ace %l4-7: 000002a1003a5750 000002a1003a5630 000000000000000e 000002a1003a4d10
> May 4 03:16:31 ace genunix: [ID 723222 kern.notice] 000002a1003a4db0 SUNW,UltraSPARC-II:cpu_ce_scrub_mem_err+3c (6, 2a1003a4f10, 2a1003a4f10, 0, 1f0a0900, 0)
> May 4 03:16:31 ace genunix: [ID 179002 kern.notice] %l0-3: 000003000015de18 0000000000000002 0000000000000001 000000001041b6f8
> May 4 03:16:31 ace %l4-7: 000000001041b338 0000000000000016 000000001041baf8 0000000000002000
> May 4 03:16:31 ace genunix: [ID 723222 kern.notice] 000002a1003a4e60 SUNW,UltraSPARC-II:cpu_ce_error+148 (2a1003a50b0, c391c200, 0, 2a, 0, 0)
> May 4 03:16:31 ace genunix: [ID 179002 kern.notice] %l0-3: 000000000000012a 000000000000012a 0000025400100000 0000000000100000
> May 4 03:16:31 ace %l4-7: 000000001044a3f0 000000001044a3d0 00000000104545c8 0000000000000000
> May 4 03:16:32 ace genunix: [ID 723222 kern.notice] 000002a1003a5000 unix:prom_rtt+0 (baddcafe, 30005cc0000, 2000, ffffffffffffffff, 0, 0)
> May 4 03:16:32 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000000000006 0000000000001400 0000009980001605 000000001013f680
> May 4 03:16:32 ace %l4-7: 0000030fffef9308 0000000000000000 0000000000000000 000002a1003a50b0
> May 4 03:16:32 ace genunix: [ID 723222 kern.notice] 000002a1003a5150 genunix:kmem_slab_create+118 (0, 30005cc0000, 2000, 3000001b680, 20, 0)
> May 4 03:16:32 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000310001dda40 0000000000000000 0000000000000000
> May 4 03:16:32 ace %l4-7: 000002a7515aa000 000003000001b680 00000000f9a2e068 0000031001e2e068
> May 4 03:16:32 ace genunix: [ID 723222 kern.notice] 000002a1003a5240 genunix:kmem_cache_alloc+180 (0, 0, 0, 3000001b680, 0, 0)
> May 4 03:16:33 ace genunix: [ID 179002 kern.notice] %l0-3: 000003000001ba00 0000000000002000 00000300005247c0 0000000000002000
> May 4 03:16:33 ace %l4-7: 0000000000000001 000002a7515aa000 000003000004bf88 0000000000000000
> May 4 03:16:33 ace genunix: [ID 723222 kern.notice] 000002a1003a52f0 ufs:ufs_alloc_inode+c (30001f11c38, 1bc214, 20, 0, 300000272c0, 300000273f0)
> May 4 03:16:33 ace genunix: [ID 179002 kern.notice] %l0-3: 0000030000027640 0000030000165b60 0000030005cc20a8 0000000000000000
> May 4 03:16:33 ace %l4-7: 0000000010435218 000002a7515a8000 000003000004bf88 0000000000000000
> May 4 03:16:33 ace genunix: [ID 723222 kern.notice] 000002a1003a53a0 ufs:ufs_iget+1e0 (1bc214, 1bc214, 30000e47ce8, 30000e47ce8, 2a1003a5630, 300019814c0)
> May 4 03:16:33 ace genunix: [ID 179002 kern.notice] %l0-3: 0000030001f11c38 0000000000000000 00000300019eda60 0000030000e35408
> May 4 03:16:33 ace %l4-7: 0000030001eee000 0000002000000007 00000300019814c0 0000000000001000
> May 4 03:16:34 ace genunix: [ID 723222 kern.notice] 000002a1003a5460 ufs:ufs_dirlook+728 (30005cc20a8, 1bc214, 2a7515a8018, 200, 1, 30005cc21d8)
> May 4 03:16:34 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000000000018 0000000000000018 0000000000000000 0000030005cc2018
> May 4 03:16:34 ace %l4-7: 000002a1003a5750 000002a1003a5630 000002a7515a8000 0000000000000000
> May 4 03:16:34 ace genunix: [ID 723222 kern.notice] 000002a1003a5570 ufs:ufs_lookup+154 (0, 30000e47ce8, 2a1003a5748, 2a1003a5750, 30005cc2018, 0)
> May 4 03:16:34 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000010161760 0000000000002000 00000300005247c0 0000000000002000
> May 4 03:16:34 ace %l4-7: 0000000000000001 000002a7515aa000 000002a1003a5828 0000000000002000
> May 4 03:16:35 ace genunix: [ID 723222 kern.notice] 000002a1003a5650 genunix:lookuppnvp+2cc (2a1003a5a10, 0, 1045ec30, 1045aac0, 0, 0)
> May 4 03:16:35 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000010164da8 000002a1003a5a10 0000000000000000 0000030001a45e18
> May 4 03:16:35 ace %l4-7: 0000030000e47ce8 000002a1003a5ae8 0000030005cc20a8 0000000000000000
> May 4 03:16:35 ace genunix: [ID 723222 kern.notice] 000002a1003a5850 genunix:lookuppn+108 (30001a45e18, 0, 0, 30001a45e18, 2a1003a5ae8, 0)
> May 4 03:16:35 ace genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000030002651548 000002a1003a5a10 0000000000002005
> May 5 06:15:53 ace %l4-7: 00000300000558c8 0000030000e35ea8 0000000000000000 0000030000e35ed0
> May 5 06:15:53 ace genunix: [ID 723222 kern.notice] 000002a100045a60 genunix:errorq_intr+4 (30000072008, 806, 1041b338, 1041b728, 100c0, 10084d9c)
> May 5 06:15:53 ace genunix: [ID 179002 kern.notice] %l0-3: 000000001040dae4 0000000000000005 000000000000007b 00000000000002a0
> May 5 06:15:53 ace %l4-7: 00000000c3900200 0000000000400000 0000000000400000 0000000000000001
> May 5 06:15:53 ace unix: [ID 100000 kern.notice]
> May 5 06:15:53 ace genunix: [ID 672855 kern.notice] syncing file systems...
> May 5 06:15:53 ace genunix: [ID 904073 kern.notice] done
> May 5 06:15:53 ace genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s4, offset 859111424
> May 5 06:15:53 ace genunix: [ID 409368 kern.notice] ^M100% done: 26133 pages dumped, compression ratio 8.40,
> May 5 06:15:53 ace genunix: [ID 851671 kern.notice] dump succeeded
>
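
The AFSR and UDBH values in the quoted log can be decoded by hand to see where the flag lists in angle brackets come from. This sketch assumes the UltraSPARC-II bit layout: PRIV at bit 31, WP at 23, UE at 21, CE at 20, ETS in bits 19:16 and P_SYND in bits 15:0 of the AFSR; UE at bit 9, CE at bit 8 and the ECC syndrome in bits 7:0 of the UDB error register. Those positions reproduce what the kernel printed for 0x80b0ff00 and 0x03a0, but check them against the processor manual before relying on them; all other AFSR bits are deliberately left out, and the function names are hypothetical.

```python
# Bit positions assumed from the UltraSPARC-II AFSR and UDB error register
# layouts; only the fields that appear in this log are decoded here.
AFSR_FLAGS = {31: "PRIV", 23: "WP", 21: "UE", 20: "CE"}
UDB_FLAGS = {9: "UE", 8: "CE"}

def decode_afsr(afsr: int) -> dict:
    """Split an AFSR value into its flag names, ETS field and P_SYND field."""
    return {
        "flags": [name for bit, name in sorted(AFSR_FLAGS.items(), reverse=True)
                  if afsr >> bit & 1],
        "ets": (afsr >> 16) & 0xF,
        "psynd": afsr & 0xFFFF,
    }

def decode_udb(udb: int) -> dict:
    """Split a UDBH/UDBL error register into its flags and 8-bit ECC syndrome."""
    return {
        "flags": [name for bit, name in sorted(UDB_FLAGS.items(), reverse=True)
                  if udb >> bit & 1],
        "esynd": udb & 0xFF,
    }

if __name__ == "__main__":
    afsr = decode_afsr(0x00000000_80B0FF00)   # "AFSR 0x00000000.80b0ff00"
    print(afsr["flags"], hex(afsr["ets"]), hex(afsr["psynd"]))
    udbh = decode_udb(0x03A0)                 # "UDBH 0x03a0"
    print(udbh["flags"], hex(udbh["esynd"]))
```

This prints `['PRIV', 'WP', 'UE', 'CE'] 0x0 0xff00` and `['UE', 'CE'] 0xa0`, matching the `<PRIV,WP,UE,CE>`, `AFSR.ETS 0x00`, `AFSR.PSYND 0xff00`, `<UE,CE>` and `UDBH.ESYND 0xa0` fields in the log.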

-- 
Thanks,
-Rajdeep
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:26:21 EDT