Why did my E3500 reboot?

From: Richard.Skelton@infineon.com
Date: Tue Sep 09 2003 - 07:13:47 EDT


Hi Managers,
One of my E3500s rebooted last night.
The machine is running Solaris 8 Generic_108528-21 with eCache scrubbing set
in /etc/system:
*eCache Scrubbing
set ecache_scrub_enable = 1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
*End eCache Settings
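
To double-check what is actually being set, a minimal Python sketch like the one below could read those lines back. It assumes the simple "set name = value" form shown above, with "*" starting a comment line; the function name parse_set_lines is made up for illustration and is not a Solaris tool.

```python
import re

# The /etc/system excerpt from this message, copied verbatim.
ETC_SYSTEM_EXCERPT = """\
*eCache Scrubbing
set ecache_scrub_enable = 1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
*End eCache Settings
"""


def parse_set_lines(text):
    """Return {name: int_value} for 'set name = value' lines.

    Lines starting with '*' are treated as comments, as in the excerpt.
    Hypothetical helper: it handles only the simple form shown here.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        match = re.match(r"set\s+([\w:]+)\s*=\s*(\S+)$", line)
        if match:
            # Base 0 accepts plain decimal as well as 0x-prefixed values.
            settings[match.group(1)] = int(match.group(2), 0)
    return settings


if __name__ == "__main__":
    for name, value in parse_set_lines(ETC_SYSTEM_EXCERPT).items():
        print(name, value)
```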

The machine has 8 400MHz CPUs and 16GB memory:-
========================= CPUs =========================

                   Run   Ecache   CPU    CPU
Brd  CPU  Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 3    6      0      400    8.0    US-II   10.0
 3    7      1      400    8.0    US-II   10.0
 5   10      0      400    8.0    US-II   10.0
 5   11      1      400    8.0    US-II   10.0
 7   14      0      400    8.0    US-II   10.0
 7   15      1      400    8.0    US-II   10.0
 9   18      0      400    8.0    US-II   10.0
 9   19      1      400    8.0    US-II   10.0

========================= Memory =========================

                                               Intrlv.  Intrlv.
Brd  Bank   MB    Status   Condition  Speed   Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 3     0    2048  Active       OK      60ns    8-way      A
 3     1    2048  Active       OK      60ns    8-way      A
 5     0    2048  Active       OK      60ns    8-way      A
 5     1    2048  Active       OK      60ns    8-way      A
 7     0    2048  Active       OK      60ns    8-way      A
 7     1    2048  Active       OK      60ns    8-way      A
 9     0    2048  Active       OK      60ns    8-way      A
 9     1    2048  Active       OK      60ns    8-way      A

I have both the console output and the relevant section from
/var/adm/messages below.

I am not sure if this is the old eCache problem or a real memory problem.

Cheers
Richard.

Console output:-

WARNING: [AFT1] Uncorrectable Memory Error on CPU7 Instruction access at TL>0, errID 0x000c30c4.f67891a8
    AFSR 0x00000000.00200000<UE> AFAR 0x00000001.5a336a30
    AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10132e9e4
    UDBH 0x021b<UE> UDBH.ESYND 0x1b UDBL 0x0000 UDBL.ESYND 0x00
    UDBH Syndrome 0x1b Memory Module Board 3 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
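
As a rough illustration of how the register values in that warning break down, here is a small Python sketch. The bit masks are assumptions read off the log's own annotations (AFSR 0x00200000 is tagged <UE>, UDBH 0x021b is tagged <UE> with ESYND 0x1b); they are not taken from the processor manual, and the helper name dotted_hex is made up for this example.

```python
def dotted_hex(value):
    """Turn the log's 'hi.lo' form, e.g. '0x00000001.5a336a30', into one integer."""
    hi, lo = value.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


# Assumed masks, inferred only from the <UE> and ESYND annotations in this log.
AFSR_UE = 0x00200000      # AFSR value that the log labels <UE>
UDB_UE = 0x0200           # UDBH 0x021b minus its 0x1b syndrome
UDB_ESYND_MASK = 0x00FF   # low byte, printed by the log as UDBH.ESYND

afsr = dotted_hex("0x00000000.00200000")
afar = dotted_hex("0x00000001.5a336a30")
udbh = 0x021B
udbl = 0x0000

if __name__ == "__main__":
    print("AFAR as one address:", hex(afar))
    print("AFSR UE flag set:   ", bool(afsr & AFSR_UE))
    print("UDBH UE flag set:   ", bool(udbh & UDB_UE))
    print("UDBH syndrome:      ", hex(udbh & UDB_ESYND_MASK))
    print("UDBL UE flag set:   ", bool(udbl & UDB_UE))
```

Under those assumptions only the upper UDB half reports the error, which agrees with the UDBL 0x0000 / UDBL.ESYND 0x00 values in the message.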

panic[cpu7]/thread=30006e2fbc0: [AFT1] errID 0x000c30c4.f67891a8 UE Error(s)
    See previous message(s) for details

000002a100f4b6d0 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a100f4b78e, 1, 1014df28, 2a100f4b918, 2a100f4b7db, 1014df50)
  %l0-3: 0000000000000000 0000000000000003 000002a100f4b9e0 0000000000000010
  %l4-7: 0000000000f32400 0000000100000000 0000000000000001 0000000000000000
000002a100f4b920 SUNW,UltraSPARC-II:cpu_async_error+868 (1046a0b0, 2a100f4b9e0, 200000, 0, 4140043600200000, 2a100f4bba0)
  %l0-3: 000000001040db3c 000000000000000a 0000000000000000 000000000000021b
  %l4-7: 000000015a336a00 0000000000800000 0000000000800000 0000000000000001

/var/adm/messages :-
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 406603 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU7 Instruction access at TL>0, errID 0x000c30c4.f67891a8
Sep 9 04:43:54 brscs08 AFSR 0x00000000.00200000<UE> AFAR 0x00000001.5a336a30
Sep 9 04:43:54 brscs08 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10132e9e4
Sep 9 04:43:54 brscs08 UDBH 0x021b<UE> UDBH.ESYND 0x1b UDBL 0x0000 UDBL.ESYND 0x00
Sep 9 04:43:54 brscs08 UDBH Syndrome 0x1b Memory Module Board 3 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 115623 kern.info] [AFT2] errID 0x000c30c4.f67891a8 PA=0x00000001.5a336a30
Sep 9 04:43:54 brscs08 E$tag 0x00000000.0c402b46 E$State: Shared E$parity 0x06
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x81c3e008.01000000
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x06400005.973aa000
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0xc4722000.81c3e008
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x01000000.8b2af004
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x80a0a000.c4720005
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x1268001a.01000000
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x30): 0x0402a001.84020005 *Bad* PSYND=0xff00
Sep 9 04:43:54 brscs08 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x933aa000.8b2a7004
Sep 9 04:43:54 brscs08 unix: [ID 836849 kern.notice]
Sep 9 04:43:54 brscs08 ^Mpanic[cpu7]/thread=30006e2fbc0:
Sep 9 04:43:54 brscs08 unix: [ID 590393 kern.notice] [AFT1] errID 0x000c30c4.f67891a8 UE Error(s)
Sep 9 04:43:54 brscs08 See previous message(s) for details
Sep 9 04:43:54 brscs08 unix: [ID 100000 kern.notice]
Sep 9 04:43:54 brscs08 genunix: [ID 723222 kern.notice] 000002a100f4b6d0 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a100f4b78e, 1, 1014df28, 2a100f4b918, 2a100f4b7db, 1014df50)
Sep 9 04:43:55 brscs08 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000003 000002a100f4b9e0 0000000000000010
Sep 9 04:43:55 brscs08 %l4-7: 0000000000f32400 0000000100000000 0000000000000001 0000000000000000
Sep 9 04:43:55 brscs08 genunix: [ID 723222 kern.notice] 000002a100f4b920 SUNW,UltraSPARC-II:cpu_async_error+868 (1046a0b0, 2a100f4b9e0, 200000, 0, 4140043600200000, 2a100f4bba0)
Sep 9 04:43:55 brscs08 genunix: [ID 179002 kern.notice] %l0-3: 000000001040db3c 000000000000000a 0000000000000000 000000000000021b
Sep 9 04:43:55 brscs08 %l4-7: 000000015a336a00 0000000000800000 0000000000800000 0000000000000001
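
One cross-check that can be done on the [AFT2] lines is whether the E$Data word flagged *Bad* sits at the same offset as the reported PA. The Python sketch below does that under one assumption: the dump lists eight 8-byte words (0x00 to 0x38), so the line is taken to be 64 bytes. The helper names are made up for illustration, and only three of the eight E$Data entries are repeated here.

```python
import re

# A few E$Data entries and the PA, copied from the [AFT2] messages.
AFT2_LINES = [
    "E$Data (0x28): 0x1268001a.01000000",
    "E$Data (0x30): 0x0402a001.84020005 *Bad* PSYND=0xff00",
    "E$Data (0x38): 0x933aa000.8b2a7004",
]
PA = "0x00000001.5a336a30"

LINE_SIZE = 64  # assumption: eight 8-byte words per dumped line


def dotted_hex(value):
    """Turn the log's 'hi.lo' form into a single integer."""
    hi, lo = value.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def bad_offsets(lines):
    """Return the offsets of E$Data entries that the log marks *Bad*."""
    offsets = []
    for line in lines:
        match = re.search(r"E\$Data \((0x[0-9a-fA-F]+)\):", line)
        if match and "*Bad*" in line:
            offsets.append(int(match.group(1), 16))
    return offsets


pa = dotted_hex(PA)
offset_in_line = pa & (LINE_SIZE - 1)   # byte offset of the PA within the line
line_base = pa & ~(LINE_SIZE - 1)       # start address of that line

if __name__ == "__main__":
    print("flagged offsets:", [hex(o) for o in bad_offsets(AFT2_LINES)])
    print("PA offset:      ", hex(offset_in_line))
    print("line base:      ", hex(line_base))
```

With these inputs the flagged offset and the PA offset are both 0x30, and the line base 0x15a336a00 is the same value that appears in %l4 of the cpu_async_error frame.
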
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


