Uncorrectable system bus (UE) Event causes Kernel Panic and system lockup

From: Stanley Laufer (slaufer@slis.sjsu.edu)
Date: Thu Mar 03 2005 - 17:31:34 EST


One of our production Solaris 9 systems (280R, 2x UltraSPARC-III+,
2 GB of main memory) hung hard today. After power-cycling it,
I recovered the following from /var/adm/messages.

The messages report an uncorrectable two-bit error that the kernel
attributes to the E$ (external cache) or the CPU, and it worries me
a bit.

Has anyone ever seen this type of CPU error before?

Note that Solaris 9 on this machine is current on all public
patches, so I suspect that a moderate to potentially serious
hardware problem is developing rather than a known software bug.

Thanks in advance for any responses, and I'll be sure
to post a SUMMARY of relevant information.

-------------------------------------------------------
Mar 3 13:22:43 tigris unix: [ID 350512 kern.notice] panic: failed to stop cpu0
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 940822 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU1 Privileged Data Access at TL=0, errID 0x000d3e43.b0537480
Mar 3 13:22:43 tigris AFSR 0x00100004<PRIV,UE>.00000071 AFAR 0x00000000.0fba5b80
Mar 3 13:22:43 tigris Fault_PC 0x117c26c Esynd 0x0071 J0100 J0202 J0304 J0406
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 339007 kern.notice] [AFT1] errID 0x000d3e43.b0537480 Two Bits in error, likely from E$ WDU/CPU
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 134629 kern.info] [AFT2] errID 0x000d3e43.b0537480 PA=0x00000000.0fba5b80
Mar 3 13:22:43 tigris E$tag 0x00000000.3e900000 E$state_6 Modified
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x00) 0xc0861fc4.007e2c14 0x00905eec.00400000 ECC 0x025 *Bad* Esynd=0x071
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x0310bf7c.009662d8 0x00000000.00000000 ECC 0x0b2
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x20) 0xc0000000.00000000 0x00000080.00400000 ECC 0x1c2 *Bad* Esynd=0x071
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.004701a0 0x00000000.00f31948 ECC 0x0b1
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 628737 kern.warning] WARNING: [AFT1] WDU Event detected by CPU1 at TL=0, errID 0x000d3e43.b0537480
Mar 3 13:22:43 tigris AFSR 0x00200020<ME,WDU>.00000071 AFAR 0x00000000.0fba5b80
Mar 3 13:22:43 tigris Fault_PC 0x117c26c Esynd 0x0071
Mar 3 13:22:43 tigris SUNW,UltraSPARC-III+: [ID 715025 kern.notice] [AFT1] errID 0x000d3e43.b0537480 Two Bits were in error
Mar 3 13:22:43 tigris unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.0fb
Mar 3 13:22:43 tigris unix: [ID 836849 kern.notice]
Mar 3 13:22:43 tigris panic[cpu1]/thread=2a100231d40:
Mar 3 13:22:43 tigris unix: [ID 846444 kern.notice] [AFT1] errID 0x000d3e43.b0537480 UE WDU Error(s)
Mar 3 13:22:43 tigris See previous message(s) for details
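For anyone who wants to check the AFSR fields above by hand, here is a
minimal sketch that splits the "0xHHHHHHHH<...>.LLLLLLLL" value into its
error bits and E_SYND. The bit positions in AFSR_BITS (ME at bit 53 down
to CE at bit 33, E_SYND in bits 8:0) are my assumption from the
UltraSPARC-III processor documentation, not something the log states;
the helper names parse_afsr and decode_afsr are hypothetical. It does
not diagnose anything; it only makes the register contents explicit.

```python
import re

# ASSUMPTION: UltraSPARC-III AFSR error-bit positions (bit -> name).
# Verify against the processor manual for your CPU revision.
AFSR_BITS = {
    53: "ME", 52: "PRIV", 51: "PERR", 50: "IERR", 49: "ISAP",
    48: "EMC", 47: "EMU", 46: "IVC", 45: "IVU", 44: "TO",
    43: "BERR", 42: "UCC", 41: "UCU", 40: "CPC", 39: "CPU",
    38: "WDC", 37: "WDU", 36: "EDC", 35: "EDU", 34: "UE", 33: "CE",
}

# Matches "AFSR 0xHHHHHHHH<LABELS>.LLLLLLLL" as printed in the log;
# the <LABELS> part is optional and ignored.
AFSR_RE = re.compile(r"AFSR 0x([0-9a-fA-F]{8})(?:<[^>]*>)?\.([0-9a-fA-F]{8})")


def parse_afsr(line):
    """Return the 64-bit AFSR value found in a syslog line (hypothetical helper)."""
    m = AFSR_RE.search(line)
    if m is None:
        raise ValueError("no AFSR field in line")
    # Upper 32 bits before the dot, lower 32 bits after it.
    return (int(m.group(1), 16) << 32) | int(m.group(2), 16)


def decode_afsr(afsr):
    """Return (set error-bit names, highest bit first; E_SYND value)."""
    names = [name for bit, name in sorted(AFSR_BITS.items(), reverse=True)
             if (afsr >> bit) & 1]
    esynd = afsr & 0x1FF  # ASSUMPTION: E_SYND occupies bits 8:0
    return names, esynd


if __name__ == "__main__":
    for line in (
        "Mar 3 13:22:43 tigris AFSR 0x00100004<PRIV,UE>.00000071 AFAR",
        "Mar 3 13:22:43 tigris AFSR 0x00200020<ME,WDU>.00000071 AFAR",
    ):
        names, esynd = decode_afsr(parse_afsr(line))
        print(",".join(names), hex(esynd))
```

Run against the two AFSR lines in the log, it prints "PRIV,UE 0x71" and
"ME,WDU 0x71", which matches the labels Solaris itself printed and the
Esynd 0x0071 reported for both events.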

Stanley E. Laufer
Network Administrator
School of Library and Information Science
San Jose State University
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:30:17 EDT