error: cannot stop cpu1

From: Richard Butler (rbutler@ibc.cnr.it)
Date: Wed Jan 15 2003 - 07:36:56 EST


Hi all,

    My (fairly) new Sun Fire 280R running Solaris 8 is crashing at least
once a day with the /var/adm/messages errors below. The messages are not
always the same, except for "failed to stop cpu1". Although it looks to me
like a hardware problem (CPU, RAM or something else?), I have some doubts,
because this only started after I installed several continuously running
applications.

    Should I be:
       1) Calling my local Sun service now (definitely hardware)
       2) Stopping each application to see which one is causing the problem
       3) Trying to interpret the crash dump files for more information

  I would appreciate your advice and will post a summary.

        Richard

Typical /var/adm/messages output:

Jan 15 11:23:33 ed unix: [ID 350512 kern.notice] panic: failed to stop cpu1
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 862641 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0 Privileged Data Access at TL=0, errID 0x0000275a.fd4ac4d0
Jan 15 11:23:33 ed AFSR 0x00100004<PRIV,UE>.00000071 AFAR 0x00000000.f4679eb0
Jan 15 11:23:33 ed Fault_PC 0x10032084 Esynd 0x0071 J0100 J0202 J0304 J0406
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 364402 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits in error, likely from E$ WDU/CPU
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 180299 kern.info] [AFT2] errID 0x0000275a.fd4ac4d0 PA=0x00000000.f4679e80
Jan 15 11:23:33 ed E$tag 0x00000003.d1024124 E$state_2 Modified
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x63616c6c.6f75745f 0x7461736b.71000000 ECC 0x1fc
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000008 ECC 0x097
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x30) 0xc0010000.00010000 0x00000001.00000002 ECC 0x069 *Bad* Esynd=0x071
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 291068 kern.warning] WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
Jan 15 11:23:33 ed AFSR 0x00000028<WDU,EDU>.00000071 AFAR 0x00000000.f4679eb0 AMBIGUOUS
Jan 15 11:23:33 ed Fault_PC 0x10032084 Esynd 0x0071 AMBIGUOUS
Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 856704 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits were in error
Jan 15 11:23:34 ed unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.f4678000
Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 292220 kern.warning] WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
Jan 15 11:23:34 ed AFSR 0x00000028<WDU,EDU>.00000071 AFAR 0x00000000.f4679eb0 AMBIGUOUS
Jan 15 11:23:34 ed Fault_PC 0x10032084 Esynd 0x0071 AMBIGUOUS
Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 856704 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits were in error
Jan 15 11:23:34 ed unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.f4678000
Jan 15 11:23:35 ed unix: [ID 836849 kern.notice]
Jan 15 11:23:35 ed panic[cpu0]/thread=2a100045d20:
Jan 15 11:23:35 ed unix: [ID 892114 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 UE EDU WDU Error(s)
Jan 15 11:23:35 ed See previous message(s) for details
Jan 15 11:23:35 ed unix: [ID 100000 kern.notice]
Jan 15 11:23:35 ed genunix: [ID 723222 kern.notice] 000002a100044e90 SUNW,UltraSPARC-III+:cpu_aflt_log+560 (2a100044f4e, 1014bf08, 1014bee0, 0, 2a1000450d8, 2a100044f9b)
Jan 15 11:23:35 ed genunix: [ID 179002 kern.notice] %l0-3: 000002a100045540 000002a100045198 0000000000000003 0000000000000010
Jan 15 11:23:35 ed %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
Jan 15 11:23:36 ed genunix: [ID 723222 kern.notice] 000002a1000450e0 SUNW,UltraSPARC-III+:cpu_deferred_error+4d0 (400000000, 980c00000000, 1, 4010000403200071, 2a100045620, 4010000403200071)
Jan 15 11:23:36 ed genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 000002a100045198 0000000000000000 0000000000000000
Jan 15 11:23:36 ed %l4-7: 0000000000000219 00000000f4679eb0 0000000000000000 000002a10001f910
Jan 15 11:23:36 ed genunix: [ID 723222 kern.notice] 000002a100045570 unix:prom_rtt+0 (30000e75ea0, 2a100045d20, 1041c318, 10423a80, 2, 0)
Jan 15 11:23:36 ed genunix: [ID 179002 kern.notice] %l0-3: 0000000000000005 0000000000001400 0000000000001604 000000001014185c
Jan 15 11:23:36 ed %l4-7: 0000000000000005 0000000000000004 000000000000000a 000002a100045620
Jan 15 11:23:37 ed genunix: [ID 723222 kern.notice] 000002a1000456c0 genunix:taskq_dispatch+c (30000e75ea0, 100734b4, 300001f9000, 1, 30001d05ab0, 30000e75e80)
Jan 15 11:23:37 ed genunix: [ID 179002 kern.notice] %l0-3: 0000000010042270 0000000000000000 0000000000000000 000002a1000abd20
Jan 15 11:23:37 ed %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jan 15 11:23:37 ed genunix: [ID 723222 kern.notice] 000002a100045770 genunix:callout_schedule_1+a0 (300001f9000, 300001f9000, 20, 10000, 30000e75e3a, 30000e75e60)
Jan 15 11:23:37 ed genunix: [ID 179002 kern.notice] %l0-3: 00000000100734b4 0000030000e75e38 0000030000e75e30 0000030000e75e08
Jan 15 11:23:37 ed %l4-7: 0000030000e75e28 0000030000e75e60 00000300000658f0 0000000000000002
Jan 15 11:23:38 ed genunix: [ID 723222 kern.notice] 000002a100045820 genunix:callout_schedule+54 (10439394, 1, 10439310, 8, 1, 30000162e70)
Jan 15 11:23:38 ed genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000002a1006d1ba0 000003000288dbd8 0000000000000000
Jan 15 11:23:38 ed %l4-7: 0000000000000000 00000300027a7568 0000000000000000 0000000000000000
Jan 15 11:23:38 ed genunix: [ID 723222 kern.notice] 000002a1000458d0 genunix:clock+474 (1045d000, 1041b380, 1042e000, 325ffd4a80, 0, 0)
Jan 15 11:23:38 ed genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000001 000002a100071d20 0000000000000000
Jan 15 11:23:38 ed %l4-7: 000000001041b380 0000000000000016 000000001041bb40 000002a1006d1ba0
Jan 15 11:23:39 ed genunix: [ID 723222 kern.notice] 000002a1000459a0 genunix:cyclic_softint+a4 (1041b380, 30000065928, 1, 3, 30000162dd0, 100746cc)
Jan 15 11:23:39 ed genunix: [ID 179002 kern.notice] %l0-3: 0000030000065930 000000000041e4c4 0000000000000000 0000030000162dd0
Jan 15 11:23:39 ed %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
Jan 15 11:23:39 ed genunix: [ID 723222 kern.notice] 000002a100045a60 unix:cbe_level10+8 (0, 803, 1041b380, 2a100045d20, 10060, 1000b2cc)
Jan 15 11:23:39 ed genunix: [ID 179002 kern.notice] %l0-3: 0000030000065930 0000000000010000 0000000000000000 0000030000162dd0
Jan 15 11:23:39 ed %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
Jan 15 11:23:40 ed unix: [ID 100000 kern.notice]
Jan 15 11:23:40 ed genunix: [ID 672855 kern.notice] syncing file systems...
Jan 15 11:23:40 ed genunix: [ID 904073 kern.notice] done
Jan 15 11:23:41 ed genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 859701248
Jan 15 11:24:00 ed genunix: [ID 409368 kern.notice] 100% done: 40395 pages dumped, compression ratio 4.30,
Jan 15 11:24:00 ed genunix: [ID 851671 kern.notice] dump succeeded

This is followed by the usual reboot sequence.
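One quick check that bears on the hardware-versus-application question is whether the UE reports keep pointing at the same physical address across crashes. Below is a minimal Python sketch (not a Solaris tool) that pulls every AFAR value out of a messages file and tallies the reports per 8 KB page. It assumes the `AFAR 0x<hi>.<lo>` format seen in the log above and the 8 KB SPARC base page size; the function names are hypothetical. In the excerpt above, all three AFAR reports give 0x00000000.f4679eb0, which rounds down to the page 0x00000000.f4678000 named in the "Scheduling clearing of error" notices.

```python
import re
from collections import Counter

# Assumption: 8 KB base page size (SPARC/Solaris). The log's "clearing of
# error on page" notice matches the AFAR rounded down to this boundary.
PAGE_SIZE = 0x2000

# Matches e.g. "AFAR 0x00000000.f4679eb0": the 64-bit physical address is
# printed as two 32-bit hex halves joined by a dot. \s+ also matches a
# newline, so a mail-wrapped copy of the log still parses.
AFAR_RE = re.compile(r"AFAR\s+0x([0-9a-fA-F]{8})\.([0-9a-fA-F]{8})")


def afar_addresses(text):
    """Return every AFAR value found in the text, as integers."""
    return [int(hi + lo, 16) for hi, lo in AFAR_RE.findall(text)]


def pages_hit(text):
    """Count AFAR reports per 8 KB physical page (hypothetical helper)."""
    return Counter(addr & ~(PAGE_SIZE - 1) for addr in afar_addresses(text))


def report(path):
    """Print pages from a messages file, most frequently reported first."""
    with open(path, errors="replace") as f:
        counts = pages_hit(f.read())
    for page, n in counts.most_common():
        print(f"page 0x{page:016x}: {n} AFAR report(s)")
```

Running `report("/var/adm/messages")` after several crashes shows whether the same page keeps coming back. A recurring address would be consistent with a fault at that location (memory or the CPU's E$), while scattered addresses say less; either way it is a hint, not a diagnosis.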

====================================================================
Richard Butler
Cell Biology Institute, C.N.R. tel: +39-06-90091-265
viale E.Ramarini, 32 fax: +39-06-90091-260
Monterotondo Scalo (Roma)
I-00016 Italy email:rbutler@ibc.cnr.it
====================================================================
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers