V880 crashes

From: Grzegorz Bakalarski (G.Bakalarski@icm.edu.pl)
Date: Sat Nov 06 2004 - 09:36:22 EST


Dear Gurus,

Our production server (Sun Fire V880, 6x900MHz, 12GB, Solaris 9)
crashed twice during the last 48 hours. The first time it panicked and
successfully rebooted itself. The second time it panicked and died
(I had to power the machine off and on).

Could anyone tell me what the problem is? Is it hardware or software?
Might the recommended patches help somehow?

On the other hand, I started the machine in diagnostic mode and there were
no errors. Also, prtdiag does not show any failures.

The machine is 2 years old, so it is still under hardware warranty ...
What is strange is that the events occurred when the load was low (less than 1;
during the daytime the load can be up to 40).

Thanks for any help

Grzegorz

>info from /var/adm/messages
====================================== 1 ===============================
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 360866 kern.warning] WARNING: [AFT1] EDU:ST Event detected by CPU0 at TL=0, errID 0x0030e2ee.7ce2a5e4
Nov 4 21:00:01 v880_sol9 AFSR 0x00000008<EDU>.00000152 AFAR 0x000000a0.3db88550
Nov 4 21:00:01 v880_sol9 Fault_PC 0x1177184 Esynd 0x0152
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 360866 kern.warning] WARNING: [AFT1] EDU:ST Event detected by CPU0 at TL=0, errID 0x0030e2ee.7ce2a5e4
Nov 4 21:00:01 v880_sol9 AFSR 0x00000008<EDU>.00000152 AFAR 0x000000a0.3db88550
Nov 4 21:00:01 v880_sol9 Fault_PC 0x1177184 Esynd 0x0152
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 606810 kern.notice] [AFT1] errID 0x0030e2ee.7ce2a5e4 More than four Bits were in error
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 465517 kern.info] [AFT2] errID 0x0030e2ee.7ce2a5e4 PA=0x000000a0.3db88540
Nov 4 21:00:01 v880_sol9 E$tag 0x00000280.f6020000 E$state_5 Modified
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000300.08f1f440 0x00000000.00000000 ECC 0x0a3
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x10) 0x07000000.00000000 0xf0ff0fff.ffffffff ECC 0x100 *Bad* Esynd=0x152
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x80000002.00000000 0x00000000.00000000 ECC 0x099
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x30) 0xffffffff.00000000 0x01002000.00000000 ECC 0x1d5 *Bad* Esynd=0x071
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Nov 4 21:00:01 v880_sol9 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x000000a0.3db88000
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 209006 kern.warning] WARNING: [AFT1] DUE Event detected by CPU0 at TL=0, errID 0x0030e2ee.7ce1b288
Nov 4 21:00:01 v880_sol9 AFSR 0x00500000<DUE,PRIV>.00000152 AFAR 0x000000a0.3db88550
Nov 4 21:00:01 v880_sol9 Fault_PC 0x1035ec4 Esynd 0x0152 Slot A: J7900 J7901 J8001 J8000
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 673850 kern.notice] [AFT1] errID 0x0030e2ee.7ce1b288 More than four Bits were in error
Nov 4 21:00:01 v880_sol9 SUNW,UltraSPARC-III+: [ID 630565 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU3 Privileged Data Access at TL=0, errID 0x0030e2ee.7ce38f18
Nov 4 21:00:01 v880_sol9 AFSR 0x00100004<PRIV,UE>.000000b6 AFAR 0x000000a0.2e5ea340
Nov 4 21:00:01 v880_sol9 Fault_PC 0x1090154 Esynd 0x00b6 Slot A: J7900 J7901 J8001 J8000
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 196182 kern.notice] [AFT1] errID 0x0030e2ee.7ce38f18 Three Bits were in error
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 828748 kern.info] [AFT2] errID 0x0030e2ee.7ce38f18 PA=0x000000a0.2e5ea340
Nov 4 21:00:02 v880_sol9 E$tag 0x00000280.b9010000 E$state_5 Exclusive
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x00) 0x00000318.7da154b0 0x0c007000.00000000 ECC 0x07a *Bad* Esynd=0x0b6
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000700.10c6f5d0 0x03007300.3c50a108 ECC 0x074
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000300.09542328 0x03007002.baddcafe ECC 0x1b9
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x03006332.15542280 ECC 0x0a9 *Bad* Esynd=0x149
Nov 4 21:00:02 v880_sol9 SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Nov 4 21:00:02 v880_sol9 unix: [ID 836849 kern.notice]
Nov 4 21:00:02 v880_sol9 ^Mpanic[cpu3]/thread=30003671520:
Nov 4 21:00:02 v880_sol9 unix: [ID 640582 kern.notice] [AFT1] errID 0x0030e2ee.7ce38f18 UE Error(s)
Nov 4 21:00:02 v880_sol9 See previous message(s) for details
Nov 4 21:00:02 v880_sol9 unix: [ID 100000 kern.notice]
Nov 4 21:00:02 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004969c0 SUNW,UltraSPARC-III+:cpu_aflt_log+5c0 (2a100496acb, 1, 2a100496cd8, 10, 117d180, 117d1a8)
Nov 4 21:00:02 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 0000000001222d04 0000000000000010 0000000000000003 000002a100496cd8
Nov 4 21:00:02 v880_sol9 %l4-7: 000000a02e5ea340 0000000000000000 000002a100496c08 000002a100496a7e
Nov 4 21:00:02 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100496c10 SUNW,UltraSPARC-III+:cpu_deferred_error+4d4 (0, 1, 40100004032000b6, 40100004, a0, 6bc)
Nov 4 21:00:02 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 000002a100496cd8 0000000400000000 40100004032000b6 000003000367d928
Nov 4 21:00:02 v880_sol9 %l4-7: 0000000000000001 000002a100497220 0000030000010300 0000000080000000
Nov 4 21:00:02 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497170 unix:ktl0+48 (30002f1b298, 0, 20, 0, 7092c300, 0)
Nov 4 21:00:03 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000005 0000000000001400 0000000080001604 0000000001171800
Nov 4 21:00:03 v880_sol9 %l4-7: 0000000001446800 0000000001410478 0000000000000000 000002a100497220
Nov 4 21:00:03 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004972c0 genunix:dnlc_purge_vfsp+8c (30002f1b298, 2a100497370, 144f400, 1495000, 2a100497440, 2a100497446)
Nov 4 21:00:03 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 00000300093ae008 0000000000000000 00000301f50e6540 0000000000000000
Nov 4 21:00:03 v880_sol9 %l4-7: 0000000000000000 0000030002f1b288 0000030008b665b0 0000000001443ee0
Nov 4 21:00:03 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004973b0 genunix:dounmount+c (30008b665b0, 0, 300003a5f28, 0, 30003671520, 0)
Nov 4 21:00:03 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 000003003cc565c0 000003000396d5e0 0000000000000000 0000000000000000
Nov 4 21:00:03 v880_sol9 %l4-7: 000003000b3be100 0000030009387ab0 000003000b3be182 0000030009387b08
Nov 4 21:00:03 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497460 namefs:nm_umountall+a8 (781ad4a0, 300003a5f28, 20, 2a1004975bc, 30003671520, 4)
Nov 4 21:00:03 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 00000300045a3308 0000030008b665b0 0000000000000000 00000300038aa8c0
Nov 4 21:00:03 v880_sol9 %l4-7: 0000000000000000 00000000781ad488 0000000000000088 00000000781ad540
Nov 4 21:00:04 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497510 namefs:nm_unmountall+10 (300038aa8c0, 300003a5f28, 20, 7bf, 0, 0)
Nov 4 21:00:04 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 00000300038aa8c0 00000300003a5f28 0000000000000001 0000000001499508
Nov 4 21:00:04 v880_sol9 %l4-7: 0000000000000001 0000000000000000 0000030003963e38 000002a100497ba0
Nov 4 21:00:04 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004975c0 unix:stubs_common_code+70 (300038aa8c0, 300003a5f28, 20, 7bf, 0, 0)
Nov 4 21:00:04 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000000 0000030009386910 0000000000000000
Nov 4 21:00:04 v880_sol9 %l4-7: 00000000000000b0 0000000001410a10 0000030003963ce0 0000030009387b38
Nov 4 21:00:04 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497670 fifofs:fifo_close+2d8 (30003963dd0, 300038aa8ae, 1, 0, 300003a5f28, 3000367151c)
Nov 4 21:00:04 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 00000300038aa8a0 00000300038aa8c0 0000000000000003 0000000000000000
Nov 4 21:00:04 v880_sol9 %l4-7: 00000300038aa9c0 000003000366f188 00000300038aa9c0 0000000000000000
Nov 4 21:00:04 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497720 genunix:closef+54 (3000932d378, 0, 1, 0, 100c6ac, 0)
Nov 4 21:00:04 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 0000000001340550 0000000000000001 00000300038aa9c0 000000000000000f
Nov 4 21:00:04 v880_sol9 %l4-7: 0000000001495000 0000000000000000 000000000140e000 0000000000000001
Nov 4 21:00:05 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004977d0 genunix:closeall+30 (300036d1d10, 30003671520, 20, 0, 7092c300, 0)
Nov 4 21:00:05 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 00000300092faea8 0000000000000004 0000030000010680 0000000000000000
Nov 4 21:00:05 v880_sol9 %l4-7: 0000030000010558 0000000001410478 0000030003671520 000000000000fffd
Nov 4 21:00:05 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497880 genunix:proc_exit+310 (3023596f798, 149c280, 30003671520, 300003a5f28, 0, 0)
Nov 4 21:00:05 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 000003000b3d84a0 00000300036d1440 000003000366f188 0000000000000002
Nov 4 21:00:05 v880_sol9 %l4-7: 000000000000000f 0000000000000002 000000000000000f 0000000000000000
Nov 4 21:00:05 v880_sol9 genunix: [ID 723222 kern.notice] 000002a100497930 genunix:exit+8 (2, f, 300036d1554, 0, 30003671520, 0)
Nov 4 21:00:05 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 000000000000000f 0000000000000002 0000000000004000 00000300036d1440
Nov 4 21:00:05 v880_sol9 %l4-7: 0000000000000000 000000000000000f 0000000000000070 0000000000000000
Nov 4 21:00:05 v880_sol9 genunix: [ID 723222 kern.notice] 000002a1004979e0 genunix:post_syscall+3e0 (2a100497ba0, 3, 0, 1, 30003671520, 4)
Nov 4 21:00:05 v880_sol9 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000004 00000300036d1440 000003000366f188 0000000000000000
Nov 4 21:00:05 v880_sol9 %l4-7: 0000000000000000 0000000000000000 0000000000000004 00000000ffbffdf8
Nov 4 21:00:06 v880_sol9 unix: [ID 100000 kern.notice]
Nov 4 21:00:06 v880_sol9 genunix: [ID 672855 kern.notice] syncing file systems...
Nov 4 21:00:06 v880_sol9 unix: [ID 836849 kern.notice]
Nov 4 21:00:06 v880_sol9 ^Mpanic[cpu3]/thread=30003671520:
Nov 4 21:00:06 v880_sol9 unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=1437f90 addr=a0 mmu_fsr=0 occurred in module "genunix" due to a NULL pointer dereference
Nov 4 21:00:06 v880_sol9 unix: [ID 100000 kern.notice]
Nov 4 21:00:06 v880_sol9 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 644022272, content: kernel
Nov 4 21:01:30 v880_sol9 genunix: [ID 409368 kern.notice] ^M100% done: 160398 pages dumped, compression ratio 2.45,
Nov 4 21:01:31 v880_sol9 genunix: [ID 851671 kern.notice] dump succeeded
Nov 4 21:02:16 v880_sol9 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_117171-02 64-bit

================================== 2 =======================================
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 621593 kern.warning] WARNING: [AFT1] DUE Event detected by CPU0 at TL=0, errID 0x00008286.8959bc20
Nov 6 12:53:19 v880_sol9 AFSR 0x00500000<DUE,PRIV>.000000e2 AFAR 0x000000a0.6c7ec0c0
Nov 6 12:53:19 v880_sol9 Fault_PC 0x117bb00 Esynd 0x00e2 Slot A: J8100 J8101 J8201 J8200
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 300719 kern.notice] [AFT1] errID 0x00008286.8959bc20 Two Bits were in error
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 978170 kern.info] [AFT2] errID 0x00008286.8959bc20 PA=0x000000a0.6c7ec0c0
Nov 6 12:53:19 v880_sol9 E$tag 0x00000281.b1000001 E$state_3 Invalid
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000000 0x00714fb0.00000000 ECC 0x123
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00714fb0.00000000 0x00000039.00000000 ECC 0x185
Nov 6 12:53:19 v880_sol9 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00717188.00718018 0xff2fa7e8.00000000 ECC 0x032
Nov 6 13:42:53 v880_sol9 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_117171-02 64-bit
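
The [AFT1] lines above can be tallied per CPU and per DIMM group, which may
help when deciding whether the same hardware shows up in both panics. Below is
a minimal sketch in Python, assuming the messages file has the same layout as
the excerpt above; the function name summarize and the two regular expressions
are illustrative and not part of any Solaris tool.

```python
import re
from collections import Counter

# First line of an [AFT1] event, e.g.
# "... WARNING: [AFT1] DUE Event detected by CPU0 at TL=0, errID 0x..."
EVENT_RE = re.compile(r"\[AFT1\] (?P<kind>.+?) Event detected by CPU(?P<cpu>\d+)")

# DIMM group reported on a Fault_PC line, e.g.
# "... Esynd 0x0152 Slot A: J7900 J7901 J8001 J8000"
SLOT_RE = re.compile(r"Slot (?P<slot>\w+): (?P<dimms>(?:J\d+\s*)+)")


def summarize(lines):
    """Count [AFT1] events per (kind, CPU) and per (slot, DIMM group).

    `lines` is any iterable of log lines (hypothetical helper, assumes the
    message layout shown in the excerpt).
    """
    events = Counter()
    dimm_groups = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events[(m.group("kind"), int(m.group("cpu")))] += 1
        m = SLOT_RE.search(line)
        if m:
            dimms = tuple(m.group("dimms").split())
            dimm_groups[(m.group("slot"), dimms)] += 1
    return events, dimm_groups
```

It could be run as, for example, `events, dimm_groups = summarize(open("/var/adm/messages"))`
and the two counters printed. Repeated hits on one CPU or one DIMM group are
only a hint, not a diagnosis.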

========================== THE END ================================================


