v880 panic: WARNING: pcisch2 (pci@9,700000): PCI fault

From: Grzegorz Bakalarski (G.Bakalarski@icm.edu.pl)
Date: Tue Apr 11 2006 - 10:37:14 EDT


Hello,

My V880 server (12 GB RAM, 6 x 900 MHz UltraSPARC III, Solaris 9 patched in
January 2006) panicked and rebooted itself today.

Here are the messages:

Console:

WARNING: pcisch2 (pci@9,700000): PCI fault log start:
PCI iommu error
pcisch2: Error 1 on IOMMU TLB entry 4:
        Context=0 not Writable not Streamable
        PCI Page Size=8k Address in page c2f88000
Memory: Valid not Cacheable Page Frame=0
PCI error ocurred on device #0
pcisch2 (pci@9,700000): PBM AFSR=0x0.00000000 dwordmask=0 bytemask=0
pcisch2 (pci@9,700000): PCI primary error (0):
pcisch2 (pci@9,700000): PCI secondary error (0):
pcisch2 (pci@9,700000): PBM AFAR 0.00000000:
WARNING: pcisch2: PCI config space CSR=0x2a80<signaled-target-abort,received-master-abort>
pcisch2 (pci@9,700000): PCI fault log end.
panic[cpu2]/thread=2a100003d40: pcisch-2: PCI bus 1 error(s)!
syncing file systems...
panic[cpu2]/thread=2a100003d40: panic sync timeout
dumping to /dev/dsk/c1t0d0s1, offset 644022272, content: kernel
  1% done
.
.
.
100% done: 141855 pages dumped, compression ratio 2.38, dump succeeded
rebooting...
Resetting ...
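
To cross-check the CSR line in the fault log, here is a minimal Python sketch that decodes 0x2a80. It assumes the value is the standard 16-bit PCI status register; the masks follow the PCI Local Bus specification. The flag names are my own labels, apart from the two the driver printed, and decode_pci_status is a hypothetical helper, not part of any Solaris tool.

```python
# Bit masks of the 16-bit PCI status register (standard PCI layout).
# Names are illustrative labels, not driver output.
PCI_STATUS_BITS = {
    0x0010: "capabilities-list",
    0x0020: "66mhz-capable",
    0x0080: "fast-back-to-back-capable",
    0x0100: "master-data-parity-error",
    0x0800: "signaled-target-abort",
    0x1000: "received-target-abort",
    0x2000: "received-master-abort",
    0x4000: "signaled-system-error",
    0x8000: "detected-parity-error",
}

# Bits 10:9 encode the DEVSEL timing rather than individual flags.
DEVSEL_TIMING = {0: "fast", 1: "medium", 2: "slow", 3: "reserved"}


def decode_pci_status(value):
    """Return (list of set flag names, DEVSEL timing) for a status word."""
    flags = [name for mask, name in sorted(PCI_STATUS_BITS.items())
             if value & mask]
    devsel = DEVSEL_TIMING[(value >> 9) & 0x3]
    return flags, devsel


flags, devsel = decode_pci_status(0x2A80)
print("flags:", ", ".join(flags))
print("DEVSEL timing:", devsel)
```

Under that assumption the only error bits set are signaled-target-abort and received-master-abort, which agrees with what the driver printed; the remaining bits are capability and timing fields, not errors.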

/var/adm/messages:

Apr 11 16:00:02 goofy pcisch: [ID 462479 kern.warning] WARNING: pcisch2 (pci@9,700000): PCI fault log start:
Apr 11 16:00:02 goofy pcisch: [ID 309153 kern.notice] PCI iommu error
Apr 11 16:00:02 goofy pcisch: [ID 866426 kern.notice] pcisch2: Error 1 on IOMMU TLB entry 4:
Apr 11 16:00:02 goofy Context=0 not Writable not Streamable
Apr 11 16:00:02 goofy PCI Page Size=8k Address in page c2f88000
Apr 11 16:00:02 goofy pcisch: [ID 219581 kern.notice] Memory: Valid not Cacheable Page Frame=0
Apr 11 16:00:02 goofy pcisch: [ID 630226 kern.notice] PCI error ocurred on device #0
Apr 11 16:00:02 goofy pcisch: [ID 684763 kern.notice] pcisch2 (pci@9,700000): PBM AFSR=0x0.00000000
Apr 11 16:00:02 goofy pcisch: [ID 120591 kern.notice] dwordmask=0 bytemask=0
Apr 11 16:00:02 goofy pcisch: [ID 829486 kern.notice] pcisch2 (pci@9,700000): PCI primary error (0):
Apr 11 16:00:02 goofy pcisch: [ID 227296 kern.notice] pcisch2 (pci@9,700000): PCI secondary error (0):
Apr 11 16:00:02 goofy pcisch: [ID 748186 kern.notice] pcisch2 (pci@9,700000): PBM AFAR 0.00000000:
Apr 11 16:00:02 goofy pcisch: [ID 127741 kern.warning] WARNING: pcisch2: PCI config space CSR=0x2a80<signaled-target-abort,received-master-abort>
Apr 11 16:00:02 goofy pcisch: [ID 656289 kern.notice] pcisch2 (pci@9,700000): PCI fault log end.
Apr 11 16:00:02 goofy unix: [ID 836849 kern.notice]
Apr 11 16:00:02 goofy ^Mpanic[cpu2]/thread=2a100003d40:
Apr 11 16:00:02 goofy unix: [ID 578303 kern.notice] pcisch-2: PCI bus 1 error(s)!
Apr 11 16:00:02 goofy unix: [ID 100000 kern.notice]
Apr 11 16:00:02 goofy last message repeated 1 time
Apr 11 16:00:02 goofy genunix: [ID 672855 kern.notice] syncing file systems...
Apr 11 16:00:32 goofy unix: [ID 836849 kern.notice]
Apr 11 16:00:32 goofy ^Mpanic[cpu2]/thread=2a100003d40:
Apr 11 16:00:32 goofy unix: [ID 715357 kern.notice] panic sync timeout
Apr 11 16:00:32 goofy unix: [ID 100000 kern.notice]
Apr 11 16:00:32 goofy genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 644022272, content: kernel
Apr 11 16:01:51 goofy genunix: [ID 409368 kern.notice] ^M100% done: 141855 pages dumped, compression ratio 2.38,
Apr 11 16:01:51 goofy genunix: [ID 851671 kern.notice] dump succeeded
Apr 11 16:02:33 goofy genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_118558-21 64-bit
Apr 11 16:02:33 goofy genunix: [ID 943905 kern.notice] Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Apr 11 16:02:33 goofy Use is subject to license terms.
Apr 11 16:02:33 goofy genunix: [ID 678236 kern.info] Ethernet address = 0:3:ba:a:e6:fb
Apr 11 16:02:33 goofy unix: [ID 389951 kern.info] mem = 12582912K (0x300000000)
Apr 11 16:02:33 goofy unix: [ID 930857 kern.info] avail mem = 12348252160

Then normal startup...
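
For a sense of scale, the dump and memory figures above can be converted to bytes. This is only a sketch: it assumes the 8 KB base page size of sun4u and that "compression ratio" means uncompressed size divided by compressed size. dump_sizes is a hypothetical helper name.

```python
PAGE_SIZE = 8192  # assumption: sun4u base page size in bytes


def dump_sizes(pages, ratio, page_size=PAGE_SIZE):
    """Return (uncompressed bytes, approximate compressed bytes)."""
    raw = pages * page_size
    return raw, raw / ratio


# Figures taken from the "dump succeeded" line.
raw, compressed = dump_sizes(141855, 2.38)
print("uncompressed dump: %.2f GiB" % (raw / 2**30))
print("on-disk estimate:  %.0f MiB" % (compressed / 2**20))

# Cross-check of the "mem = 12582912K (0x300000000)" line.
mem_bytes = 12582912 * 1024
print(hex(mem_bytes), mem_bytes // 2**30, "GiB")
```

The memory line is self-consistent (12582912K is exactly 0x300000000 bytes, i.e. 12 GiB), so the machine came back up with all of its RAM configured.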

Several devices sit on this pci@9,700000 bus: SCSI HBA, network card, RSC card ...

Here is the output of `grep "pci@9,700000" messages`:

Apr 11 16:00:02 goofy pcisch: [ID 462479 kern.warning] WARNING: pcisch2 (pci@9,700000): PCI fault log start:
Apr 11 16:00:02 goofy pcisch: [ID 684763 kern.notice] pcisch2 (pci@9,700000): PBM AFSR=0x0.00000000
Apr 11 16:00:02 goofy pcisch: [ID 829486 kern.notice] pcisch2 (pci@9,700000): PCI primary error (0):
Apr 11 16:00:02 goofy pcisch: [ID 227296 kern.notice] pcisch2 (pci@9,700000): PCI secondary error (0):
Apr 11 16:00:02 goofy pcisch: [ID 748186 kern.notice] pcisch2 (pci@9,700000): PBM AFAR 0.00000000:
Apr 11 16:00:02 goofy pcisch: [ID 656289 kern.notice] pcisch2 (pci@9,700000): PCI fault log end.
Apr 11 16:02:33 goofy genunix: [ID 936769 kern.info] pcisch2 is /pci@9,700000
Apr 11 16:02:41 goofy genunix: [ID 936769 kern.info] ebus0 is /pci@9,700000/ebus@1
Apr 11 16:02:43 goofy genunix: [ID 936769 kern.info] todds12870 is /pci@9,700000/ebus@1/rtc@1,300070
Apr 11 16:02:43 goofy genunix: [ID 936769 kern.info] su0 is /pci@9,700000/ebus@1/rsc-control@1,3062f8
Apr 11 16:02:43 goofy genunix: [ID 936769 kern.info] su1 is /pci@9,700000/ebus@1/rsc-console@1,3083f8
Apr 11 16:02:43 goofy genunix: [ID 936769 kern.info] ohci0 is /pci@9,700000/usb@1,3
Apr 11 16:02:44 goofy genunix: [ID 936769 kern.info] se0 is /pci@9,700000/ebus@1/serial@1,400000
Apr 11 16:02:45 goofy genunix: [ID 936769 kern.info] eri0 is /pci@9,700000/network@1,1
Apr 11 16:02:45 goofy genunix: [ID 936769 kern.info] rf0 is /pci@9,700000/ethernet@2
Apr 11 16:03:26 goofy scsi: [ID 365881 kern.info] /pci@9,700000/scsi@3 (glm3):
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] glm3 is /pci@9,700000/scsi@3
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd45 is /pci@9,700000/scsi@3/sd@0,0
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd81 is /pci@9,700000/scsi@3/sd@0,1
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd82 is /pci@9,700000/scsi@3/sd@0,2
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd83 is /pci@9,700000/scsi@3/sd@0,3
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd84 is /pci@9,700000/scsi@3/sd@0,4
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd85 is /pci@9,700000/scsi@3/sd@0,5
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd86 is /pci@9,700000/scsi@3/sd@0,6
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd87 is /pci@9,700000/scsi@3/sd@0,7
Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] sd46 is /pci@9,700000/scsi@3/sd@1,0
Apr 11 16:06:25 goofy genunix: [ID 936769 kern.info] se0 is /pci@9,700000/ebus@1/serial@1,400000
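
The attach lines in that grep output can be reduced to a table of what hangs off the bus. A minimal Python sketch, assuming the "<instance> is <device path>" format shown above; devices_on_bus is a hypothetical helper and the sample is a few lines copied from the output, not the whole file.

```python
import re


def devices_on_bus(lines, bus="/pci@9,700000"):
    """Map device path -> driver instance for attach lines under `bus`."""
    attach_re = re.compile(
        r"\] (\S+) is (" + re.escape(bus) + r"(?:/\S*)?)$")
    found = {}
    for line in lines:
        m = attach_re.search(line.rstrip())
        if m:
            # Later duplicates (e.g. se0 re-attaching) overwrite earlier ones.
            found[m.group(2)] = m.group(1)
    return found


sample = [
    "Apr 11 16:02:33 goofy genunix: [ID 936769 kern.info] pcisch2 is /pci@9,700000",
    "Apr 11 16:02:45 goofy genunix: [ID 936769 kern.info] eri0 is /pci@9,700000/network@1,1",
    "Apr 11 16:02:45 goofy genunix: [ID 936769 kern.info] rf0 is /pci@9,700000/ethernet@2",
    "Apr 11 16:03:26 goofy scsi: [ID 365881 kern.info] /pci@9,700000/scsi@3 (glm3):",
    "Apr 11 16:03:26 goofy genunix: [ID 936769 kern.info] glm3 is /pci@9,700000/scsi@3",
    "Apr 11 16:06:25 goofy genunix: [ID 936769 kern.info] se0 is /pci@9,700000/ebus@1/serial@1,400000",
]

for path, instance in sorted(devices_on_bus(sample).items()):
    print("%-10s %s" % (instance, path))
```

Lines that mention the bus without an "is" mapping (such as the scsi: line) and the fault-log lines are skipped, so only attached devices are listed.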

From time to time the machine has had memory ECC errors (very rarely persistent) in a single memory slot.

The machine is not under a maintenance contract (it is 3+ years old).
Any suggestions?

TIA,

GB
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


