qlc loop offline and transport rejected problem

From: lh79@mail.ru
Date: Sun Jun 20 2004 - 03:50:07 EDT


Hello sunmanagers.

Let me explain our configuration first.
We have a two-node SunCluster 3.0 cluster on SF3800 servers.
Two T300 WG arrays are connected to these nodes via two hubs.
Volumes on these T300 WGs are mirrored using VERITAS Volume Manager 3.2.
Not long ago, a new T3B WG was connected to the existing SAN,
and the Recommended patch clusters were installed for SunCluster, SAN, VERITAS, and Solaris.
Now we have a very strange problem.
Everything works fine until the backup node is rebooted for any reason.
While the backup node is rebooting, the following messages appear on the
console of the primary node:

Jun 19 10:17:55 sf3800-1 qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Jun 19 10:18:12 sf3800-1 qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Jun 19 10:18:12 sf3800-1 fctl: [ID 999315 kern.warning] WARNING: fctl(0): AL_PA=0xe8 doesn't exist in LILP map
Jun 19 10:18:32 sf3800-1 scsi: [ID 243001 kern.info] /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0 (fcp0):
Jun 19 10:18:32 sf3800-1 offlining lun=1 (trace=0), target=e8 (trace=2800004)
Jun 19 10:18:32 sf3800-1 scsi: [ID 243001 kern.info] /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0 (fcp0):
Jun 19 10:18:32 sf3800-1 offlining lun=0 (trace=0), target=e8 (trace=2800004)
...skipping...
Jun 19 12:09:56 sf3800-1 qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Jun 19 12:09:56 sf3800-1 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to e4 failed state=Packet Transport error, reason=No Connection
Jun 19 12:10:10 sf3800-1 qfe: [ID 517869 kern.info] SUNW,qfe0: 100 Mbps full duplex link up - internal transceiver
Jun 19 12:10:10 sf3800-1 qfe: [ID 517869 kern.info] SUNW,qfe4: 100 Mbps full duplex link up - internal transceiver
Jun 19 12:10:15 sf3800-1 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w50020f23000094b4,1 (ssd2):
Jun 19 12:10:15 sf3800-1 transport rejected (-2)
Jun 19 12:10:15 sf3800-1 scsi: [ID 243001 kern.info] /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0 (fcp0):
Jun 19 12:10:15 sf3800-1 offlining lun=1 (trace=0), target=e4 (trace=2800004)
Jun 19 12:10:15 sf3800-1 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w50020f23000094b4,0 (ssd3):
Jun 19 12:10:15 sf3800-1 transport rejected (-2)
Jun 19 12:10:15 sf3800-1 scsi: [ID 243001 kern.info] /ssm@0,0/pci@1d,600000/pci@1/SUNW,qlc@4/fp@0,0 (fcp0):
Jun 19 12:10:15 sf3800-1 offlining lun=0 (trace=0), target=e4 (trace=2800004)
Jun 19 12:10:15 sf3800-1 vxdmp: [ID 997040 kern.notice] NOTICE: vxvm:vxdmp: disabled path 118/0x10 belonging to the dmpnode 240/0x20
Jun 19 12:10:15 sf3800-1 vxdmp: [ID 148046 kern.notice] NOTICE: vxvm:vxdmp: disabled dmpnode 240/0x20
Jun 19 12:10:15 sf3800-1 vxdmp: [ID 997040 kern.notice] NOTICE: vxvm:vxdmp: disabled path 118/0x18 belonging to the dmpnode 240/0x28
Jun 19 12:10:15 sf3800-1 vxdmp: [ID 148046 kern.notice] NOTICE: vxvm:vxdmp: disabled dmpnode 240/0x28
Jun 19 12:10:15 sf3800-1 vxio: [ID 663439 kern.warning] WARNING: vxvm:vxio: Subdisk c1t2d0s2-01 block 13593360: Uncorrectable write error
Jun 19 12:10:15 sf3800-1 vxio: [ID 663439 kern.warning] WARNING: vxvm:vxio: Subdisk c1t2d0s2-01 block 6939152: Uncorrectable read error
Jun 19 12:10:15 sf3800-1 vxio: [ID 663439 kern.warning] WARNING: vxvm:vxio: Subdisk c1t2d0s2-01 block 13593344: Uncorrectable write error
Jun 19 12:10:15 sf3800-1 vxio: [ID 663439 kern.warning] WARNING: vxvm:vxio: Subdisk c1t2d0s2-01 block 1047392: Uncorrectable write error
Jun 19 12:10:15 sf3800-1 vxio: [ID 663439 kern.warning] WARNING: vxvm:vxio: Subdisk c1t2d0s2-01 block 1268385: Uncorrectable write error
Jun 19 12:10:15 sf3800-1 ufs_log: [ID 702911 kern.warning] WARNING: Error reading master
Jun 19 12:10:15 sf3800-1 ufs_log: [ID 127457 kern.warning] WARNING: ufs log for /arch/u1 changed state to Error
Jun 19 12:10:15 sf3800-1 ufs_log: [ID 616219 kern.warning] WARNING: Please umount(1M) /arch/u1 and run fsck(1M)

As a result, some VERITAS mirrors become broken and some volumes may
become completely unavailable.
That makes the Oracle resource group crash.

If anybody has encountered this problem or can suggest something,
your advice is welcome. I can explain the situation in more detail if needed.
It would be nice to get a solution by the end of the day.
Thank you.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:28:54 EDT