Q: T3 disk array -> Qlogic (single) controller, connectivity / loop down/up events.

From: Tim Chipman (chipman@ecopiabio.com)
Date: Sun Sep 19 2004 - 01:10:53 EDT


Hi all,

An odd problem cropped up yesterday and hit again today. I was able to
find one seemingly related hit in the list archives and a few
miscellaneous obsoleted patches on SunSolve. I'm hoping someone might
have seen this kind of thing before and can comment to help me out. As
always, any help (even if peripheral in nature, ha ha) is appreciated.

[sorry about that joke; it is 1am, the 2nd night in a row of this, and
it isn't my favourite kind of fun evening]

Context: E450 running Solaris 8, patched ~120 days ago (time of last
reboot) with the standard recommended patch cluster. The machine runs
Oracle 8i. It has a T3 disk array attached (a single FCAL connection
from a Qlogic HBA to the T3; no partner pair or SAN config, just a
"direct attached" RAID5 array). We've got other disk attached as well
that isn't relevant IMHO (an internal software RAID array, and an
external A1000 array on a different SCSI HBA).

Last night, it seems the FCAL loop went down and up numerous times. This
resulted in the filesystem throwing I/O errors, and Oracle became
less than thrilled. Various things were logged (locally to
/var/adm/messages, and by the T3 to a syslog server). Sample output
from these logs is shown below.

The box appeared to be stable, so nothing was done to the hardware [the
filesystem was unmounted, fscked and re-mounted, thrashed gently, and
ran smoothly]. Our DBA rebuilt the Oracle control files and got the
database functional, and then tonight the same basic issue surfaced
again. This T3 has been in production for ~2 years, so why we suddenly
have this joy isn't fully clear. Connecting to the T3 admin interface
suggests all is well inside the unit itself. I suspect the Qlogic HBA
may be fishy, or maybe there is some ~2-year maintenance programme for
T3 arrays that I'm not aware of (this HW is not under Sun support).
Tonight, so far, we've cold-rebooted all the gear; everything is back up
and apparently happy for now, and we're doing cold backups of the data
while we have the easy opportunity [i.e., Oracle offline]. Maybe it will
run smoothly after this, who can say.

Any thoughts are certainly appreciated.

Thanks,

Tim Chipman

===log notes====

syslog message traces from the T3:

Sep 18 21:10:13 [192.168.1.188.2.2] ISR1[1]: W: u1ctr ISP2200[2] Received LOOP DOWN async event
Sep 18 21:10:13 [192.168.1.188.2.2] last message repeated 1 time
Sep 18 21:10:14 [192.168.1.188.2.2] ISR1[1]: N: u1ctr ISP2200[2] Received LIP(f7,f7) async event
Sep 18 21:10:14 [192.168.1.188.2.2] last message repeated 1 time
Sep 18 21:10:14 [192.168.1.188.2.2] ISR1[1]: N: u1ctr ISP2200[2] Received LOOP UP async event
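Because syslogd folds duplicates into "last message repeated N time(s)", a plain grep under-counts how often the T3 saw the loop bounce: the excerpt above actually represents two LOOP DOWN and two LIP events. A minimal Python sketch for tallying them, assuming the T3 lines look like the samples above; the function and regex names are hypothetical, and a repeat line is simply credited to the event on the line just before it.

```python
import re
from collections import Counter

# Event name from a T3 line such as
#   "... ISR1[1]: W: u1ctr ISP2200[2] Received LOOP DOWN async event"
EVENT_RE = re.compile(r"Received (.+?) async event")
# syslogd folds duplicates into "last message repeated N time(s)"
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_loop_events(lines):
    """Count T3 async loop events, expanding syslog repeat lines.

    A repeat line is credited to the event on the line immediately
    before it; any other line breaks that association.
    """
    counts = Counter()
    last_event = None
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            last_event = event.group(1)
            counts[last_event] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_event is not None:
            counts[last_event] += int(repeat.group(1))
        else:
            last_event = None
    return counts
```

Usage would be along the lines of `count_loop_events(open("t3.log"))`, where `t3.log` stands in for wherever the syslog server writes the T3's messages. If other hosts log to the same file, their lines can separate an event from its repeat line, so filtering on the T3's address first is safer.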

/var/adm/messages traces:

First hit (yesterday):
Sep 17 19:58:13 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 17 19:59:43 sapio fp: [ID 517869 kern.warning] WARNING: fp(0): OFFLINE timeout
Sep 17 19:59:43 sapio scsi: [ID 243001 kern.info] /pci@4,4000/SUNW,qlc@4/fp@0,0 (fcp0):
Sep 17 19:59:43 sapio offlining lun=0 target=e8
Sep 17 19:59:43 sapio scsi: [ID 107833 kern.warning] WARNING: /pci@4,4000/SUNW,qlc@4/fp@0,0/ssd@w50020f230000690b,0 (ssd0):
Sep 17 19:59:43 sapio transport rejected (-2)
Sep 17 19:59:43 sapio ufs_log: [ID 702911 kern.warning] WARNING: Error reading master
Sep 17 19:59:43 sapio ufs_log: [ID 127457 kern.warning] WARNING: ufs log for /data/datafiles changed state to Error
Sep 17 19:59:43 sapio ufs_log: [ID 616219 kern.warning] WARNING: Please umount(1M) /data/datafiles and run fsck(1M)
Sep 17 20:00:19 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 17 22:10:56 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 17 22:10:56 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 17 22:11:25 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 17 22:11:25 sapio qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE

Second hit (today):
Sep 18 21:06:19 sapio scsi: [ID 243001 kern.warning] WARNING: /pci@4,4000/SUNW,qlc@4/fp@0,0/ssd@w50020f230000690b,0 (ssd0):
Sep 18 21:06:19 sapio SCSI transport failed: reason 'timeout': retrying command
Sep 18 21:10:20 sapio scsi: [ID 243001 kern.warning] WARNING: /pci@4,4000/SUNW,qlc@4/fp@0,0/ssd@w50020f230000690b,0 (ssd0):
Sep 18 21:10:20 sapio SCSI transport failed: reason 'timeout': retrying command

Relevant drivers/modules/packages for the Qlogic HW:

sapio# modinfo | grep -i qlc
 45 102ca9ea 26bc0 156 1 qlc (Qlogic FCA Driver v0.40.5)
 
 

sapio# pkginfo | grep -i qlog
system SUNWqlc Qlogic ISP 2200/2202 Fibre Channel Device Driver
system SUNWqlcx Qlogic ISP 2200/2202 Fibre Channel Device Driver (64 bit)

sapio# pkginfo -l SUNWqlc
   PKGINST: SUNWqlc
      NAME: Qlogic ISP 2200/2202 Fibre Channel Device Driver
  CATEGORY: system
      ARCH: sparc
   VERSION: 11.8.0,REV=2000.04.01.16.21
   BASEDIR: /
    VENDOR: Sun Microsystems, Inc.
      DESC: Qlogic ISP 2200/2202 Fibre Channel Device Driver
    PSTAMP: on28-patch20010313101849
  INSTDATE: May 01 2002 17:21
   HOTLINE: Please contact your local service provider
    STATUS: completely installed
     FILES: 4 installed pathnames
                   2 shared pathnames
                   2 directories
                   1 executables
                 371 blocks used (approx)

sapio# pkginfo -l SUNWqlcx
   PKGINST: SUNWqlcx
      NAME: Qlogic ISP 2200/2202 Fibre Channel Device Driver (64 bit)
  CATEGORY: system
      ARCH: sparc
   VERSION: 11.8.0,REV=2000.04.01.16.21
   BASEDIR: /
    VENDOR: Sun Microsystems, Inc.
      DESC: Qlogic ISP 2200/2202 Fibre Channel Device Driver (64 bit)
    PSTAMP: on28-patch20010313101858
  INSTDATE: May 01 2002 17:21
   HOTLINE: Please contact your local service provider
    STATUS: completely installed
     FILES: 4 installed pathnames
                   3 shared pathnames
                   3 directories
                   1 executables
                 421 blocks used (approx)
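When comparing these driver packages against another host or a patch README, it can help to pull the `pkginfo -l` fields into a dictionary rather than eyeballing them. A minimal Python sketch, assuming the `FIELD: value` layout shown above, where the extra FILES counters appear as unlabelled continuation lines; `parse_pkginfo` and `split_version` are hypothetical helper names.

```python
def parse_pkginfo(text):
    """Parse `pkginfo -l` output into a {FIELD: value} dict.

    Lines without an upper-case 'FIELD:' prefix (e.g. "2 shared pathnames"
    under FILES) are appended to the previous field, separated by "; ".
    """
    info = {}
    field = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        head, sep, tail = stripped.partition(":")
        if sep and head.isalpha() and head.isupper():
            field = head
            info[field] = tail.strip()
        elif field is not None:
            info[field] += "; " + stripped
    return info


def split_version(version):
    """Split e.g. '11.8.0,REV=2000.04.01.16.21' into (base, rev)."""
    base, _, rev = version.partition(",REV=")
    return base, rev
```

For the two packages above, `split_version` gives the same base version and REV string for both SUNWqlc and SUNWqlcx, which is a quick way to confirm the 32-bit and 64-bit drivers came from the same build.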
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:29:28 EDT