Bad SDLT 320?

From: Chris Cameron (Chris.Cameron@NetThruPut.com)
Date: Thu Aug 26 2004 - 12:48:53 EDT


Before I go to all the trouble of trying to get Sun to replace my tape
drive, I wanted to tap some of the experience that's on this list to
see if what I'm seeing would point to a bad tape drive.

I have a V240 hooked up to a Sun SDLT 320 drive. Up until this week it was
backing up ~60 GB of data using AMANDA (it has been doing so for 7
months). That 60 GB is spread across 3 servers: one local and two
remote.

When AMANDA runs, it consistently fails on one remote 25 GB partition
(and only that partition). The errors AMANDA reports (which, as I
understand it, are just dump errors passed along) are:

  devl2 /dev/md/dsk/d8 lev 0 FAILED [out of tape]
  devl2 /dev/md/dsk/d8 lev 0 FAILED ["data write: Broken pipe"]
  devl2 /dev/md/dsk/d8 lev 0 FAILED [dump to tape failed]

After this the tape drive is unresponsive to any commands, and the
following errors show up in /var/adm/messages:

Aug 26 09:10:54 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
Aug 26 09:10:54 prod2 SCSI transport failed: reason 'incomplete': retrying command
Aug 26 09:10:56 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
Aug 26 09:10:56 prod2 SCSI transport failed: reason 'incomplete': retrying command
Aug 26 09:10:57 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
Aug 26 09:10:57 prod2 SCSI transport failed: reason 'incomplete': giving up
Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
Aug 26 09:26:39 prod2 Cmd (0x1b37578) dump for Target 5 Lun 0:
Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
Aug 26 09:26:39 prod2 cdb=[ 0xa 0x0 0x0 0x80 0x0 0x0 ]
Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
Aug 26 09:26:39 prod2 pkt_flags=0x0 pkt_statistics=0x61 pkt_state=0x7
Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
Aug 26 09:26:39 prod2 pkt_scbp=0x0 cmd_flags=0x18e1
Aug 26 09:26:39 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1 (glm1):
Aug 26 09:26:39 prod2 Disconnected command timeout for Target 5.0
Aug 26 09:26:39 prod2 genunix: [ID 408822 kern.info] NOTICE: glm1: fault detected in device; service still available
Aug 26 09:26:39 prod2 genunix: [ID 611667 kern.info] NOTICE: glm1: Disconnected command timeout for Target 5.0
Aug 26 09:26:39 prod2 glm: [ID 160360 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6016]
Aug 26 09:26:39 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
Aug 26 09:26:39 prod2 SCSI transport failed: reason 'timeout': giving up
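
The cdb=[ ... ] line above is the command that was outstanding when glm1
timed out on Target 5. Assuming the standard SCSI Stream Commands (SSC)
layout for WRITE(6) (opcode 0x0A, FIXED bit in byte 1 bit 0, 24-bit
big-endian transfer length in bytes 2 to 4), it can be decoded with a
short Python sketch. The helper names parse_cdb and decode_write6 are
hypothetical, not part of any Solaris tool.

```python
def parse_cdb(text: str) -> list[int]:
    """Turn a logged 'cdb=[ 0xa 0x0 ... ]' string into a list of byte values."""
    inner = text.split("[", 1)[1].rsplit("]", 1)[0]
    return [int(tok, 16) for tok in inner.split()]  # int() accepts the 0x prefix


def decode_write6(cdb: list[int]) -> dict:
    """Decode a 6-byte WRITE(6) CDB for a sequential-access (tape) device."""
    if len(cdb) != 6:
        raise ValueError("expected a 6-byte CDB")
    if cdb[0] != 0x0A:
        raise ValueError(f"opcode 0x{cdb[0]:02x} is not WRITE(6)")
    fixed = bool(cdb[1] & 0x01)  # FIXED bit: set means length counts blocks
    length = (cdb[2] << 16) | (cdb[3] << 8) | cdb[4]  # 24-bit big-endian
    return {
        "opcode": "WRITE(6)",
        "fixed_block_mode": fixed,
        "transfer_length": length,
        "unit": "blocks" if fixed else "bytes",
        "control": cdb[5],
    }


line = "cdb=[ 0xa 0x0 0x0 0x80 0x0 0x0 ]"  # copied from the log above
info = decode_write6(parse_cdb(line))
print(info)
```

With FIXED clear, the length is in bytes, so the command that timed out
was an ordinary variable-length data write of 32768 bytes, consistent
with the 32 KB records ufsdump reports, rather than a positioning or
status command.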

Power-cycling the tape drive makes it respond to mt commands again.

If I try to do a dump manually from the local machine, I'll consistently
get:

</> # ufsdump -0f /dev/rmt/1n /dev/md/dsk/d6
  DUMP: Writing 32 Kilobyte records
  DUMP: Date of this level 0 dump: Thu Aug 26 09:05:39 2004
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/md/rdsk/d6 (prod2:/prod) to /dev/rmt/1n.
  DUMP: Mapping (Pass I) [regular files]
  DUMP: Mapping (Pass II) [directories]
  DUMP: Estimated 35790376 blocks (17475.77MB).
  DUMP: Dumping (Pass III) [directories]
  DUMP: Dumping (Pass IV) [regular files]
  DUMP: Write error 106032 feet into tape 1
  DUMP: NEEDS ATTENTION: Do you want to restart?: ("yes" or "no")
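
As a sanity check on whether "out of tape" could be a real end-of-media
condition, the ufsdump estimate can be compared with the drive's 160 GB
native capacity. The 512-byte block size is inferred from the output
itself (35790376 / 2048 = 17475.77, i.e. ufsdump's "MB" here is MiB);
the variable names are illustrative only.

```python
BLOCK_SIZE = 512                    # inferred: 35790376 blocks / 2048 = 17475.77 MB
SDLT320_NATIVE_BYTES = 160 * 10**9  # SDLT 320 native (uncompressed) capacity

estimated_blocks = 35790376         # "Estimated 35790376 blocks (17475.77MB)"
dump_bytes = estimated_blocks * BLOCK_SIZE
dump_mib = dump_bytes / 2**20

print(f"{dump_bytes} bytes = {dump_mib:.2f} MB")
print(f"share of native capacity: {dump_bytes / SDLT320_NATIVE_BYTES:.1%}")
```

The dump is about 11.5% of one cartridge's native capacity, and even the
whole ~60 GB AMANDA run fits on a single tape, so the "out of tape"
message is more likely a failed write being reported as end of tape than
a genuine capacity problem.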

This happens with several different tapes, so I doubt the media is at fault.

Is this a clear case of a bad tape drive? Tonight I'll try the drive on a
second V240 with a different SCSI cable, just to be sure. I have also run
a cleaning tape through it for good measure.

Thanks,
Chris
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

