UPDATE: Tape drive problem

From: Jim Fitzmaurice (jpfitz@fnal.gov)
Date: Thu Sep 26 2002 - 13:05:25 EDT


As per Danny Petterson's suggestion, I tried this and got:

> file /dev/tape/tape3_d1
/dev/tape/tape3_d1: character special (19/2008)

Unsure of what I should see, I ran it on my other tape drives and got the
following:
> file /dev/tape/tape2_d1
/dev/tape/tape2_d1: character special (19/1828) SCSI #1 "DLT4000"
tape#91 (SCSI ID #3) (SCSI LUN #3) errors = 0/1 offline
> file /dev/tape/tape1_d1
/dev/tape/tape1_d1: character special (19/1792) SCSI #3 "DLT4000"
tape#93 (SCSI ID #3) (SCSI LUN #5) errors = 0/1 offline
> file /dev/tape/tape0_d1
/dev/tape/tape0_d1: character special (19/1612)

FYI....
tape2 is on Member1
tape1 is on Member2
tape0 is on Member3

Note that tape0 on Member3, like tape3 on Member2, is not accessible to any
commands.
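
The pattern in the output above (some drives report SCSI details after the
major/minor pair, others report only the pair) could be checked across
drives with a small script. This is a rough sketch, assuming a POSIX shell
and the file output format shown above. The function name
classify_tape_line and the two labels are made up for illustration, and it
is only an assumption that missing SCSI details go together with an
inaccessible drive.

```shell
# classify_tape_line (hypothetical helper): takes one line of `file` output
# for a tape special file and prints "<device> <label>".
#   has-scsi-info : text follows the (major/minor) pair
#   no-scsi-info  : the line stops right after the (major/minor) pair
#   unknown       : the line does not look like a character special file
classify_tape_line() {
    line=$1
    dev=${line%%:*}          # everything before the first colon
    case $line in
        *"character special ("*") "?*) echo "$dev has-scsi-info" ;;
        *"character special ("*")")   echo "$dev no-scsi-info" ;;
        *)                             echo "$dev unknown" ;;
    esac
}

# Example use (assumption: each drive's output is on a single line):
#   for d in /dev/tape/tape0_d1 /dev/tape/tape3_d1; do
#       classify_tape_line "`file $d`"
#   done
```

Fed the lines from this message, tape3_d1 and tape0_d1 would be labelled
no-scsi-info and tape2_d1 and tape1_d1 would be labelled has-scsi-info.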

So now what do I do with this information?

Jim Fitzmaurice
jpfitz@fnal.gov

UNIX is very user friendly, It's just very particular about who it makes
friends with.

 ----- Original Message -----
> From: Danny Petterson
> To: Jim Fitzmaurice
> Sent: Thursday, September 26, 2002 10:54 AM
> Subject: SV: Tape drive problem.
>
>
> Hi!
>
> Maybe there is something wrong with the special device file. If you run a
> #file /dev/tape/tape3_d1
> what's the output? It should say something about your tape device.
>
> Greetings
> Danny Petterson
> -----Original Message-----
> From: Jim Fitzmaurice [mailto:jpfitz@fnal.gov]
> Sent: Thu 26-09-2002 16:52
> To: Tru64 -unix -managers
> Cc:
> Subject: Tape drive problem.
>
>
> Managers,
>
> I have a 3 system cluster, consisting of one GS80 (Member1) and two
> 4100's (Member2, and Member3) running Tru64 v5.1 and TruCluster v5.1
> PatchKit 5. I have one DLT4000 on each machine in the cluster and Member2
> (a 4100) has an additional DLT8000 in a tape library which I use for
> backups.
> This morning I came in and my backups had completely failed, why:
>
> /dev/tape/tape3_d1: No such device or address
>
> The only unusual activity on the cluster occurred yesterday morning.
> HP/Compaq support had determined our Memory Channel Adapter was bad on
> Member3(a 4100). Backups were still running on Member2 when the Field
> Engineer arrived to replace the board. I took down Member3 and we replaced
> the board, backups continued to run normally on Member2, the cluster
> remained up. After restoring Member3, I noticed a GB Ethernet Adapter was
> no longer working on that machine. The FE ordered a replacement and about
> 90 minutes later it arrived and I brought Member3 down again, and we
> replaced the GB Ethernet Adapter. Again the cluster continued to function
> and backups continued to run normally. Member3 came back 100% this time,
> and shortly after that backups ran to their normal conclusion on Member2.
>
> This morning however, backups failed and /dev/tape/tape3 is acting
> weird.
>
> I can run "scu> show edt" and it sees the device:
>
> 4 1 0 Sequential SCSI-2 QUANTUM DLT8000 0250 W
>
> I can run "hwmgr -view devices" and it shows up there too:
>
> 493: /dev/ntape/tape3 QUANTUM DLT8000 bus-4-targ-1-lun-0
>
> And the device files exist as well:
>
> crw-rw-rw- 1 root system 19,2002 Jul 30 17:00 /dev/tape/tape3
> crw-rw-rw- 1 root system 19,2006 Jul 30 17:00 /dev/tape/tape3_d0
> crw-rw-rw- 1 root system 19,2008 Sep 26 09:28 /dev/tape/tape3_d1
> crw-rw-rw- 1 root system 19,2010 Jul 30 17:01 /dev/tape/tape3_d2
> crw-rw-rw- 1 root system 19,2012 Jul 30 17:01 /dev/tape/tape3_d3
> crw-rw-rw- 1 root system 19,2014 Jul 30 17:01 /dev/tape/tape3_d4
> crw-rw-rw- 1 root system 19,2016 Jul 30 17:01 /dev/tape/tape3_d5
> crw-rw-rw- 1 root system 19,2018 Jul 30 17:01 /dev/tape/tape3_d6
> crw-rw-rw- 1 root system 19,2020 Jul 30 17:01 /dev/tape/tape3_d7
> crw-rw-rw- 1 root system 19,2004 Jul 30 17:00 /dev/tape/tape3c
>
> However, when I try to run any other command to actually access the
> drive, it's not there:
>
> > mt -f /dev/tape/tape3 status
> /dev/tape/tape3: No such device or address
> > tar -cvf /dev/tape/tape3_d1 /etc/brutab
> tar: cannot open /dev/tape/tape3_d1: No such device or address
> > dd if=/etc/brutab of=/dev/tape/tape3_d1
> /dev/tape/tape3_d1: No such device or address
>
> All cables have been reseated, and the bus is properly terminated, I
> tried rebooting the library and the drive. Rebooting Member2 or the entire
> Cluster will not be feasible for at least a week. Nothing unusual in the
> messages log, and the last error in the binary error log was yesterday
> afternoon, and it was a disk error, wrong Target, different LUN, not this
> device.
>
> I realize that something done to one Member of a cluster can affect the
> other members, but I wouldn't think a couple of reboots to replace bad
> cards could cause a device to behave like that. What could have happened?
> But more importantly, does anybody know how I can fix it?
>
> Any help would be greatly appreciated.
>
> James Fitzmaurice
> D0 Online Systems Manager
> Fermi National Accelerator Laboratory
> (630) 840-4011
> jpfitz@fnal.gov
>
> UNIX is very user friendly, It's just very particular about who it makes
> friends with.
>
>
>
>



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:48:54 EDT