From: Andrew Raine (Andrew.Raine@mrc-dunn.cam.ac.uk)
Date: Mon Nov 11 2002 - 08:33:00 EST
Dear Managers,
Thanks to all the help I received here, I fixed the backup/tape/changer
troubles that I was having. However .......
I have a 2-node cluster, running 5.1PK3, with a Neo Overland SDLT
drive/changer attached to one node (ES40, 4 procs).
Everything was fine, and my backups were running well enough for me to
feel confident enough to replace my HSZ80 with an HSG80. Things carried on
working until half-way through Sunday morning's backup, when the device file
for my SDLT drive vanished between vdump savesets!
I was using /dev/ntape/tape2_d1, but now:
# mt -f /dev/ntape/tape2_d1 status
/dev/ntape/tape2_d1: No such device or address
Also, if I do:
# hwmgr -v d
HWID: Device Name Mfg Model Location
------------------------------------------------------------------------------
<lines deleted>
74: /dev/changer/mc1 TL820 bus-2-targ-6-lun-0
75: /dev/disk/dsk30c QUANTUM SuperDLT1 bus-2-targ-1-lun-0
^^^^^^^^^^^
i.e. the system thinks that the SDLT drive is now a *disk*!
However, the robot utility and xrobot can both see the drive and
successfully load and unload tapes.
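To check for this kind of mismatch without reading the listing by eye, something like the following minimal Python sketch could be run over saved hwmgr output. It assumes the listing keeps the column layout shown above (HWID, device name, manufacturer/model, location) and that a model string containing "DLT" means a tape drive. Both are my assumptions for illustration, and the function name and hint list are hypothetical.

```python
import re

# Assumption: any model string containing one of these hints is a tape drive.
TAPE_MODEL_HINTS = ("DLT",)

# Assumption: each device line looks like
#   "<hwid>: <device name> <mfg/model text> bus-N-targ-N-lun-N"
LINE_RE = re.compile(
    r"^\s*(\d+):\s+(/dev/\S+)\s+(.*?)\s+(bus-\d+-targ-\d+-lun-\d+)\s*$"
)


def find_misclassified_tapes(hwmgr_output):
    """Return (hwid, device, description, location) tuples for entries whose
    model looks like a tape drive but whose device name is under /dev/disk."""
    suspects = []
    for line in hwmgr_output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, separator, or any other non-device line
        hwid, device, description, location = match.groups()
        looks_like_tape = any(
            hint in description.upper() for hint in TAPE_MODEL_HINTS
        )
        if device.startswith("/dev/disk/") and looks_like_tape:
            suspects.append((int(hwid), device, description, location))
    return suspects
```

Fed the two device lines above, this would flag only HWID 75. It only reports the mismatch; it does not attempt to repair any device files.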
Can anyone explain what might have happened? Is there a way of
re-building the appropriate device files without bringing the system
down? (Actually, I don't know how to do this even if I do bring the
system down - could someone point me in the right direction?)
Further diagnostics are appended.....
Regards,
Andrew
-- 
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit,
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830 fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk

Further information:
The system logged:
Sequence number of error: 148505507
Time of error entry: 10-Nov-2002 01:04:37
Host name: beta
SCSI CAM ERROR PACKET
Controller type: DISK
SCSI device class: DEC SIM
Bus Number: 2
Target number: 1
Lun Number: 3
Name of routine that logged the event: ss_perform_timeout
Event information: timeout on disconnected request
############### Entry End ###############
Event information: Active CCB at time of error
############### Entry End ###############

Bus 2 only has the SDLT Drive and the changer on it. Nothing on bus 2
has (Target,Lun) = (1,3) though. Not even the two DLT drives and their
TL891 changer that I removed at the time of the HSG upgrade.

There also seemed to be some funny NFS stuff going on at the same time.
From /var/adm/syslog.dated/09-Nov-14:46/kern.log:
Nov 10 00:40:24 beta vmunix: NFS3 RFS3_WRITE failed for server ftp-bioinf.mrc-dunn.cam.ac.uk: RPC: Timed out
Nov 10 00:40:24 beta vmunix: NFS3 RFS3_WRITE failed for server ftp-bioinf.mrc-dunn.cam.ac.uk: RPC: Timed out
Nov 10 00:40:24 beta vmunix: NFS3 write error 60 on host ftp-bioinf.mrc-dunn.cam.ac.uk
Nov 10 00:40:24 beta vmunix: NFS3 write error 60 on host ftp-bioinf.mrc-dunn.cam.ac.uk
<repeated about 2,500 times>