Problems with FC-attached tapes again!

From: Andrew Raine (andrew.raine@mrc-dunn.cam.ac.uk)
Date: Fri Nov 18 2005 - 11:58:37 EST


Dear All,

I'm battling with my FC-attached tape robot again, and hoping someone
can suggest a course of action!

System: 2-node Tru64 cluster, running 5.1pk3
FC Switch: Compaq/HP re-badged Brocade 2Gb/s 16-port switch
Tape Robot: MSL5000, with embedded e1200 fibre-channel router

Symptoms: Every so often (usually associated with a tape error/full
tape) the whole tape subsystem appears to hang from the point of view of
the Tru64 system. Afterwards, some of the devices seem to have
disappeared. The only thing that reliably clears the problem is a
complete cold shutdown and reboot of the whole cluster. Rebooting
individual nodes doesn't do the trick.

Currently, it is the library itself which has vanished, while the two
tape drives are still visible. In the past, one or other of the tape
drives has disappeared.

The errors that I get are things like:

beta # /usr/bin/robot show robot
ROBOT /dev/changer/mc2 is not responding: No such device.

and when the tape drives have gone AWOL, using "mt status" gives a
similar response.

I've tried doing a "scu scan edt" and "hwmgr scan scsi" but these don't
seem to get the system to re-discover the devices. Sometimes they come
back by themselves (the changer has just done so while I've been typing
this!) but sometimes they don't.

Can anyone suggest anything to diagnose the problem further or, better
still, to rectify it once it has happened?

Many thanks!

Andrew

--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit,
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk

