SUMMARY: MSL5000 not responding

From: Andrew Raine (Andrew.Raine@mrc-dunn.cam.ac.uk)
Date: Wed May 12 2004 - 08:28:48 EDT


Dear Managers,

Many thanks to Russ Schaefer, Udo de Boer, Sreenivasa Prasad, Phil Baldwin,
Colin Bull and Karl Vogel for their rapid responses. The delay in my reply is
because we also had a cooling-system failure, and the servers had to go down
while we dealt with it.

The answer is that, in a cluster, devices visible to more than one node need
to be handled carefully to avoid data loss or corruption.

Quoting Udo de Boer:

"I know out of experience that the mc device while visible on two nodes
is not always usable from two nodes in a cluster. So try using it on a
different cluster node. There is a command to fail over the device
server. I think it is drdmgr."

and Colin Bull said:

"On a cluster, the first server to connect to the tape library takes control of
it.

This might help

drdmgr mc0 # to gain control of ROBOT"

and they were right. drdmgr showed which node had grabbed the device and,
although I couldn't get it to pass control over to the node I wanted,
rebooting the nodes in the right order (so that the node I wanted connected
to the library first) sorted that out.

By the way, some respondents were puzzled by the robot command, expecting me
to be using mcutil and the mcicap database, which is the up-to-date way of
doing things. The Media Robot Utility and the robot command are, I think,
old: they came on a floppy with the first tape robot I bought 4 years ago.
I believe that the mcicap/mcutil combination is the "proper" way to do things
now...

[Original question appended below]

Regards,

Andrew

--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit, 
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk

> Dear Managers,
> 
> I have a problem with a tape robot on my system, and I'd appreciate
> some help!
> 
> I have a Tru64 5.1 pk3 cluster (ES40 + DS20) and have recently added an
> MSL5000 tape library with two SDLT320 drives.  This attaches via FC,
> and I believe that the zoning has been set up correctly:
> 
> The system can see all the devices:
> 
>     beta # hwmgr view d
>      HWID: Device Name          Mfg      Model            Location
>      ------------------------------------------------------------------------------
>     
>       <stuff deleted>
>     
>       226: /dev/ntape/tape4     COMPAQ   SDLT320          bus-1-targ-0-lun-2
>       227: /dev/changer/mc2              MSL5000 Series   bus-1-targ-0-lun-0
>       228: /dev/ntape/tape5     COMPAQ   SDLT320          bus-1-targ-0-lun-1
> 
> and I seem to be able to talk to the tape drives OK:
> 
>     beta # mt -f /dev/tape/tape4 status
>     
>     DEVIOGET ELEMENT        CONTENTS
>     ----------------        --------
>     category                DEV_TAPE
>     bus                     DEV_SCSI
>     interface               SCSI
>     device                  SDLT320
>     
>     <etc.>
> 
> 
> [and I see the same for tape5]
> 
> However, I can't get the Media Robot Utility (MRU) to talk to it:
> 
>     beta # setenv MRU_ROBOT /dev/changer/mc2
>     beta # robot show robot
>     ROBOT /dev/changer/mc2 is not responding: Operating system specific error.
> 
> Until last weekend, I had an almost identical Overland Neo 2000
> system, connected by SCSI, which the MRU could see and handle fine. 
> However, the Neo's SCSI bus seems to have gone down, and I need to get
> backups working again!  Can anyone suggest what to look for?
> 
> Many thanks,
> 
> Andrew

