Solaris 8 / Sun 420 Cluster / NetBackup 4.5 / EMC Clariion / SpectraLogic Gator 12K

From: Adams, Jonathan K. [C] (Jonathan.K.Adams@nga.mil)
Date: Wed Jul 28 2004 - 07:23:04 EDT


Hi,

        I would appreciate any guidance on a problem I am having with our
backup setup.
I am using Veritas NetBackup 4.5. At the moment, the setup is as follows:

We have a SpectraLogic 12000 Gator with 90 slots and 3 drives, and an EMC
Clariion with two processors (A and B). These are hooked to a clustered
set of two Sun 420 servers running Solaris 8.

At the moment, Sun1 is directly connected to one SCSI port on the
SpectraLogic, and one port is connected for each processor on the EMC.

There is not a problem with the Sun1 --> SpectraLogic Drive 1 setup...

BTW: The SpectraLogic has two controllers. Controller 1 (sq1) has Drive 1
on bus 0 and Drive 3 on bus 1; Controller 2 (sq2) has Drive 5 on bus 0.

The SCSI IDs are as follows:

sq1b0 - SCSID 3 - Drive 1 SCSID 0
sq1b1 - SCSID 4 - Drive 3 SCSID 1

sq2b0 - SCSID 2 - Drive 5 SCSID 8
sq2b1 - SCSID 3 - no drive
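
For anyone who wants to sanity-check the addressing, the layout above can be written down as a small Python sketch. This is only an illustration under assumptions: I am reading the first SCSID on each line as the ID of the controller port and the second as the ID of the drive, and the names BUSES, id_conflicts and drives_with_wide_ids are made up for this example. It checks two things: whether a drive shares an ID with its controller port, and whether any drive uses an ID above 7, which is only addressable on a wide SCSI bus.

```python
# Layout copied from the list above. The meaning of the two ID columns is an
# assumption: "bridge_id" = controller port ID, "drive_id" = drive ID.
BUSES = {
    "sq1b0": {"bridge_id": 3, "drive": "Drive 1", "drive_id": 0},
    "sq1b1": {"bridge_id": 4, "drive": "Drive 3", "drive_id": 1},
    "sq2b0": {"bridge_id": 2, "drive": "Drive 5", "drive_id": 8},
    "sq2b1": {"bridge_id": 3, "drive": None, "drive_id": None},
}


def id_conflicts(buses):
    """Return bus names where the drive and the controller port share a SCSI ID."""
    return [
        name
        for name, bus in buses.items()
        if bus["drive"] is not None and bus["drive_id"] == bus["bridge_id"]
    ]


def drives_with_wide_ids(buses):
    """Return drives whose SCSI ID is above 7 (needs a wide SCSI bus)."""
    return [
        bus["drive"]
        for bus in buses.values()
        if bus["drive_id"] is not None and bus["drive_id"] > 7
    ]


print("ID conflicts:", id_conflicts(BUSES))
print("Drives with IDs above 7:", drives_with_wide_ids(BUSES))
```

Under that reading there are no ID clashes on any bus, and Drive 5 is the only device with an ID above 7.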

I can successfully use the robtest utility in /usr/openv/volmgr/bin to move
tapes from slots to drives and vice versa.

The problem is:

In NetBackup, both Drive 3 and Drive 5 (named EMC-NDMP-B and EMC-NDMP-A in
NetBackup, respectively) are coming up as AVR, which I have been told means
there is no robotic control.

This is after pulling the Controller 2 card and replacing it with another
(on the advice of SpectraLogic Tech Support).

On the EMC Clariion, SP-A (processor A), I am getting the following error
messages every 12-22 seconds as it scans the SCSI bus.
Note: c0b0t0d0 is Drive 5.

                -SCSI restarting Error Recovery on c0b0t0d0
                -ahc0: ahc_intr - referenced scb not valid during SELTO sch (255)
+2 seconds      -repeat of previous
                -(info) c0b0t0d0 SCB7 - timed out (operation 0x12 timeout 1000 ms)
                -(info) c0b0t0d0 While idle SEQADDR = 0x3
                -(info) c0b0t0d0: Recovery SCM Timeout
                -waiting list inconsistency
                -repeat of previous message
                -ahc0 issued Channel A Bus Reset, SCBs aborted
         

The initial problem was a recurring 219 error when backing up EMCA to Drive 5.
This was followed by a series of other errors (errors mounting media,
read/write errors, etc.), all on different tapes. At one point we removed
Drive 5 and replaced it with a new drive, but the same errors continued.

We then swapped Drive 3 and Drive 5 and had the same problem (eliminating the
individual drive as the cause).

We then swapped the controller out. This produced the AVR situation and made
Drive 3 unusable as well, even though it is on a different bus.

Keep in mind that robtest has worked throughout all of this, and there aren't
any SCSI errors on SP-B (Clariion processor B).

Any guidance would be appreciated.

_______________________
Jon Adams
Unix Web Engr.
jonathan.k.adams@saic.com
Tel: 202-285-4628
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


