SUMMARY: system unable to see device

From: Cohen, Andy (Andy.Cohen@cognex.com)
Date: Tue Jun 11 2002 - 11:29:20 EDT


Hi,

Well, it turns out it was just a bad disk, and all along I thought it was my
fault ;-). I swapped the faulty drive with a working one: the working one
continued to work and the faulty one still failed, so we were able to
conclude it was the disk and not the slot, controller, or software.

Here were some helpful suggestions:
.....................................................................
You need to run MAKEDEV to create the device
.....................................................................
Try using "dsfmgr -k" command to create device file automatically.
.....................................................................
In the "cam" subsystem in /etc/sysconfigtab (you may not have any
entries there), there is an entry for V5.x that can be used to turn
on the display of disk names at boot time:

cam:
        cam_bootmsgs = 1

If you turn this on, you should be able to see whether your disk
is getting identified at all at boot time.

Verify that in the console the "missing" drive is seen as DKC600
(target 6 on bus 2) -- because that's what it probably is unless
you are using wide IDs. The SCSI controller itself is usually ID
7. (Ah, yes, you said that in the mail, I see it now..)
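
As an illustration of how a console name such as DKC600 maps to a bus
and target, here is a minimal Python sketch. It assumes the usual console
convention that the letter after "dk" selects the bus (a = 0, b = 1,
c = 2) and that the unit number is target * 100 + LUN. The function name
decode_console_name is made up for this example.

```python
import re

def decode_console_name(name):
    """Split a console disk name such as 'DKC600' into (bus, target, lun).

    Assumes the form dk<bus letter><unit>, with unit = target * 100 + lun.
    """
    m = re.fullmatch(r"dk([a-z])(\d+)", name.strip().lower())
    if m is None:
        raise ValueError("not a dk<letter><unit> console name: %r" % name)
    bus = ord(m.group(1)) - ord("a")              # a -> 0, b -> 1, c -> 2
    target, lun = divmod(int(m.group(2)), 100)    # 600 -> target 6, lun 0
    return bus, target, lun

print(decode_console_name("DKC600"))
```

For DKC600 this yields bus 2, target 6, LUN 0, which matches the reading
given above.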

In the running kernel, you can use the "scu show edt" command to
see what SCSI devices are known to the kernel, and "scu scan edt"
to rescan the busses. "scu scan edt bus 2" would cause it to try
to rescan bus 2, which is where you say the missing disk should be
seen.

In the disk naming, an "RZ2xx" is in theory an Ultra2 drive and
an "RZ1xx" is an Ultra1 drive; the older naming scheme (RZ29B) is
for fast but not Ultra drives. Sometimes an Ultra2 drive is sold
with the Ultra1 labelling; you get a potentially faster drive, and it
should just work if the bus and cabling are capable. If you've
got a controller that's only Ultra capable, the drive negotiates
to the slower speed. I *think* the "isp" (originally Qlogic, I'm
not sure who makes the chips most recently) controllers are only
Ultra capable, not Ultra2. If so, then the drive should run at
the slower Ultra speed. But if the negotiation with the drive
chooses Ultra2 speeds while the rest of the subsystem (the shelf,
the "personality module" and the cabling) is not Ultra2 capable, it
is possible that the kernel's interaction with the drive will not
work, and in that case, the kernel may not be able to see the
drive. The update to V5.1A PK2 may have changed the driver for
that SCSI controller in a way that allows it to negotiate too fast
a speed for the drive. Note that I'm guessing here -- I'd have to
go look at the patch kit contents to see if anything changed from
V5.1A PK1 that MIGHT have done this. And I'm guessing that it was
PK1 that was working before.

My guess is that there is a new driver, and it's negotiating Ultra2
speeds for the drive that's missing, and it used to only support
up to Ultra, and that the rest of the subsystem can't handle this
faster speed. But that's just a guess. The drive does report it
is Ultra2 firmware, and we know that has been sold in the past as
the replacement for "Ultra" drive models that went end of life.
..................................................................
It is possible that you aren't seeing device id 6 on the second Q-Logic
controller because the controller has probably been configured to have
SCSI id = 6. In my servers with multiple isp controllers, the first has
a SCSI ID of 7, the second is 6, etc., because the controllers all need
unique IDs. Try moving the drive to a different, unused slot in the
storage shelf. If I'm right, moving it down 1 slot will change its ID
to 7 and it will be visible.
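
The ID clash described above can be sketched in a few lines of Python.
The controller IDs used here (7 for the first isp, 6 for the second) are
only the hypothesis from this suggestion, not values read from the system
in question. The names CONTROLLER_IDS, is_hidden_by_controller and
usable_targets are invented for the illustration, and narrow SCSI IDs
(0 to 7) are assumed.

```python
# Hypothetical initiator IDs, following the pattern described above:
# the first controller takes ID 7, the second takes ID 6.
CONTROLLER_IDS = {"isp0": 7, "isp1": 6}

def is_hidden_by_controller(controller, device_target):
    """True if the device's SCSI ID equals the controller's own ID."""
    return CONTROLLER_IDS[controller] == device_target

def usable_targets(controller, max_target=7):
    """Narrow SCSI targets (0..max_target) not taken by the controller."""
    own_id = CONTROLLER_IDS[controller]
    return [t for t in range(max_target + 1) if t != own_id]

print(is_hidden_by_controller("isp1", 6))
print(usable_targets("isp1"))
```

Under that assumption a drive at target 6 on the second controller would
collide with the controller itself, while target 7 would be usable.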
..................................................................
I am only guessing here, because I have no idea what things you can do
in the AlphaBIOS with that "isp" family of controllers, but if it is
anything like the Adaptec controllers I've worked with, there is a way
to configure the controller to NOT try to run the drive any faster than
a particular speed.

I would agree that this is messy, and you need to get formal support on
resolving it. At a minimum, get them to explain what's going on and
give you a crib sheet for how to re-do it if it gets "confused".
...............................................................
Did you try the command "hwmgr -scan scsi"?

Thanks to Tom Linden, Ashish Tripathi, Dr. Tom Blinn, Ralf Borowski, and
Henk Kalle.

Andy

ORIGINAL QUESTION
===============
Hi,

I've got a 4100 just upgraded to 5.1A PK2. It has two external storage
shelves for the disk drives. Each shelf has a separate controller. It
looks like the controllers are isp0 and isp1. All the drives in isp0 are
showing up fine (there are 5 RZ29B older drives in there). In isp1 all but
one (6 of 7) are showing up. isp1 has 4 RZ29B (older drives) and 2 RZ1FC
(brand new 36 GB) showing up fine, plus 1 RZ1ED (somewhat new 18 GB).
The RZ1ED is the one that is not showing up:

root@thor==> hwmgr -view devices
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
    3: /dev/scp_scsi
    4: /dev/kevm
   43: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   51: /dev/disk/cdrom0c    DEC      RRD45   (C) DEC  bus-0-targ-5-lun-0
   52: /dev/disk/dsk0c      DEC      RZ29B    (C) DEC bus-1-targ-0-lun-0
   53: /dev/disk/dsk1c      DEC      RZ29B    (C) DEC bus-1-targ-1-lun-0
   54: /dev/disk/dsk2c      DEC      RZ29B    (C) DEC bus-1-targ-2-lun-0
   55: /dev/disk/dsk3c      DEC      RZ29B    (C) DEC bus-1-targ-4-lun-0
   56: /dev/disk/dsk4c      DEC      RZ29B    (C) DEC bus-1-targ-5-lun-0
   57: /dev/disk/dsk5c      DEC      RZ29B    (C) DEC bus-2-targ-0-lun-0
   58: /dev/disk/dsk6c      DEC      RZ29B    (C) DEC bus-2-targ-1-lun-0
   59: /dev/disk/dsk7c      DEC      RZ29B    (C) DEC bus-2-targ-3-lun-0
   60: /dev/disk/dsk8c      DEC      RZ29B    (C) DEC bus-2-targ-5-lun-0
   62: /dev/ntape/tape0     DEC      TLZ09     (C)DEC bus-0-targ-0-lun-0
   64: /dev/disk/dsk10c     COMPAQ   BD0366459B       bus-2-targ-2-lun-0
   65: /dev/disk/dsk11c     COMPAQ   BD0366459B       bus-2-targ-4-lun-0
   66: /dev/dmapi/dmapi
The disk that isn't showing should be the /dev/disk/dsk9c drive (absent from
the above output).
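
A gap like this can also be spotted mechanically. The following is a
minimal Python sketch that parses the disk lines of the listing above,
reports which dskN numbers are absent, and lists the targets with no disk
on a given bus. The function names are made up for this example, and it
assumes narrow SCSI IDs (0 to 7). The controller's own ID never appears
in the listing, so one of the "free" targets is in practice taken by the
controller.

```python
import re

# Disk lines copied from the "hwmgr -view devices" output above.
HWMGR_OUTPUT = """\
   52: /dev/disk/dsk0c      DEC      RZ29B    (C) DEC bus-1-targ-0-lun-0
   53: /dev/disk/dsk1c      DEC      RZ29B    (C) DEC bus-1-targ-1-lun-0
   54: /dev/disk/dsk2c      DEC      RZ29B    (C) DEC bus-1-targ-2-lun-0
   55: /dev/disk/dsk3c      DEC      RZ29B    (C) DEC bus-1-targ-4-lun-0
   56: /dev/disk/dsk4c      DEC      RZ29B    (C) DEC bus-1-targ-5-lun-0
   57: /dev/disk/dsk5c      DEC      RZ29B    (C) DEC bus-2-targ-0-lun-0
   58: /dev/disk/dsk6c      DEC      RZ29B    (C) DEC bus-2-targ-1-lun-0
   59: /dev/disk/dsk7c      DEC      RZ29B    (C) DEC bus-2-targ-3-lun-0
   60: /dev/disk/dsk8c      DEC      RZ29B    (C) DEC bus-2-targ-5-lun-0
   64: /dev/disk/dsk10c     COMPAQ   BD0366459B       bus-2-targ-2-lun-0
   65: /dev/disk/dsk11c     COMPAQ   BD0366459B       bus-2-targ-4-lun-0
"""

DISK_LINE = re.compile(
    r"/dev/disk/dsk(\d+)c\b.*\bbus-(\d+)-targ-(\d+)-lun-(\d+)"
)

def parse_disks(text):
    """Return {disk_number: (bus, target, lun)} for every dskNc line."""
    disks = {}
    for line in text.splitlines():
        m = DISK_LINE.search(line)
        if m:
            number, bus, target, lun = map(int, m.groups())
            disks[number] = (bus, target, lun)
    return disks

def missing_disk_numbers(disks):
    """Disk numbers absent between 0 and the highest number seen."""
    if not disks:
        return []
    return [n for n in range(max(disks) + 1) if n not in disks]

def free_targets(disks, bus, max_target=7):
    """Targets on the given bus that have no disk in the listing."""
    used = {target for (b, target, _lun) in disks.values() if b == bus}
    return [t for t in range(max_target + 1) if t not in used]

disks = parse_disks(HWMGR_OUTPUT)
print("missing disk numbers:", missing_disk_numbers(disks))
print("targets without a disk on bus 2:", free_targets(disks, bus=2))
```

On this listing the only missing number is 9, and the targets without a
disk on bus 2 are 6 and 7.
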
If I go to /dev/disk and issue:
root@thor==> cd /dev/disk
root@thor==> ls -alF dsk*c
brw-------   1 root     system    19, 37 Mar 22 15:30 dsk0c
brw-------   1 root     system    19,217 May 31 13:19 dsk10c
brw-------   1 root     system    19,233 May 31 13:19 dsk11c
brw-------   1 root     system    19, 53 Mar 22 15:30 dsk1c
brw-------   1 root     system    19, 69 Mar 22 15:30 dsk2c
brw-------   1 root     system    19, 85 Mar 22 15:30 dsk3c
brw-------   1 root     system    19,101 Mar 22 15:30 dsk4c
brw-------   1 root     system    19,117 Mar 22 15:30 dsk5c
brw-------   1 root     system    19,133 Mar 22 15:30 dsk6c
brw-------   1 root     system    19,149 Mar 22 15:30 dsk7c
brw-------   1 root     system    19,165 Mar 22 15:30 dsk8c
brw-------   1 root     system    19,181 Mar 22 15:30 dsk9c
root@thor==> ls -alF dsk9?
brw-------   1 root     system    19,177 Mar 22 15:30 dsk9a
brw-------   1 root     system    19,179 Mar 22 15:30 dsk9b
brw-------   1 root     system    19,181 Mar 22 15:30 dsk9c
brw-------   1 root     system    19,183 Mar 22 15:30 dsk9d
brw-------   1 root     system    19,185 Mar 22 15:30 dsk9e
brw-------   1 root     system    19,187 Mar 22 15:30 dsk9f
brw-------   1 root     system    19,189 Mar 22 15:30 dsk9g
brw-------   1 root     system    19,191 Mar 22 15:30 dsk9h
dsk9 is present.
At the >>> prompt, if I issue 'show dev', this drive shows up as dkc600 with a
name of RZ2ED (not RZ1ED).
I'm rather a novice with this disk stuff.  In the past whenever we've had
new drives put in I've always been able to format them with diskconfig and
then work with AdvFS.  I've never had to troubleshoot this sort of problem
so I'm pretty lost.
Any help is most appreciated.
Andy
===================================================
Andy Cohen
Database Systems Administrator
COGNEX Corporation
(508) 650-3079  Fax: (508) 650-3337
andy.cohen@cognex.com   www.cognex.com
cell: (617) 470-0034
===================================================

