Netra T105 Disk Problem. Disk or Controller?

From: Bryan Guest, BMI Internet (bryan.guest@bmts.com)
Date: Fri Aug 08 2003 - 11:39:47 EDT


Hello Managers:

I have a Netra T1 105 that had two Sun 18GB disks in it. The disks were
mirrored using DiskSuite.

> UltraSPARC-IIi 360 MHz,
> 1024 MB RAM,
> Solaris 8, Generic_108528-17 32-bit
> OBP 3.10.24 1999/08/16 17:37 POST 1.15.0 1999/04/02 11:23

Recently it looked like I lost the entire second disk, so I detached the
failed mirrors, and cleared them. Then I pulled the failed disk and put in a
service spare.

Shortly after I recreated the mirrors, the replacement disk failed too. So I
thought maybe the controller is pooched. But the Netra T1 105 doesn't have a
replaceable controller, so before abandoning the box to the scrap heap I
thought I would try some things.

It appears that after a short period (1-4 hours) the second disk goes
offline. I get errors like this:

Aug 8 10:24:41 kelly scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2
(glm0):
Aug 8 10:24:41 kelly Cmd (0x7047f100) dump for Target 1 Lun 0:
Aug 8 10:24:41 kelly scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2
(glm0):
Aug 8 10:24:41 kelly cdb=[ 0xa 0x0 0x0 0x0 0x7e 0x0 ]
Aug 8 10:24:41 kelly scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2
(glm0):
Aug 8 10:24:41 kelly pkt_flags=0x74000 pkt_statistics=0x61 pkt_state=0x7
Aug 8 10:24:41 kelly scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2
(glm0):
Aug 8 10:24:41 kelly pkt_scbp=0x0 cmd_flags=0x18e1
Aug 8 10:24:41 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2 (glm0):
Aug 8 10:24:41 kelly Disconnected tagged cmd(s) (1) timeout for Target 1.0
Aug 8 10:24:41 kelly genunix: [ID 408822 kern.info] NOTICE: glm0: fault
detected in device; service still available
Aug 8 10:24:41 kelly genunix: [ID 611667 kern.info] NOTICE: glm0:
Disconnected tagged cmd(s) (1) timeout for Target 1.0
Aug 8 10:24:41 kelly glm: [ID 401478 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6018]
Aug 8 10:24:41 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2 (glm0):
Aug 8 10:24:41 kelly got SCSI bus reset
Aug 8 10:24:41 kelly genunix: [ID 408822 kern.info] NOTICE: glm0: fault
detected in device; service still available
Aug 8 10:24:41 kelly genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI
bus reset
Aug 8 10:24:44 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:44 kelly disk not responding to selection
Aug 8 10:24:45 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:45 kelly disk not responding to selection
Aug 8 10:24:45 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:45 kelly disk not responding to selection
Aug 8 10:24:45 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:45 kelly disk not responding to selection
Aug 8 10:24:46 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:46 kelly disk not responding to selection
Aug 8 10:24:46 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:46 kelly disk not responding to selection
Aug 8 10:36:32 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:36:32 kelly offline
Aug 8 10:36:36 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:36:36 kelly disk not responding to selection
Aug 8 10:36:36 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:36:36 kelly offline
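
The cdb= line in the command dump above identifies the command that timed
out. A minimal sketch in Python of decoding it, assuming the standard 6-byte
SCSI CDB layout; decode_cdb6 and the small opcode table are hypothetical
names used only for illustration, not part of any Solaris tool:

```python
# Sketch: decode the 6-byte CDB printed by glm in the dump above.
# Assumes the generic SCSI 6-byte command layout. The opcode table is
# a small illustrative subset, not a complete list.

OPCODES_6 = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x08: "READ(6)",
    0x0A: "WRITE(6)",
    0x12: "INQUIRY",
}

def decode_cdb6(text):
    """Parse a string like '0xa 0x0 0x0 0x0 0x7e 0x0' into named fields."""
    b = [int(tok, 16) for tok in text.split()]
    if len(b) != 6:
        raise ValueError("expected 6 bytes, got %d" % len(b))
    # 21-bit logical block address: low 5 bits of byte 1, then bytes 2-3.
    lba = ((b[1] & 0x1F) << 16) | (b[2] << 8) | b[3]
    # For READ(6)/WRITE(6), a length byte of 0 means 256 blocks.
    blocks = b[4] if b[4] != 0 else 256
    return {
        "opcode": OPCODES_6.get(b[0], "unknown (0x%02x)" % b[0]),
        "lba": lba,
        "blocks": blocks,
        "control": b[5],
    }

fields = decode_cdb6("0xa 0x0 0x0 0x0 0x7e 0x0")
print(fields)
```

Read this way, the timed-out command is a WRITE(6) of 126 blocks starting at
block 0 on Target 1, which only says what was in flight when the disk stopped
responding, not which component is at fault.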

Often after these errors, the format command will show the box has only one
disk. But if I do a reconfigure boot (boot -r) the disk comes back.
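
To see how the errors cluster in time before the disk drops out, the bare
message lines can be tallied. This is a rough sketch in Python, assuming the
mail-wrapped layout shown in the excerpt; SAMPLE holds a few lines copied
from it, and summarize is a hypothetical helper name:

```python
import re

# A few lines copied from the excerpt above (wrapped as in the mail).
SAMPLE = """\
Aug 8 10:24:41 kelly Disconnected tagged cmd(s) (1) timeout for Target 1.0
Aug 8 10:24:41 kelly got SCSI bus reset
Aug 8 10:24:44 kelly scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,0/pci@1,1/scsi@2/sd@1,0 (sd1):
Aug 8 10:24:44 kelly disk not responding to selection
Aug 8 10:24:45 kelly disk not responding to selection
Aug 8 10:36:32 kelly offline
"""

# "Mon D HH:MM:SS host message"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")

def summarize(text):
    """Map each bare message to [count, first_time, last_time]."""
    seen = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m or "[ID " in m.group(3):
            # Skip wrapped continuation lines and the tagged header lines.
            continue
        stamp, msg = m.group(1), m.group(3).strip()
        entry = seen.setdefault(msg, [0, stamp, stamp])
        entry[0] += 1
        entry[2] = stamp
    return seen

summary = summarize(SAMPLE)
for msg, (count, first, last) in summary.items():
    print("%3d  %s .. %s  %s" % (count, first, last, msg))
```

Run against the full messages file, a tally like this makes the sequence
easier to read: command timeout, bus reset, repeated selection failures, then
the disk marked offline roughly twelve minutes later.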

I tried to reformat the drive, and the format proceeded through to the second
verify pattern stage. Then it failed again, because format thought the drive
had been removed.

So, has anyone come across this kind of problem with a Netra T1?
Are there steps or solutions I have overlooked?
Is there any way I can test directly whether the controller is faulty on this box?

As always, any help is appreciated, and I will summarize to the list.

Bryan Guest
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


