Ultra Enterprise 450 SCSI errors

From: Edward M. Corrado (ecorrado@hera.rider.edu)
Date: Tue Oct 14 2003 - 23:22:29 EDT


I have an Ultra Enterprise Server 450 running Solaris 8. Recently, I have
been receiving SCSI errors. On the system console they looked something
like this (see below for excerpts from /var/adm/messages):

   Oct 13 03:14:32 athena disk not responding to selection
   Oct 13 03:14:33 athena scsi: [ID 107833 kern.warning] WARNING:
     /pci@1f,4000/scsi@3/sd@1,0 (sd1):
   Oct 13 03:14:33 athena disk not responding to selection

I thought it might be a disk going bad, so I found another disk and put
it in the machine so that I could move the data over to it. After I put
in the new disk (but before removing the disk I thought was going bad), I
noticed that the disk identified in the SCSI errors had changed from the
one I thought was bad (sd1) to the one that I had just put in (which I
attached as sd2). The console messages now look more like this:

   Oct 13 03:14:32 athena disk not responding to selection
   Oct 13 03:14:33 athena scsi: [ID 107833 kern.warning] WARNING:
       /pci@1f,4000/scsi@3/sd@2,0 (sd2):
   Oct 13 03:14:33 athena disk not responding to selection
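
One way to check whether the warnings follow a single disk or the whole
bus is to count them per controller path and sd instance. The following
is only a minimal sketch in Python: the function name and the regular
expression are assumptions made for illustration, not part of any Solaris
tool. It runs against the console lines quoted above, and the same
function could be given the contents of /var/adm/messages.

```python
import re
from collections import Counter

# Hypothetical pattern: a WARNING header followed (possibly on the next
# line) by a device path ending in /sd@<target>,<lun> and "(sdN)".
WARNING_RE = re.compile(r"WARNING:\s+(\S+/sd@(\d+),(\d+))\s+\((sd\d+)\)")

# Console lines copied from the two samples in this message.
SAMPLE = """\
Oct 13 03:14:33 athena scsi: [ID 107833 kern.warning] WARNING:
  /pci@1f,4000/scsi@3/sd@1,0 (sd1):
Oct 13 03:14:33 athena disk not responding to selection
Oct 13 03:14:33 athena scsi: [ID 107833 kern.warning] WARNING:
  /pci@1f,4000/scsi@3/sd@2,0 (sd2):
Oct 13 03:14:33 athena disk not responding to selection
"""


def tally_disk_warnings(text):
    """Count disk WARNING entries per (controller path, sd instance)."""
    counts = Counter()
    for path, _target, _lun, instance in WARNING_RE.findall(text):
        # Everything before the final /sd@... component is the controller.
        controller = path.rsplit("/", 1)[0]
        counts[(controller, instance)] += 1
    return counts


if __name__ == "__main__":
    for (controller, instance), n in sorted(tally_disk_warnings(SAMPLE).items()):
        print(f"{controller} {instance}: {n} warning(s)")
```

If several sd instances behind the same controller path show up in the
tally, that points away from a single failing disk and toward something
shared, such as the controller, cabling, or termination.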

I'm starting to think that the problem might not be the disk and that it
might be a bad controller or something else altogether. Does anyone
have any ideas on what else I should take a look at? FWIW: The
second set of errors appeared while I was trying to copy the data from the
disk at sd1 to the disk at sd2. Excerpts from /var/adm/messages are
included below.

As is normal practice, I will summarize for the list.

Edward Corrado

Sample Errors before putting in new disk:

Oct 14 15:52:20 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:20 athena Cmd (0x3468b98) dump for Target 1 Lun 0:
Oct 14 15:52:20 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:20 athena cdb=[ 0xa 0x0 0x0 0x10 0xa 0x0 ]
Oct 14 15:52:20 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:20 athena pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Oct 14 15:52:20 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:20 athena pkt_scbp=0x0 cmd_flags=0x18e1
Oct 14 15:52:20 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:20 athena Disconnected tagged cmd(s) (1) timeout for Target 1.0
Oct 14 15:52:20 athena genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Oct 14 15:52:20 athena genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 1.0
Oct 14 15:52:20 athena glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Oct 14 15:52:21 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Oct 14 15:52:21 athena got SCSI bus reset
Oct 14 15:52:21 athena genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Oct 14 15:52:21 athena genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
Oct 14 15:52:21 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@1,0 (sd1):
Oct 14 15:52:21 athena SCSI transport failed: reason 'timeout': retrying command

Errors after new drive installed:

Oct 14 22:55:42 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:42 athena Cmd (0x1c4a050) dump for Target 2 Lun 0:
Oct 14 22:55:42 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:42 athena cdb=[ 0x2a 0x0 0x0 0x11 0xd8 0x80 0x0 0x8 0x0 0x0 ]
Oct 14 22:55:42 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:42 athena pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Oct 14 22:55:42 athena scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:42 athena pkt_scbp=0x0 cmd_flags=0x18e1
Oct 14 22:55:42 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:42 athena Disconnected tagged cmd(s) (1) timeout for Target 2.0
Oct 14 22:55:42 athena genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Oct 14 22:55:42 athena genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 2.0
Oct 14 22:55:42 athena glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Oct 14 22:55:43 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Oct 14 22:55:43 athena got SCSI bus reset
Oct 14 22:55:43 athena genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Oct 14 22:55:43 athena genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
Oct 14 22:55:43 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@2,0 (sd2):
Oct 14 22:55:43 athena SCSI transport failed: reason 'reset': retrying command
Oct 14 22:55:43 athena scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@2,0 (sd2):
Oct 14 22:55:43 athena SCSI transport failed: reason 'timeout': retrying command
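
The cdb=[ ... ] lines in both command dumps above can be decoded to see
which operation timed out. The following is a rough sketch in Python that
assumes the standard SCSI 6-byte and 10-byte READ/WRITE command layouts;
the helper names parse_cdb and decode_cdb are made up for illustration,
and only those four opcodes are handled.

```python
import re

# opcode -> (name, expected CDB length in bytes); READ/WRITE only.
OPCODES = {
    0x08: ("READ(6)", 6),
    0x0A: ("WRITE(6)", 6),
    0x28: ("READ(10)", 10),
    0x2A: ("WRITE(10)", 10),
}


def parse_cdb(text):
    """Extract the bytes of a 'cdb=[ 0x.. ... ]' dump, even if line-wrapped."""
    match = re.search(r"cdb=\[([^\]]*)\]", text)
    if match is None:
        raise ValueError("no cdb=[...] field found")
    return [int(token, 16) for token in match.group(1).split()]


def decode_cdb(cdb):
    """Return (command name, logical block address, block count)."""
    if not cdb or cdb[0] not in OPCODES:
        raise ValueError("unsupported or empty CDB")
    name, expected_len = OPCODES[cdb[0]]
    if len(cdb) != expected_len:
        raise ValueError(f"{name} needs {expected_len} bytes, got {len(cdb)}")
    if expected_len == 6:
        # 21-bit LBA in bytes 1-3; a length byte of 0 means 256 blocks.
        lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
        blocks = cdb[4] or 256
    else:
        # 32-bit big-endian LBA in bytes 2-5, 16-bit length in bytes 7-8.
        lba = int.from_bytes(bytes(cdb[2:6]), "big")
        blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return name, lba, blocks


if __name__ == "__main__":
    dumps = [
        "cdb=[ 0xa 0x0 0x0 0x10 0xa 0x0 ]",
        "cdb=[ 0x2a 0x0 0x0 0x11 0xd8 0x80 0x0 0x8\n0x0 0x0 ]",
    ]
    for dump in dumps:
        name, lba, blocks = decode_cdb(parse_cdb(dump))
        print(f"{name} lba={lba} blocks={blocks}")
```

Under those assumptions, both dumps decode as writes: a WRITE(6) of 10
blocks at block 16 on target 1, and a WRITE(10) of 2048 blocks at block
1169536 on target 2.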


