Re: How to diagnose scsi/disk error on M80

From: Bill Verzal (BVerzal@KOMATSUNA.COM)
Date: Mon Jul 14 2003 - 11:49:28 EDT


If hdisk1 is the last internal disk on the channel, check the on-board SCSI
terminator/cable.
--------------------------------------------------------

Bill Verzal
AIX Administrator, Komatsu America
(847) 970-3726 - direct
(847) 970-4184 - fax

"Green, Simon" <Simon.Green@EU.ALTRIA.COM>
Sent by: IBM AIX Discussion List <aix-l@Princeton.EDU>
07/14/2003 10:25 AM
Please respond to IBM AIX Discussion List

To: aix-l@Princeton.EDU
cc:
Subject: How to diagnose scsi/disk error on M80

One of our M80s is giving me a bit of trouble.

Last week, one of the internal SCSI disks started to act up: hardware
errors, some LVM I/O errors and some stale partitions.
This was hdisk1, which was half of the mirrored rootvg, so no immediate
danger: I just dropped all the copies and removed it from rootvg,
preparatory to replacing it again.

I then realised that this was actually a replacement disk, installed the
week before after its predecessor exhibited similar problems.

Obviously I was concerned at that point that maybe it was actually a SCSI
problem, so I ran some diagnostics.

Diags on scsi1 showed no problems. Although the error log analysis showed
a disk error, a certify of the disk was clean.

After running several certifies on the disk without problems, I put it back
into rootvg and re-built the mirrors, to see what would happen. That was
on Wednesday.

For several days, nothing much happened. I had two block-relocations.
Then this afternoon it failed again: this time it even got declared missing!

The ELA shows a fault again, but certify says it's OK. Diags against scsi1
show no problems.

I'm running a format & certify of the disk, now. What else can I do? I'm
particularly interested in how I might confirm that the SCSI adapter is OK.
(Please note: I'm in England and the server's in Switzerland.)
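
One remote-friendly check that might help here, offered only as a sketch:
tally the hardware-class entries in the error log by resource name. If only
hdisk1 ever appears, that leans towards the disk or its slot and cabling;
entries against scsi1 or other disks on the same bus would lean towards the
adapter or bus. The code assumes the usual summary layout printed by errpt
(IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION). The function name
and the sample lines, including their identifiers, are hypothetical and not
taken from the server in question.

```python
from collections import Counter


def tally_hardware_errors(errpt_summary: str) -> Counter:
    """Count class-H (hardware) entries per resource in errpt summary text."""
    counts = Counter()
    for line in errpt_summary.splitlines():
        # Columns: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue  # skip blank lines and the header row
        err_class, resource = fields[3], fields[4]
        if err_class == "H":
            counts[resource] += 1
    return counts


# Hypothetical sample in the errpt summary layout (illustration only).
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAAAAAA   0714102503 P H hdisk1         DISK OPERATION ERROR
AAAAAAAA   0714101903 P H hdisk1         DISK OPERATION ERROR
BBBBBBBB   0714100103 T S example        HYPOTHETICAL SOFTWARE ENTRY
"""

print(tally_hardware_errors(SAMPLE))
```

In practice the text would come from saving the summary output of errpt on
the server to a file and reading that file in, rather than from a built-in
sample.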

I suppose this _could_ just be another faulty disk, but I'd be a lot
happier if I could find something which showed me exactly what was broken.

Simon Green
Altria ITSC Europe Ltd

AIX-L Archive at http://marc.theaimsgroup.com/?l=aix-l&r=1&w=2
AIX FAQ at http://www.faqs.org/faqs/aix-faq/

N.B. Unsolicited email from vendors will not be appreciated.


