Off-line it, or Detach it?

From: Mark A. Bialik (mbialik@infinityhealthcare.com)
Date: Wed May 22 2002 - 01:16:56 EDT


Hello:

I have a problem with a DiskSuite 4.2 mirror, and I'd like some advice
on how to tackle the problem. I have a few two-way mirrors. I recently
discovered that some of the submirrors went into a
"Maintenance/Critical" state. One mirror is mounted as / and the other
as /var.

In each case, the failed sub-mirror is on the same disk. However, the
same disk also has another submirror which is working just fine, so I'm
guessing the disk may not actually be bad (then again, it could become a
problem).

I have included my metastat, metadb, and syslog output detailing the
errors at the bottom of this email. In each instance, the bad submirror
is on c2t1d0. The metadb replica I also had on this disk is bad, but
I've got six other ones spread across two other controllers.

My question is this: What is my best approach? I can see three
options:

1) Reboot and hope the problem clears itself up :) Does this actually
work sometimes?

2) Offline the submirrors and then "online" them. Since one of the
submirrors is for / I'm not exactly sure if this is a good idea. If it
matters, the problem disk is not the primary boot disk. Is this a good
option to try before breaking the root mirror and going through the
hassle?

3) Detach/Unmirror the root, reboot, edit the correct files, come up
unmirrored, slap in a new disk, etc.

Again, I'm not sure the disk is actually bad since another submirror is
OK. But there could be some bad sectors.
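
For reference, options 2 and 3 map onto DiskSuite commands roughly as
sketched here. This is a dry run that only prints the commands. The
metadevice names come from the metastat output in this post, the
function names are made up for illustration, and the -f flag on
metadetach is an assumption (it may be needed because the submirror is
in maintenance state). Whether metaoffline accepts a submirror that is
already in maintenance state is not confirmed here, and the later
steps of option 3 (reboot, file edits, disk swap) are not covered.

```shell
#!/bin/sh
# Dry run only: prints the DiskSuite commands, does not execute them.
# Mirror/submirror pairs (d2/d1 and d8/d7) are from the metastat output.

# Option 2: take the submirror offline, then bring it back online.
plan_offline_online() {
    mirror=$1; submirror=$2
    echo "metaoffline $mirror $submirror"
    echo "metaonline $mirror $submirror"
}

# Option 3, first step only: detach the failed submirror from its mirror.
# ASSUMPTION: -f is used because the submirror is in maintenance state.
plan_detach() {
    mirror=$1; submirror=$2
    echo "metadetach -f $mirror $submirror"
}

for pair in "d2 d1" "d8 d7"; do
    set -- $pair
    plan_offline_online "$1" "$2"
    plan_detach "$1" "$2"
done
```

Printing the plan first makes it easy to review the exact argument
order (mirror, then submirror) before anything touches the root mirror.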

This is my first problem under DiskSuite in about two years, so I guess
I've been pretty lucky. It obviously saved my butt, and I don't want to
make matters worse by doing something stupid. Any help is greatly
appreciated. I have an hour of scheduled downtime starting in about 8
hours :)

Will summarize.

Thanks very much,
Mark

# metadb -i
        flags        first blk    block count
     a m p luo       16           1034           /dev/dsk/c0t0d0s7
     a p luo         16           1034           /dev/dsk/c0t1d0s7
     a p luo         16           1034           /dev/dsk/c0t2d0s7
     a p luo         16           1034           /dev/dsk/c1t0d0s7
     a p luo         16           1034           /dev/dsk/c1t1d0s7
     a p luo         16           1034           /dev/dsk/c1t2d0s7
      W p l          16           1034           /dev/dsk/c2t1d0s7
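
A minimal sketch of how the replica with error flags could be picked
out of this listing automatically. It assumes that uppercase letters
in the flags column (W, M, D, F, S, R) mark replica errors, and it
works on a copy of the output above with each device joined to its
flags line. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list state database replicas whose flags include an error letter.
# Sample input is the metadb -i output from this post, one replica per line.
metadb_sample() {
cat <<'EOF'
        flags        first blk    block count
     a m p luo       16           1034           /dev/dsk/c0t0d0s7
     a p luo         16           1034           /dev/dsk/c0t1d0s7
     a p luo         16           1034           /dev/dsk/c0t2d0s7
     a p luo         16           1034           /dev/dsk/c1t0d0s7
     a p luo         16           1034           /dev/dsk/c1t1d0s7
     a p luo         16           1034           /dev/dsk/c1t2d0s7
      W p l          16           1034           /dev/dsk/c2t1d0s7
EOF
}

# Flags are every field before the last three
# (first blk, block count, device path).
# ASSUMPTION: W, M, D, F, S and R are the error flags.
bad_replicas() {
    awk '$NF ~ /^\/dev\// {
        flags = ""
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /[WMDFSR]/) print $NF
    }'
}

metadb_sample | bad_replicas
```

On a live system the same filter could presumably be fed directly,
as in "metadb -i | bad_replicas".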

# metastat | more
d2: Mirror
    Submirror 0: d0
      State: Okay
    Submirror 1: d1
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 24578400 blocks
 
d0: Submirror of d2
    State: Okay
    Size: 24578400 blocks
    Stripe 0:
        Device Start Block Dbase State Hot Spare
        c2t0d0s0 0 No Okay
 
 
d1: Submirror of d2
    State: Needs maintenance
    Invoke: metareplace d2 c2t1d0s0 <new device>
    Size: 35549760 blocks
    Stripe 0:
        Device Start Block Dbase State Hot Spare
        c2t1d0s0 0 No Maintenance

d8: Mirror
    Submirror 0: d6
      State: Okay
    Submirror 1: d7
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4097920 blocks
 
d6: Submirror of d8
    State: Okay
    Size: 4097920 blocks
    Stripe 0:
        Device Start Block Dbase State Hot Spare
        c2t0d0s3 0 No Okay
 
 
d7: Submirror of d8
    State: Needs maintenance
    Invoke: metareplace d8 c2t1d0s3 <new device>
    Size: 4097920 blocks
    Stripe 0:
        Device Start Block Dbase State Hot Spare
        c2t1d0s3 0 No Maintenance
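
The failed components can also be pulled out of metastat output with a
short filter. This is only a sketch: it assumes the layout shown above
(a "dN:" header line for each metadevice, followed by component lines
whose last field is the state), uses a trimmed copy of that output as
sample input, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print "<submirror> <slice>" for every component in Maintenance.
# Sample input is trimmed from the metastat output in this post.
metastat_sample() {
cat <<'EOF'
d0: Submirror of d2
    State: Okay
        Device Start Block Dbase State Hot Spare
        c2t0d0s0 0 No Okay
d1: Submirror of d2
    State: Needs maintenance
        Device Start Block Dbase State Hot Spare
        c2t1d0s0 0 No Maintenance
d7: Submirror of d8
    State: Needs maintenance
        Device Start Block Dbase State Hot Spare
        c2t1d0s3 0 No Maintenance
EOF
}

failed_components() {
    awk '
        # Remember the metadevice whose section we are in (strip the colon).
        /^d[0-9]+: / { md = $1; sub(/:$/, "", md) }
        # A component line starts with a cNtNdNsN slice name.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $NF == "Maintenance" {
            print md, $1
        }
    '
}

metastat_sample | failed_components
```

Both failed components land on c2t1d0, which matches the description
at the top of this message.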

May 9 08:57:22 emsdb3 scsi: [ID 107833 kern.warning] WARNING: /pci@4,2000/scsi@1/sd@1,0 (sd46):
May 9 08:57:22 emsdb3 SCSI transport failed: reason 'incomplete': retrying command
May 9 08:58:27 emsdb3 scsi: [ID 365881 kern.info] /pci@4,2000/scsi@1 (glm3):
May 9 08:58:27 emsdb3 Cmd (0x708fc320) dump for Target 1 Lun 0:

May 9 08:58:51 emsdb3 scsi: [ID 107833 kern.warning] WARNING: /pci@4,2000/scsi@1/sd@1,0 (sd46):
May 9 08:58:51 emsdb3 Error for Command: write(10) Error Level: Fatal
May 9 08:58:51 emsdb3 scsi: [ID 107833 kern.notice] Requested Block: 12028560 Error Block: 12028560
May 9 08:58:51 emsdb3 scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 3AK0E8CY
May 9 08:58:51 emsdb3 scsi: [ID 107833 kern.notice] Sense Key: Not Ready
May 9 08:58:51 emsdb3 scsi: [ID 107833 kern.notice] ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x2
May 9 08:58:51 emsdb3 md_stripe: [ID 641072 kern.warning] WARNING: md: d1: write error on /dev/dsk/c2t1d0s0
May 9 08:58:56 emsdb3 md_mirror: [ID 104909 kern.warning] WARNING: md: d7: /dev/dsk/c2t1d0s3 needs maintenance
May 9 08:58:56 emsdb3 md_mirror: [ID 104909 kern.warning] WARNING: md: d1: /dev/dsk/c2t1d0s0 needs maintenance


