Lost System Drive(s) on Alpha2100 in standalone Raid Array 450 v 5.2 with Version 5.0A of Tru64

From: Centifonti, Mark (mark.centifonti@ag.state.nj.us)
Date: Mon Mar 03 2003 - 15:58:17 EST


Hi again Managers:

I received 2 great responses, from Jim Kurtenbach and Bruce D Hines,
regarding the issue below. I used Jim's solution (Bruce's was very
similar) and it worked! The "MIRSYS" container was rebuilt with no
problems and the system rebooted. Thanks, gentlemen, for your help; it
was much appreciated.

First: (delete failedset disk120)
The blinking should stop.
Second: (add spareset disk120)

If this works, the mirror should automatically start rebuilding.

OK, let's assume this doesn't work, i.e. it errors when you add the drive
to the spareset.
First: (delete disk120)
Second: remove the drive, wait 10-15 secs, reinsert the drive.
Third: (run config)
Fourth: try to init the drive (init disk120)
Fifth: (add spareset disk120)
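
The two paths above can be summarized as a small decision sketch. The
Python below is only an illustration and does not talk to the controller:
the function name recovery_steps and the spareset_add_failed flag are
hypothetical, and the command strings are copied from the steps above.

```python
def recovery_steps(disk: str, spareset_add_failed: bool = False) -> list[str]:
    """Return the ordered controller actions described in the post.

    Hypothetical helper for illustration only: it builds strings and
    never contacts a controller.
    """
    if not spareset_add_failed:
        # Simple path: take the drive out of the failedset, then offer it
        # as a spare so the mirror can start rebuilding.
        return [f"delete failedset {disk}", f"add spareset {disk}"]
    # Fallback path, used when adding the drive to the spareset errors.
    return [
        f"delete {disk}",
        "(remove the drive, wait 10-15 secs, reinsert the drive)",
        "run config",
        f"init {disk}",
        f"add spareset {disk}",
    ]


if __name__ == "__main__":
    for step in recovery_steps("disk120", spareset_add_failed=True):
        print(step)
```

Keeping the physical reseat as a plain text entry in the list makes it
obvious that the fallback path cannot be done from the console alone.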

Also, I believe the "clear_errors this_controller invalid_cache" command
needs another parameter. Something like "destroy_unflushed_data" should be
safe if the OS was down when the power was interrupted.
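
As a sketch of that caveat, the Python below only composes the command
text, and it refuses to add the destructive parameter unless the OS is
known to have been down. The helper name is hypothetical, the base command
is taken from the original message quoted below, and placing the extra
parameter at the end is an assumption that should be checked against the
controller's CLI documentation.

```python
def clear_invalid_cache_command(os_was_down: bool) -> str:
    """Compose the cache-clearing command text (illustration only).

    Assumption: the retention parameter is appended after the base
    command; the real keyword order may differ by firmware version.
    """
    base = "CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE"
    if not os_was_down:
        # Unflushed cache data may still matter if the OS was running,
        # so do not suggest destroying it in that case.
        raise ValueError("OS was not down; destroying unflushed data is not safe")
    return f"{base} DESTROY_UNFLUSHED_DATA"


if __name__ == "__main__":
    print(clear_invalid_cache_command(os_was_down=True))
```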

Finally, do a (show d1) to see if there are errors on the unit.

Good luck!

Also, the StorageWorks team at HPAQ is very good with this type of stuff.

> -----Original Message-----
> From: Centifonti, Mark [SMTP:mark.centifonti@ag.state.nj.us]
> Sent: Monday, March 03, 2003 9:34 AM
> To: Tru64 Mailing List
> Subject: Lost System Drive(s) on Alpha2100 in standalone Raid Array
> 450 v 5.2 with Version 5.0A of Tru64
>
> Hi Managers:
> Hopefully I can get some advice regarding the recovery of my system
> drives. The Raid Array 450 (where the drives are located) inadvertently
> had its power cut off. The operating system was shut down before the
> power was cut to the raid array, however. I should have done a
> SHUTDOWN THIS_CONTROLLER first. This led to the cache module retaining
> unflushed data which was not written back to the disk(s); hence, when I
> tried to reboot the system, I got the console message "could not open
> DKB200". I tried doing a "RESTART THIS_CONTROLLER" to reset the
> controller, and "CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE" and
> "CLEAR_ERRORS UNWRITABLE_DATA", but this did not work. I made the
> mistake of reseating the 2 mirrored system drives under LSM, only to
> have an amber light start blinking on 1 of them (1 of 2 in the mirror),
> and the controller set that drive in "MIRSYS" to "FAILEDSET" status.
> The other 1 is ok. Is there any way to get the system drives back when
> they're in this status?
>
>
>
> The system drives are mirrored under LSM with 1 drive in container
> "MIRSYS" marked as part of a "FAILEDSET" (DISK120) and the other looks
> ok (no amber light blinking) (DISK220) (RZ29B-VA) 4.3G. Can I just
> remove the 1 with the blinking light from the mirror and reboot using
> (DISK120), or do I have to remove the mirror ("UNMIRROR") entirely, try
> to reboot from the drive that looks ok, and remirror after I get a new
> drive? I also have an identical drive (RZ29B-VA) that I can remove from
> an unused container to replace DISK120 and not "UNMIRROR".
>
> I also have 2 other containers that were presented to the system but
> are not in use. They are RZ29B-VA 4.3G drives. I could use these if
> "MIRSYS" is not recoverable.
>
> In addition to the default system domains, I have 3 Oracle domain data
> drives that are fine and have no problems. The data drives (RZ1DF-VW,
> 9.1G) are Raid 5 with AdvFS file systems.
>
>
>
> What would be the best course of action here? Thanks in advance for
> your help.
>
>
>
> P.S. If I'm not totally clear on this situation, I can supply further
> info if needed.

-------------------------------------
Mark Centifonti
Systems Administrator
NJ. Dept. of Agriculture
New Warren & Market St.
Trenton, NJ. 08625-330

Voice: 609.292.8825
Fax: 609.292.9549
-------------------------------------


