SDS RAID 5 problems after upgrading from 2.7 to 9

From: justin (clancyj@tiscali.co.uk)
Date: Wed Mar 24 2004 - 09:20:03 EST


This is a good one (well, it's got me stumped!)

This relates to a system (220R) which has 2x internal 9G disks and a
diskpack containing 6x 9G disks - it's quite old. It was working fine
running 2.7 and supplied home directories via NFS for about 100 users.
The diskpack was configured as a RAID 5 metadevice with the command:

    metainit d100 -r c1t9d0s0 ..(blah blah).. c1t14d0s0

As I said, this was all working fine until the powers-that-be decided to
upgrade all our systems to Solaris 9. So I backed up the 40 odd Gig to
another array which also uses ancient 9G disks and, of course, I had the
tape backups from Veritas NB to fall back on if needed. You can see
this coming, can't you?

Now, I don't like using the "upgrade" option for Solaris because of bad
experiences before - but that's another story, so I opted for "New
Installation" to make a clean sweep and use *another* backup of /etc etc
(sorry) to restore all those fiddly bits like /etc/passwd and so on.

The installation went fine.

The restore of "fiddly bits" went fine.

Then I tried to bring the RAID 5 back to life. I created the metadb
stuff (on separate partitions) then ran the command:

    metainit d100 -r c1t9d0s0 ..(blah blah).. c1t14d0s0 -k

(Note the -k on the end.)

The system said "fine - RAID created", so I tried to mount it on /mnt
and got a "bad magic number" error. Uh, oh!
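
For anyone wanting to look at what is actually sitting where the
superblock should be, here is a minimal read-only sketch in Python. It
rests on assumptions that are not from this post: that the filesystem is
UFS with its primary superblock 8192 bytes into the device, that the magic
field sits 1372 bytes into the superblock, and that the expected value is
0x011954 stored big-endian (SPARC). The function names are made up for
illustration, and /dev/md/rdsk/d100 is only an example path for the d100
metadevice.

```python
import struct
import sys

# Assumed UFS layout constants (not taken from the post).
SBOFF = 8192          # byte offset of the primary superblock on the device
SBSIZE = 8192         # bytes read for the superblock; keeps raw-device reads aligned
MAGIC_OFFSET = 1372   # assumed offset of the magic field inside the superblock
FS_MAGIC = 0x011954   # assumed UFS magic number


def superblock_magic(block):
    """Return the 32-bit big-endian value found at the assumed magic offset."""
    if len(block) < MAGIC_OFFSET + 4:
        raise ValueError("superblock buffer too short")
    return struct.unpack_from(">I", block, MAGIC_OFFSET)[0]


def read_superblock(path):
    """Read SBSIZE bytes at SBOFF from a device or image file (read-only)."""
    with open(path, "rb", buffering=0) as dev:
        dev.seek(SBOFF)
        return dev.read(SBSIZE)


def looks_like_ufs(block):
    """True if the value at the assumed magic offset matches FS_MAGIC."""
    return superblock_magic(block) == FS_MAGIC


if __name__ == "__main__" and len(sys.argv) == 2:
    found = superblock_magic(read_superblock(sys.argv[1]))
    verdict = "matches" if found == FS_MAGIC else "does NOT match"
    print("value at magic offset: 0x%08x (%s 0x%08x)" % (found, verdict, FS_MAGIC))
```

Run it as, say, `python check_magic.py /dev/md/rdsk/d100` (the script name
is hypothetical). It only reads, never writes. A mismatch just means the
bytes at that offset are not what a UFS superblock would have there; it
does not say why.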

I check the disk backup: a disk has failed in that diskpack, which was set
up as RAID 0. Buggered! This was the point when I started to get a sense of
impending doom. Check NB with my colleague: "Sorry, mate. I was doing
an upgrade to NB and they all failed. It's this rotten L3500 - the
drives fail, the robot arm is wonky, it never writes, it never
calls....." I leave, to avoid strangling him, largely because it's MY
fault for not checking. How many times do I have to tell myself that
you can never be too paranoid in this job?

So now I have NO backups and the metadevice has some serious
problem(s). I haven't touched the disk contents or mounted it (hell, I
*can't* mount it).

I can't even use fsdb_ufs to work ancient and arcane evil to try to
correct the problem 'cos it crashes with an arithmetical error. Even
the tried and trusted techniques involving black candles, pentagrams and
cockerel sacrifices haven't worked (a first, in my experience).

I'm so desperate that I'm trying to locate the source for fsdb so I can
work around the arithmetic bug. I'm so desperate that I called the Sun
helpdesk (ha, ha!). I'm sooooo desperate that I'm writing to you guys.

Anyone got any ideas?

  Regards,

    Justin.