SUMMARY: Re: SVM + Cluster pain

From: Erek Adams (erek@theadamsfamily.net)
Date: Thu Jan 05 2006 - 15:33:11 EST


Thanks for the replies, they were appreciated. However, one thing that I
didn't manage to get across was that the server(s) would "hang" with this
error message on the console and you couldn't continue past that point.
Even NodeB hung.

It seems that it's a known but rather obscure bug with Solaris 9.

Anyway, it's resolved:

1) Boot both nodes A + B into non-cluster mode (boot -x).
2) Comment out the shared md devices in both nodes' vfstabs.
3) Reboot NodeA; once it's up, reboot NodeB.
4) On NodeB:
        metaset -s <setname> -P -f
        (the blank set should be purged...)
        metaset
        (should be blank)
5) On NodeA:
        metaset -s <setname> -P -f
        (the blank set should be purged...)
        metaset
        (should be blank)
        (recreate the metaset from scratch)
        metaset -s <setname> -a -h NodeA NodeB
        metaset -s <setname> -a <diskpath0> <diskpath1> ... <diskpathN>
        metaset -s <setname> -a -m NodeA NodeB
        metaset
        (should show new set and ownership)
6) On NodeB:
        metaset -s <setname>
        (you should see the metaset you just created)
7) On both nodes:
        (uncomment the shared md devices in vfstab first)
        mount -a
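
For anyone who wants to eyeball the step 5 sequence before typing it on a
live box, here is a minimal dry-run sketch. It only prints the metaset
commands and runs nothing. The set name, host names and DID disk paths in
the example call are hypothetical placeholders, not values from this
cluster, and the function name is made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print (do NOT execute) the purge/recreate commands
# from step 5. All arguments in the example call are hypothetical.

print_recreate_cmds() {
    setname=$1   # name of the diskset
    hosts=$2     # space-separated host list, e.g. "NodeA NodeB"
    disks=$3     # space-separated disk paths

    echo "metaset -s $setname -P -f"          # purge the stale set
    echo "metaset -s $setname -a -h $hosts"   # recreate set, add hosts
    echo "metaset -s $setname -a $disks"      # add the shared disks
    echo "metaset -s $setname -a -m $hosts"   # add mediator hosts
    echo "metaset"                            # show the resulting sets
}

# Example call with placeholder values:
print_recreate_cmds exampleset "NodeA NodeB" "/dev/did/rdsk/d4 /dev/did/rdsk/d5"
```

Review the printed lines against your own set before running any of them
by hand.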

In theory, you should be back up and happy with all data intact. But of
course, YMMV!
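
The vfstab edit in step 2 can be done by hand, but a rough sketch of one
way to script it follows. It assumes the shared devices appear in vfstab
as /dev/md/<setname>/... entries at the start of a line, which may not
match your layout. The function names and sample lines are hypothetical,
and the filters work on stdin so nothing touches /etc/vfstab directly.

```shell
#!/bin/sh
# Sketch: comment out (and later restore) vfstab lines for one diskset.
# Assumes entries start with /dev/md/<setname>/ ; reads stdin, writes stdout.

comment_out_set() {
    # $1 = set name; prefix matching lines with '#'
    sed "s|^/dev/md/$1/|#&|"
}

uncomment_set() {
    # $1 = set name; strip the leading '#' added above
    sed "s|^#\(/dev/md/$1/\)|\1|"
}

# Example with made-up vfstab lines (set name "exampleset" is a placeholder):
sample='/dev/md/exampleset/dsk/d100 /dev/md/exampleset/rdsk/d100 /data ufs 2 no global
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -'

printf '%s\n' "$sample" | comment_out_set exampleset
```

Run it against a copy of vfstab and diff the result before replacing the
real file.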

Cheers!

[Original post below]

On Wed, 4 Jan 2006, Erek Adams wrote:

> NodeA-v890
> NodeB-v890
> Shared-3510 AC
> SunCluster 3.1
>
> We lost power to the cluster--Don't ask, too ugly to tell... Before then
> everything was fine. Now, I get this weird problem. Bring up array,
> comes up fine. Bring up node1, boots, and starts to grab the array.
> Pauses a while then starts giving the following error:
>
> Jan 4 16:22:41 node1 Cluster.Framework: stderr: metaset:
> node1: ingdg: not owner of metadevice database
> Jan 4 16:22:41 node1 Cluster.Framework: stderr: metaset:
> node1: ingdg: must be owner of the set for this command
>
> Over and over....
>
> I've tried: pulling the heartbeats and booting only node1. It just
> flips the above messages over and over. If I try to boot node2, node2
> hangs on boot waiting on node1. I've killed the heartbeat between the two
> boxes, with no luck. I can boot node1 in non-cluster mode and I get the
> same error.
>
> From what I've found, it seems that the answer is purging the metadb and
> then recreating it. I'm hoping that's not the fix... It just sounds
> unpleasant.
>
> Thoughts, ideas, suggestions?
>
> -----
> Erek Adams
> Nifty-Type-Guy
> TheAdamsFamily.Net
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
>

-----
Erek Adams
Nifty-Type-Guy
TheAdamsFamily.Net


