Sun Cluster 3.2 problems...2nd attempt

From: Alan.Rubin@nt.gov.au
Date: Thu May 24 2007 - 10:00:10 EDT


Something stripped all of the content out of my first attempt. Here's hoping
the second time works better...

To: sunmanagers@sunmanagers.org
From: Alan Rubin/DCIS/NTG
Date: 05/24/2007 11:12PM
Subject: Sun Cluster 3.2 problems

Hi,

We have set up a two-node cluster on two V240s running Sun Cluster 3.2 and
Solaris 10 11/06, with a Recommended Patch set from earlier this May.

The cluster has been created. We have two interconnects using crossover cables
on bge2 and bge3, one 60GB LUN from an IBM SAN set up as both the quorum device
and the data disk, bge0 on our public LAN, and one resource group running
Oracle 10gR2. There are no relevant errors in the logs or in the output of
'sccheck'.

I set this up, got the resource group running, and started testing failover. We
did manual switches a few times without problems. Then we tried to simulate
something more random and rebooted the primary cluster node. The resource group
seemed to fail over cleanly; however, when I tried to switch it back to the
first node (which had come up again), we got the error "the resource group is
undergoing a reconfiguration, please try again later." I tried to quiesce the
group, but the command just sat there and seemed to do nothing, so I ended up
killing the process.

The individual resources then began to show up as failed or in various error
states. Eventually I was able to get those cleared and everything marked as
offline, but when I try to online or switch the group, I get the same "the
resource group is undergoing a reconfiguration, please try again later" error,
even after a complete shutdown and boot of the entire cluster.

I also mounted the disk and started the database by hand, and it all seemed
well, so the issue appears to be cluster related. However, I cannot find any
more detailed errors, or any more information on Google or docs.sun.com. Does
anyone have pointers on where to look on the cluster, or any other
suggestions?

A second problem that concerns me, although it is less important, is that
'cluster shutdown' does not work even though the configuration and the physical
setup both seem to be OK. I issued the command and it broadcast a message on
both nodes saying that it would shut down in 60 seconds, but nothing else
happened. Any ideas?

Thanks,

Alan Rubin
Technician Unix
DCS Midrange Services
Phone: +61 (08) 8999 6814
Fax: +61 (08) 8999 7493
e-Mail: alan.rubin@nt.gov.au


