Detecting failures in Sun Cluster 3.0

From: Joaquin Domenech (joaquin.domenech@m-centric.com)
Date: Wed Aug 20 2003 - 05:43:12 EDT


Hi,

We have a backend running Sun Cluster 3.0 in a 2-node cluster for HA Oracle in
failover mode. In front of the cluster there is a JBoss application
server that maintains a connection pool to Oracle. When we switch the
Oracle service manually with the 'scswitch' command, the application server
keeps running perfectly. The problem comes when we test a failure
condition (e.g. disconnecting the NAFO group): Oracle switches to the other
node, but the JBoss connection pool in the front end takes about 90
minutes to recover. The workaround we are considering is to send a
signal to the front end when a failure condition occurs in the cluster
and run a script to restart the connection pool. The question is: do
you know a good way to detect this failure condition in the cluster?

I've been thinking about some kind of log analysis of the messages file, but
how can we be sure the takeover is due to a failure condition and not
a manual switch?
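
A minimal sketch of the log-analysis idea, in Python, assuming a failure
line is logged shortly before the takeover line. The two patterns and the
function name classify_takeovers are hypothetical placeholders: the real
Sun Cluster 3.0 message wording is not given here and would have to be
taken from the messages file on the actual nodes.

```python
import re

# Hypothetical patterns: the real Sun Cluster 3.0 syslog wording must be
# checked on the actual nodes; these strings are placeholders only.
FAILURE_PATTERN = re.compile(r"nafo.*(fail|down)", re.IGNORECASE)
TAKEOVER_PATTERN = re.compile(r"resource group .* online", re.IGNORECASE)


def classify_takeovers(lines, window=20):
    """Label each takeover line 'failure' or 'manual'.

    A takeover counts as failure-driven when a failure line appeared within
    the previous `window` lines; otherwise it is assumed to be manual.
    """
    results = []
    last_failure_index = None
    for index, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            # Remember where the most recent failure message was seen.
            last_failure_index = index
        elif TAKEOVER_PATTERN.search(line):
            recent = (
                last_failure_index is not None
                and index - last_failure_index <= window
            )
            label = "failure" if recent else "manual"
            results.append((line.rstrip("\n"), label))
    return results
```

Only a takeover labelled 'failure' would trigger the signal to the front
end. A line-count window is a crude stand-in for elapsed time; comparing
syslog timestamps would probably be more robust.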

Any ideas?

Thanks in advance, I'll summarize.

Ximo.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


