TruCluster caa reason codes and application failover etc

From: David J. DeWolfe (sxdjd@ts.sois.alaska.edu)
Date: Sat Jan 24 2004 - 19:33:00 EST


All;

I've been looking through the various Tru64 and TruCluster 5.1b docs to see
if there is an elegant mechanism to determine if an application that is
being started by CAA (as a result of its action script being executed) is
being started because the node that the application was previously running
on crashed.

Essentially, when an application resource action script runs to start an
application, can I determine if it's running due to a failure on the node
it was previously running on (i.e. it's failing over) versus running during
a normal startup of the application?

It looks like the CAA reason codes (_CAA_REASON) are what I'm looking for.
My testing has shown that when one node crashes and an application is
failed over to another node, the _CAA_REASON code is "unknown". Would I be
safe to assume the following about _CAA_REASON codes:

failure - likely that the node is fine but the application has crashed. The
docs seem to say this, but then again they only say that this is a "typical
condition that sets this value".

unknown - an application is being started on the node in question,
potentially because the node it was running on crashed (which is what my
testing has revealed). The docs don't say this, however; they only say
"contact your support representative".
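As a minimal sketch of how an action script might act on these two
assumptions: only the "failure" and "unknown" values come from the
discussion above, every other value is lumped together as a normal start,
and the function name and labels are hypothetical. It also assumes the
script is invoked with "start" as its first argument.

```shell
#!/bin/sh
# Sketch only: classify why CAA is starting the application.
# "failure" and "unknown" are the two values discussed above; treating
# everything else as a normal start is an assumption, not documented behavior.

# classify_start REASON -> prints app-restart | failover-suspected | normal
classify_start() {
    case "$1" in
        failure) echo "app-restart" ;;         # node fine, application crashed
        unknown) echo "failover-suspected" ;;  # previous node may have crashed
        *)       echo "normal" ;;              # any other reason code
    esac
}

# Assumed entry point: only classify when asked to start the application.
case "${1:-}" in
    start)
        kind=`classify_start "${_CAA_REASON:-}"`
        echo "start requested, classified as: $kind"
        ;;
esac
```

Whether "unknown" can safely be read as a failover is exactly the open
question here, so the "failover-suspected" label should be treated as a
hint rather than a guarantee.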

It would be nice if the CAA environment included the node the application
was previously running on and its state (offline versus online), which could
be used in conjunction with the reason code to make certain decisions at
startup time.
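Lacking that, one possible workaround is for the action script to keep its
own record. The sketch below is only an illustration: STATE_FILE, its
default path, and both function names are hypothetical, and the file would
have to live on storage that every cluster member can see.

```shell
#!/bin/sh
# Sketch only: remember which node last ran the application and whether it
# was left online or offline. CAA does not supply this; the script records it.
# STATE_FILE and its default path are placeholders, not a real convention.

STATE_FILE="${STATE_FILE:-/tmp/myapp.caa_state}"

# record_state STATE -> writes "<node> <state>" (STATE is online or offline)
record_state() {
    echo "`uname -n` $1" > "$STATE_FILE"
}

# previous_state -> prints the saved "<node> <state>", or "none none"
previous_state() {
    if [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo "none none"
    fi
}
```

The start path would call previous_state before record_state online, and
the stop path would call record_state offline. A saved state of "online"
from a different node, seen at start time, would then suggest that the
previous node went away without a clean stop.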

My environment is:

Dual GS1280s, hardware partitioned into 2 "nodes" each for a 4-node
cluster. Memory Channel CI
EVA 5000
Tru64 5.1b PK3

TIA, and I will summarize

David
mailto:sxdjd@ts.sois.alaska.edu


