SUMMARY: HELP - Confused CAA ?!?

From: Ballowe, Charles (CBallowe@usg.com)
Date: Mon Sep 16 2002 - 12:31:51 EDT


I managed to unconfuse CAA by restarting caad on the good
node (/sbin/init.d/clu_caa stop ; /sbin/init.d/clu_caa start)
and then starting it on the bad node.
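
For anyone who hits the same thing, the order of operations can be
sketched as a small script. The clu_caa path is the one quoted above;
the function names, the CLU_CAA override and the DRY_RUN switch are
made up for illustration and are not part of TruCluster. By default it
only prints what it would run.

```shell
#!/bin/sh
# Sketch only: restart caad on the surviving member first, then start
# it on the member that crashed. Helper names are hypothetical.

# Path taken from the post; overridable for testing (assumption).
CLU_CAA=${CLU_CAA:-/sbin/init.d/clu_caa}
# DRY_RUN=1 (default) prints commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1, on the good node: stop and start caad so it drops the
# connection it was still holding to the crashed member.
restart_caad() {
    run "$CLU_CAA" stop
    run "$CLU_CAA" start
}

# Step 2, on the bad node: start caad once the good node is back.
start_caad() {
    run "$CLU_CAA" start
}
```

Call restart_caad on the good node, wait for it to finish, then call
start_caad on the node that crashed. Set DRY_RUN=0 to actually execute
the commands.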

Seems that the good node was still waiting for a response from
the crashed node (caad had a tcp connection open and was waiting
on it). I believe this is what was preventing caad from starting
on the node that had crashed. Out of curiosity, is caad only
capable of doing one thing at a time, so that if it hangs while
telling one node to do something, it can't take requests from
other systems?

-charlie

-----Original Message-----
From: cballowe@usg.com [mailto:cballowe@usg.com]
To: tru64-unix-managers@ornl.gov
Subject: HELP - Confused CAA ?!?

Man - Friday the thirteenth didn't go over well. And Saturday
wasn't much better.

One of my systems in a 2 GS80 cluster crashed just as a
caa_stop of the Oracle instance was issued on one of the other
cluster members. This left CAA claiming STATUS: ONLINE,
TARGET: OFFLINE. A caa_stop -f of the service tells me
"Resource or relatives are currently involved with another operation",
and a caa_stat run on the member that crashed gives me
"Cannot communicate with the CAA daemon." Running caad -1 doesn't
fix that the way a manual says it should.

Is there any way to get CAA to believe the service is down?
What can I do to get caad responding on the other cluster
member?

-charlie

Charles Ballowe            /"\
Unix System Administrator  \ /  ASCII Ribbon Campaign
cballowe@usg.com            X   Against HTML Mail
x3896                      / \


