Production overnight crash - ase turned itself off

From: System Manager - Wayne Blom (blomws@u-underdale2.faulding.com.au)
Date: Sun Jun 02 2002 - 04:29:17 EDT


Hi all,
We had this really encouraging event happen this morning on our gs140 V40-F
cluster.

Jun 2 01:10:23 u-underdale1 ASE: u-underdale2-alt Director Warning:
u-underdale1-alt reported a reservation failure for device /dev/rz131g,
service c-orion-prod
Jun 2 01:10:28 u-underdale1 ASE: u-underdale1-alt Agent ***ALERT: AM
reports stolen device reservation for /dev/rz131g on u-underdale1-alt!
Leaving ASE environment via 'rcmgr set ASE off' and reboot! The
administrator can return this member to the ASE via 'rcmgr set ASE on' and
'/sbin/init.d/asemember start'. See ASE Admin Guide.

Does anybody have any idea what caused this?

We had to set ASE="on" in rc.config again (i.e. rcmgr set ASE on), restart
the ASE daemons and then fail the service back to the correct host.
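If anyone wants to catch this alert automatically rather than finding a node
out of the ASE in the morning, here is a minimal Python sketch. It assumes the
agent alert text is worded exactly as in the log above (joined back onto one
line); parse_alert and StolenReservation are hypothetical names. It extracts
the device, the member and the two recovery commands the alert itself quotes,
and only prints them: it does not run rcmgr, restart anything or fail the
service back.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Sample alert, copied from the syslog excerpt above (wrapped as in the mail).
ALERT = """Jun 2 01:10:28 u-underdale1 ASE: u-underdale1-alt Agent ***ALERT: AM
reports stolen device reservation for /dev/rz131g on u-underdale1-alt!
Leaving ASE environment via 'rcmgr set ASE off' and reboot! The
administrator can return this member to the ASE via 'rcmgr set ASE on' and
'/sbin/init.d/asemember start'. See ASE Admin Guide."""

# Assumed message format: matches only the wording of the agent alert above.
STOLEN_RE = re.compile(
    r"reports stolen device reservation for (?P<device>\S+) on (?P<member>\S+?)!"
)
# Commands in the alert text are enclosed in single quotes.
QUOTED_RE = re.compile(r"'([^']+)'")


@dataclass
class StolenReservation:  # hypothetical name, for illustration only
    device: str
    member: str
    recovery_commands: List[str] = field(default_factory=list)


def parse_alert(text: str) -> Optional[StolenReservation]:
    """Return details of the first stolen-reservation alert in text, or None."""
    flat = " ".join(text.split())  # undo the line wrapping
    m = STOLEN_RE.search(flat)
    if m is None:
        return None
    # Keep only the commands offered for returning the member to the ASE,
    # skipping the 'rcmgr set ASE off' the agent ran on its way out.
    tail = flat[m.end():]
    start = tail.find("return this member")
    commands = QUOTED_RE.findall(tail[start:]) if start >= 0 else []
    return StolenReservation(m.group("device"), m.group("member"), commands)


if __name__ == "__main__":
    alert = parse_alert(ALERT)
    if alert is not None:
        print(f"{alert.member} lost its reservation on {alert.device}")
        for cmd in alert.recovery_commands:
            print("  to rejoin, run:", cmd)  # printed only, never executed
```

Matching on the exact wording keeps the sketch simple; another ASE version or
patch kit may phrase the message differently, so the pattern would need
checking against real syslog output first.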

I mean, it's great that ASE looks after its own destiny so well, but turning
itself off just because someone stole the reservation on one of its disks!
It just takes the fun out of pressing the halt button...

Seriously, any suggestions would be useful. The only other information that
could be relevant is that the second node in the cluster (u-underdale2-alt)
was upgraded to patch kit 7 last weekend, and we had some problems rebooting
it. These were eventually traced to an invalid file in /dev. The upshot was
that the ASE service had to be reconstructed for that node. The node that
lost the plot last night (u-underdale1-alt) is due to be upgraded to patch
kit 7 next weekend.

Wayne Blom
Systems Specialist
F H Faulding & Co Limited

E-mail: wayne.blom@au.faulding.com
S-mail: Building D, 115 Sherriff St, Underdale SA 5032
Ear-mail: 0419808496

"Someday, we'll look back on this, laugh nervously and change the subject."


