Strange cluster failure V4.0F

From: Wayne Blom (wayne_blom@email.com)
Date: Wed Jul 10 2002 - 02:54:04 EDT


Had a really strange one here. Our V4.0F cluster fired up the service on both members of the cluster simultaneously.

Situation: the active DS10 was running the application, and the idle DS10 was rebooted in preparation for a weekend manual failover. We have done this 20 times or more and never had a problem. This time...

The idle node went down, restarted, and while joining the cluster decided that it was now the primary. It mounted the file system for the service and started all the applications, including the database app. Fantastic, except that the active DS10 was still ACTIVE!!!!

Needless to say we had database corruption etc. Thankfully we were able to intervene soon enough that the damage was minor.

We resolved the situation by shutting down the (original) idle node and rebooting the (original) active node. Later on during the weekend we ran a series of tests and could not get it to do the same again.

Investigations of the logs showed that the network was down between the two machines for a period of 20 seconds. This down time coincided with the machines deciding who was active and who wasn't.

Further investigation suggested that the network outage was caused by the IBM switches negotiating the link speed (even though they are set at 100 Mbit half duplex).
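
For discussion, here is a toy model of how a two-node cluster can end up running the service twice when the interconnect drops at the moment a node joins. It is only a sketch under stated assumptions: it does not reflect how the V4.0F cluster software actually makes its decision, and the names (Node, should_start_service, holds_tiebreaker) are made up for illustration. The assumption is that a joining node which cannot hear its peer treats the peer as dead, unless it is also required to win some tie-breaker such as a reservation on shared storage.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    peer_reachable: bool            # does this node hear its peer over the network?
    holds_tiebreaker: bool = False  # hypothetical: e.g. a reservation on shared storage


def should_start_service(node: Node, require_tiebreaker: bool) -> bool:
    """Decide whether a joining node may mount the file system and start the service."""
    if node.peer_reachable:
        # The peer is visible, so the joining node stays passive.
        return False
    if require_tiebreaker:
        # The peer is silent, but the node must also own the tie-breaker.
        return node.holds_tiebreaker
    # The peer is silent and nothing else is checked: assume it is dead.
    return True


def simulate(network_up: bool, require_tiebreaker: bool) -> list:
    """Return the names of the nodes running the service after the idle node rejoins."""
    # The active node is already running the service and holds the tie-breaker.
    active = Node("active", peer_reachable=network_up, holds_tiebreaker=True)
    joining = Node("joining", peer_reachable=network_up)
    running = [active.name]
    if should_start_service(joining, require_tiebreaker):
        running.append(joining.name)
    return running


if __name__ == "__main__":
    print(simulate(network_up=False, require_tiebreaker=False))
    print(simulate(network_up=False, require_tiebreaker=True))
```

In this model, a network outage with no tie-breaker leaves both nodes running the service, while requiring the tie-breaker keeps the joining node passive for as long as the active node holds it.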

Coincidentally we had a thunderstorm slam the neighbourhood moments after the idle node was kicked. There was a nearby strike at around the same time that the systems got confused, but nothing else was affected. The EMF from the lightning may have contributed, but I am reliably assured by our site people that the room is insulated from that sort of surge.

Either way, this failure should never be possible and should never have occurred. I am passing this on more for discussion than for help, as we have already laid down some (more!) rules on the maintenance of the systems.

Wayne Blom
Systems Specialist
Mayne Group IT

E-mail: wayne.blom@au.faulding.com
S-mail: Building D, 75-83 Hardys Rd, Underdale SA 5032
Ear-mail: 0419808496


