ES47 Cluster crash

From: rob.leadbeater@lynx.co.uk
Date: Tue Nov 23 2004 - 09:05:26 EST


Hi,

Could anyone help diagnose the cause of a problem we had this morning?
We have a two-node ES47 Memory Channel cluster running 5.1B pk4 and an Oracle 10g RAC database.

Early this morning, for no obvious reason, node 2 stopped responding. SSH connections to the box never produced a login prompt, and neither did an FTP connection, although the server was still responding to ping.

On node 1, running clu_get_info showed that both nodes were up.
Trying to log on at the console of node 2 was fruitless: the mouse was working, but the login screen didn't accept any input.

Again on node 1, I could see nothing in /var/adm/messages or /var/adm/syslog.dated/current/* that showed any issues.
The only thing of note was that running df -k hung. I eventually traced the hang to an NFS mount point served by a separate v5.1A box; however, I could find nothing wrong with that machine.

In the end I powered off node 2; however, this also caused node 1 to crash for some reason. Everything eventually came back up, but I'm at a loss as to what caused the problem, as I haven't found anything in any of the log files.

I've just tried using uerf to see if anything is logged there, but it fails with a Memory fault.

Can anyone give me any pointers?

Cheers,

Rob Leadbeater



