SUMMARY: trucluster question

From: Steve Feehan (sfeehan@sbb.uvm.edu)
Date: Thu Aug 01 2002 - 16:29:33 EDT


I received two replies, both indicating that one node of the cluster is
expected to crash when the interconnect is lost. I find this somewhat
surprising (and distressing), since I expected that the orphan node would
wait patiently until the cluster interconnect was restored, at which point
it would rejoin the cluster.

Steve

On Wed, Jul 31, 2002 at 03:05:40PM -0400, Steve Feehan wrote:
> I have just set up a two-node TruCluster (5.1A) on two DS10Ls with
> a LAN interconnect.
>
> To see what happens when the LAN interconnect is broken, I unplugged
> the cable. Before disconnecting, I spread the file systems across the
> two nodes (i.e., / and /var on member1, /usr on member2) just to make
> things interesting.
>
> I disconnected the cable and both systems appeared to hang, which is
> expected.
>
> After about two minutes one node came back online, with cfsmgr
> showing that it had taken over the other node's file systems.
>
> The unexpected bit is that the other node had crashed. I switched over
> to the console to find it at the >>> prompt.
>
> So my two questions:
>
> 1. why did one of the members crash?
>
> 2. is there a way to reduce the timeout between the cluster members
> separating and one member taking over? And if so, is this a good
> idea or should I not mess with the defaults?
>
> Thanks.
>
> --
> Steve Feehan
> Unix Systems Administrator
> Structural Biology and Bioinformatics Group
> University of Vermont

-- 
Steve Feehan

