Summary: Cluster Performance

From: Loucks, Guy (Guy.Loucks@det.nsw.edu.au)
Date: Thu Aug 08 2002 - 04:24:56 EDT


My apologies for the delay, I was overseas.

The performance issue ended up being an auto-negotiation problem. Contrary to
what the guide recommends, the DE6xx cards (in the ES45, as opposed to the
TULIP-based cards in the DS20s) do not honour the firmware setting of FastFD
and negotiate down to half duplex unless you set them to Auto, in which case
they negotiate correctly.
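
For reference, this is roughly what we ended up doing from the SRM console.
Treat it as a sketch: the exact variable name (eia0 here for the ES45 port,
ewa0 on the DS20s) and the accepted mode strings depend on the adapter and
firmware revision, so check your own console first:

    P00>>> show eia0_mode
    eia0_mode               FastFD
    P00>>> set eia0_mode Auto-Negotiate
    P00>>> show eia0_mode
    eia0_mode               Auto-Negotiate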

The cluster votes are still an issue of concern, and I do agree with Tom's
comments below. What we are doing is giving two members one vote each plus a
quorum disk with one vote.
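
For anyone following along, here is a minimal sketch of the vote arithmetic
as I understand it; the rounding rule (quorum = trunc((expected_votes + 2) / 2))
is the one the TruCluster documentation gives, and the node and vote counts
below are just our configuration:

    # Minimal sketch of the quorum arithmetic, assuming the documented
    # rounding rule: quorum = trunc((expected_votes + 2) / 2).
    def quorum_needed(expected_votes):
        return (expected_votes + 2) // 2

    member_votes = [1, 1, 0]     # two voting members, the third carries no vote
    quorum_disk_votes = 1
    expected = sum(member_votes) + quorum_disk_votes   # 3
    needed = quorum_needed(expected)                   # trunc(5 / 2) = 2

    # One surviving voting member plus the quorum disk still makes quorum,
    # so the cluster rides out the loss of either voting member.
    assert 1 + quorum_disk_votes >= needed
    print("expected votes:", expected, "quorum:", needed)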

*****NOTE TO HP ENGINEERING***** VMS supports quorum disk votes other than
zero or one; if TruCluster had this, we could survive failures while every
member still carried a vote. This is a real-life business issue: we have a
large computing environment doing a lot of important business calculations,
and we want the cluster to maintain itself rather than rely on operators 24*7.
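
To make the request concrete with a purely hypothetical example (again,
TruCluster only supports quorum disk votes of zero or one today): three
members with one vote each plus a quorum disk worth two votes would give
expected votes of 5 and a quorum of trunc((5 + 2) / 2) = 3, so a single
surviving member plus the quorum disk (1 + 2 = 3 votes) would keep the
cluster up even after two members fail.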

Like I said, the quorum disk votes are a feature-enhancement request. And
check the NICs; it's the old layer 1 / layer 2 issue.

Cheers,

Guy

-----Original Message-----
From: Dr Thomas.Blinn@HP.com [mailto:tpb@doctor.zk3.dec.com]
Sent: Friday, 2 August 2002 12:24 AM
To: Loucks, Guy
Subject: Re: Cluster Performance

Guy,

Sorry for the delay in responding, I wanted to talk to my colleague
Bob Grosso who is a senior product manager responsible for TruCluster
software (among other things).

I described (I hope correctly) what you want to do, and we believe it is
ill-advised. Of course, I may have misunderstood some of the subtle
details or described them incorrectly to Bob, in which case we *might*
change our opinion.

The goal of TruCluster software is, among other things, to provide a single
system image, including a cluster-wide file system, and to protect against
single points of failure where possible. So, you're supposed to set up shared
access (at least dual redundant) to all of the critical storage, use highly
reliable cluster interconnects (the Memory Channel has dual independent
"rails" for a reason), and not expect more than one component to fail. You
seem to want the software to more or less automatically cope with having two
of the cluster member nodes fail. Since you have a SAN, you can get to the
disk farm from any member, which is good, but you want a cluster that
survives having two of the three active members fail. That's just not in the
design.

As to why there is a slowdown (that you report) when you set up the nodes to
have no votes at all, with the only vote being the quorum disk, I am not
sure. I'd have to set up a configuration like that and start measuring things
like I/O rates to the quorum disk, stuff we don't normally do because we
don't expect people to try to set up a system the way you describe. And since
I don't personally have access to the equipment to do the experiment, I can't
do it in my spare time; I'd have to try to borrow enough equipment to fiddle
around, and it's gear that won't readily fit in my office, so I'd need lab
space. And since we don't really expect it to work, it's not too likely I
could tin-cup a setup to fool around with.

So, our advice is: have each of the three members hold one vote and set
quorum to 2 if you don't use a quorum disk, or give each member one vote,
give the quorum disk one vote, and set quorum to 3. This will prevent cluster
partitioning. There is no scheme in which you can lose the cluster
interconnect and have device reservation protect you against a partitioned
cluster; while it will usually be safe, it's not guaranteed. That's why, for
any kind of complex configuration, you want a highly reliable cluster
interconnect (which is why Memory Channel is preferred, although it does have
a high price).
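
(Working the arithmetic: the quorum computation rounds as
trunc((expected_votes + 2) / 2), so three member votes alone give quorum
trunc(5 / 2) = 2, and three member votes plus a one-vote quorum disk give
quorum trunc(6 / 2) = 3; in either case the cluster rides out exactly one
voting failure, never two.)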

These are all my opinions and my interpretation of what Bob said, not
necessarily a definitive or technically exhaustive analysis. You might be
able to get it to work most of the time, but if things go wrong, you can lose
data (if your file systems get trashed because the cluster got partitioned
and the quorum disk wasn't really enough to save your bacon).

Tom

> Take 3, we will see if this beloved e-mail server accepts this, this
> time....
>
>
> Hello People,
>
> We are observing some interesting behaviour. Tom, Alan, anyone close
> to the TruCluster developers, your input is welcome.
>
> We now have a 3 node cluster:
>
> Node1 1 Vote DS-20
> Node2 1 Vote DS-20
> Node3 1 Vote ES-45
> Quorum 1 Vote
>
> Using a LAN CI on 100BT, LAN utilisation <20Mb/s.
>
> Public interface DEGPA, lightly used as well. All disks are in a SAN,
> currently dual ported between independent 1Gb and 2Gb fabrics.
>
> What we have observed is that if we adjust the votes and take the vote
> away from Node1, 2, or 3, that node starts to run REALLY SLOW. No
> explanation.
>
> What we desire is a cluster which does not depend on any particular node,
> and remains valid with any one node and the Quorum disk.
>
> We are trying to verify if the following would be valid:
>
> Node1 0 Vote
> Node2 0 Vote
> Node3 0 Vote
> ...
> Node(n) 0 Vote
> Quorum 1 Vote
>
> Things to consider: if the SAN fabric stays up and the LAN cluster
> interconnect goes away, do we end up with a split cluster with the
> members still running, or is the quorum access sophisticated enough,
> and does SCSI PR prevent this from happening?
>
> Also, does anyone have an idea why, if we take votes away, the cluster
> member goes into go-slow mode? Actual execution is fine; it is
> instantiation which seems to be the killer. The first ls, ps,
> or df, etc. can take 5-15 seconds on an unloaded dual 1GHz ES45 with
> 4GB RAM!
>
> I will summarise.
>
> Cheers,
>
> Guy
>
>
> > Guy R. Loucks
> > Senior Unix Systems Administrator
> > Networks Branch
> > NSW Department of Education & Training
> > Information Technology Bureau
> > Direct +61 2 9942 9887
> > Fax +61 2 9942 9600
> > Mobile +61 (0)429 041 186
> > Email guy.loucks@det.nsw.edu.au
>

Tom

   Dr. Thomas P. Blinn + Tru64 UNIX Software + Hewlett-Packard Company
 Internet: tpb@zk3.dec.com, thomas.blinn@compaq.com, thomas.blinn@hp.com
  110 Spit Brook Road, MS ZKO3-2/W17 Nashua, New Hampshire 03062-2698
   Technology Partnership Engineering Phone: (603) 884-0646
     ACM Member: tpblinn@acm.org PC@Home: tom@felines.mv.net

  Worry kills more people than work because more people worry than work.

      Keep your stick on the ice. -- Steve Smith ("Red Green")

     My favorite palindrome is: Satan, oscillate my metallic sonatas.
                                -- Phil Agre, pagre@alpha.oac.ucla.edu

     Yesterday it worked / Today it is not working / UNIX is like that
                        -- apologies to Margaret Segall

  Opinions expressed herein are my own, and do not necessarily represent
  those of my employer or anyone else, living or dead, real or imagined.
 


