Re: HACMP Question

From: Green, Simon (SGreen@KRAFTEUROPE.COM)
Date: Mon Oct 21 2002 - 05:55:25 EDT


I would agree that the likelihood of a failure of multiple networks or
TCP/IP itself is small. However, it's not zero and the consequences
could be fairly grim.

With two systems trying to acquire the same resources you're likely to end
up with a horrible mess and a prolonged outage to your system whilst
you sort it out. (I managed something like this myself recently and it
took about two hours to fix.) Presumably the use of HACMP in the first
place means that an outage of that length is undesirable.

The most likely causes of problems affecting multiple networks are damage to
cables (which could, of course, include the serial cable!) or a single point
of failure such as a router or the power supply to a network room. Provided
these issues have been addressed and there are at least two completely
separate networks available - not simply a standby adapter - it should be
fairly safe.

Use of a serial link is a solution to one particular problem. Provided
that problem has been addressed in some other way then one could do without
the serial link. But it _must_ be considered properly.
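To make the "one particular problem" concrete, here is a minimal Python
sketch of the decision a standby node faces when heartbeats stop arriving.
It is only an illustration of the reasoning, not how HACMP is implemented;
the function name, the network labels and the returned strings are all
hypothetical.

```python
from typing import Dict, Optional


def assess_peer(ip_heartbeats: Dict[str, bool],
                serial_heartbeat: Optional[bool]) -> str:
    """Classify the peer's state from heartbeat results (illustrative only).

    ip_heartbeats maps a hypothetical network label to True if heartbeats
    are still arriving over that network. serial_heartbeat is True or False
    if a non-IP (serial) link is configured, or None if there is none.
    """
    if not ip_heartbeats:
        raise ValueError("at least one IP network is required")

    if all(ip_heartbeats.values()):
        return "healthy"

    if any(ip_heartbeats.values()):
        # At least one network still carries heartbeats, so the peer is up.
        return "peer alive, network fault"

    # From here on, every IP network is silent.
    if serial_heartbeat is True:
        # The peer is still running; only the IP path between the nodes
        # has gone. Taking over now would mean two nodes fighting for
        # the same resources.
        return "peer alive, IP path lost: no takeover"

    if serial_heartbeat is False:
        # Both the IP networks and the serial link are silent.
        return "peer presumed down: takeover"

    # No serial link configured: a dead peer and a network partition
    # look identical from here.
    return "ambiguous: peer down or partition"
```

With two networks and no serial link, assess_peer({"net_a": False,
"net_b": False}, None) gives the ambiguous result, which is exactly the
case that has to be covered in some other way if the serial link is
dropped.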

On your final point, Bill: desirability of takeover following TCP/IP
failure. I think that the problem is that it's not easy to tell the
difference between a subsystem failure and a network infrastructure
failure. It's also difficult to tell at which end the failure has occurred:
maybe the TCP/IP subsystem on the Standby has failed.

For a subsystem failure, I don't think I would want a takeover if the rest
of the system was still running. It would probably be quicker to re-start
TCP/IP than to complete a takeover. It's one of those situations which is
complicated enough to need human intervention.

Simon Green
Philip Morris ITSC Europe

> From: Bill Thompson
> Sent: 18 October 2002 18:11
>
> Regarding the concern about not having a serial connection.
> I've come to think that if your network is configured correctly it's not as
> important as it used to be.
<snip>
> The only way heartbeats could not get through would be for multiple
> network failures or for the TCP/IP subsystem to fail.
> The first is highly improbable (we've eliminated a single point of
> failure).
> The second is perhaps equally improbable - when's the last time the
> TCP/IP subsystem failed on an AIX machine? But, even if it did fail on
> one machine, wouldn't you want the failover machine to take over?
<SNIP>
> I realize IBM *highly* recommends a secondary connection, but I
> wouldn't let that be my deciding factor.


