IP Multipathing, Sun Cluster 3.1

From: Gary Chambers (gwc@ll.mit.edu)
Date: Wed Jan 14 2004 - 17:29:59 EST


All...

I'm installing a two-node cluster on two identically configured Sun Fire
V240 systems running Solaris 9 (112233-11). bge0/bge1 are the physical
connections and bge2/bge3 are the cluster transports. There is also a
defaultrouter (if it fails, it brings down the entire building and then
some, so it's irrelevant here). We're [seemingly] having no success
configuring IP multipathing on the bge0/bge1 interfaces. Here's what we
intend to have occur on failure:

1) if bge0 fails, bge1 will simply take over.
2) if bge0 returns, it will resume duty as primary.
3) if bge1 fails (while bge0 continues to function), notify me(?).
4) if bge0 and bge1 BOTH fail (i.e. NIC failure), fail the cluster node.
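
The four cases above can be restated as a small decision table. The
following is only an illustrative Python sketch of the intended behaviour;
the function name and the returned strings are made up for this example and
are not anything in.mpathd or Sun Cluster provides.

```python
def desired_action(bge0_up: bool, bge1_up: bool) -> str:
    """Return the behaviour wanted for a given link state (illustrative only)."""
    if bge0_up and bge1_up:
        # Normal state, and also case 2 once bge0 has returned.
        return "bge0 primary, bge1 standby"
    if bge1_up:
        # Case 1: bge0 has failed, bge1 takes over.
        return "fail over to bge1"
    if bge0_up:
        # Case 3: only the standby has failed.
        return "stay on bge0, notify admin"
    # Case 4: both interfaces are gone.
    return "fail the cluster node"


if __name__ == "__main__":
    for state in [(True, True), (False, True), (True, False), (False, False)]:
        print(state, "->", desired_action(*state))
```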

Here are the relevant contents of the relevant files on one of the nodes:

/etc/hosts
~~~~~~~~~~
155.34.62.129 cad-c2 loghost
155.34.62.194 cad-c2-2
155.34.62.195 cad-c1-bge0-test
155.34.62.196 cad-c1-bge1-test
155.34.62.197 cad-c2-bge0-test
155.34.62.198 cad-c2-bge1-test
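
The hostname files below use "netmask +", so the actual mask comes from
/etc/netmasks, which is not shown here. As a sanity check only, this Python
sketch lists which prefix lengths would keep all six addresses above on one
subnet. The candidate prefix lengths are assumptions chosen for
illustration, not values taken from these nodes.

```python
import ipaddress

# Addresses copied from the /etc/hosts listing above.
ADDRESSES = [
    "155.34.62.129",  # cad-c2
    "155.34.62.194",  # cad-c2-2
    "155.34.62.195",  # cad-c1-bge0-test
    "155.34.62.196",  # cad-c1-bge1-test
    "155.34.62.197",  # cad-c2-bge0-test
    "155.34.62.198",  # cad-c2-bge1-test
]


def same_subnet(addresses, prefix_len):
    """True if every address falls in the same network at this prefix length."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1


# Hypothetical candidate masks; the real one lives in /etc/netmasks.
shared_prefixes = [p for p in (24, 25, 26, 27) if same_subnet(ADDRESSES, p)]

if __name__ == "__main__":
    print("prefix lengths with all addresses on one subnet:", shared_prefixes)
```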

/etc/hostname.bge0
~~~~~~~~~~~~~~~~~~
cad-c2 group cad up \
addif cad-c2-bge0-test netmask + broadcast + \
deprecated -failover up

/etc/hostname.bge1
~~~~~~~~~~~~~~~~~~
cad-c2-2 group cad deprecated -failover standby up

This configuration INITIALLY appears to work. The abbreviated output of
scstat -i is:

IPMP Group: cad-c1 cad Online bge1 Standby
IPMP Group: cad-c1 cad Online bge0 Online

IPMP Group: cad-c2 cad Online bge1 Standby
IPMP Group: cad-c2 cad Online bge0 Online

But after about a minute, in.mpathd reports:

NIC failure detected on bge1 of group cad

At that point, "Standby" becomes "Offline" and it remains that way
until the system is once again rebooted.

There are numerous examples on the web of how to properly implement IP
multipathing. Unfortunately, I can't get any of them to work as
advertised. This shouldn't be this difficult, but I have somehow made it
that way. We'll address and eliminate the single points of failure before
going to production.

I'd appreciate any help any of you can provide, and I'll throw a huge
SUMMARY party when I find a workable solution. Thanks very much!!

Gary Chambers

// -------------------------------------
// MIT Lincoln Laboratory / 781-981-0957
// Lexington, Massachusetts
// Nothing fancy and nothing Microsoft
// -------------------------------------
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


