Sun trunking

From: Johan Hartzenberg (jhartzen@csc.com)
Date: Sun Jun 04 2006 - 06:06:27 EDT


Hi,

I have an urgent problem trying to implement Sun Trunking for maximum
incoming bandwidth into the server, and hopefully someone here can
assist. (It is urgent because this problem is threatening my project
timeline.)

The problem is with performance. In short, the setup is as follows:
The "backup servers" are Sun Fire V440s with 4x 1.59 GHz CPUs, running
Solaris 10 1/06. Each has a single Quad GigaSwift adapter installed,
and these adapters are configured with all four ports in a trunk,
using Sun Trunking v1.3. One of the onboard ce interfaces is used as a
normal access port, e.g. to be able to log in, but it is not used for
the bulk data transfers. The LACP mode on the trunk is set to active,
with policy #2 for round-robin control of outgoing data.
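For completeness, the trunk was set up with commands roughly along
these lines (typed from memory, and the member instance numbers are
just examples -- the exact nettr syntax is in the Sun Trunking 1.3
guide):

    # build a 4-port trunk on the quad card: round-robin (policy 2), LACP active
    nettr -setup 0 device=ce members=0,1,2,3 policy=2 lacp-mode=active

    # what I use for monitoring: per-member stats every 5 seconds
    nettr -stats 0 interval=5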

The Cisco switch (WS-6509e) has its side of the trunk set to passive
mode, and it indicates that the ports are in a trunk once the trunk is
initialized on the server. For what it is worth, the environment
consists of two of these switches with trunked 10 Gbps links between
them. Each server is connected to one of these switches. The "trunk"
on each switch spans two blades. All the clients (about 20 hosts) are
connected to these same two switches by gigabit Ethernet. It is all
set up as a single subnet (single VLAN). For any further detail on the
switches I will have to refer you to the network guys.
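I would guess the switch side looks something like the sketch below
(I do not manage these switches, so the interface names, channel-group
number and load-balance setting here are my assumptions, not the real
config):

    ! hypothetical IOS snippet for one server's four ports
    port-channel load-balance src-mac       ! global hash for frames sent down a channel
    !
    interface range GigabitEthernet3/1 - 4
     channel-protocol lacp
     channel-group 10 mode passive

Whatever the port-channel load-balance setting actually is on these
switches, that hash is what decides which member a given frame towards
the server leaves on.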

Note: these are going to be used as "backup media servers" with
NetBackup, so the requirement is for maximum incoming bandwidth into
the server. Also, dladm won't work because of the ce interfaces, and
IPMP doesn't aggregate incoming bandwidth. Therefore I am stuck with
the Sun Trunking software.
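(To expand on the dladm point: dladm link aggregation in Solaris 10
1/06 only works with GLDv3 drivers such as bge or e1000g, and ce is
not one of them, so something along these lines simply fails -- the ce
instance numbers and the key "1" are just examples:)

    # refused, because ce is not a GLDv3 driver
    dladm create-aggr -d ce0 -d ce1 -d ce2 -d ce3 1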

When doing tests, I monitor with nettr -stats 0 interval=5. My results
are as follows:
a) It appears that each client gets allocated one of the 4 trunk
members for its incoming packets based on some policy, but not based
on load, because sometimes one port is seen to be fully used while the
other three ports remain unused. In other cases, each of the clients
goes through a separate port. It appears as if each client always gets
allocated a specific port, unless I pull a cable, in which case it all
shifts around again, i.e. the clients get re-allocated to different
ports. Maybe a MAC hash or something similar. The response data
(outgoing), e.g. the FTP server's replies, is seen to be properly
load-balanced across all the members of the trunk. This is clearly
controlled by the "policy" setting on the trunk on the server side,
because when I set a different policy, the distribution changes. Round
robin (policy #2) appears to be the best, but it clearly only controls
outgoing data, and this is also indicated in the man page and
documentation.

So the first problem is that the incoming data is not load-balanced
across the trunk members dynamically.

b) Then there appears to be a 1 Gbps ceiling on the incoming data into
the server. Irrespective of how many clients or how many ports are
used for data coming into the server, the total shown by nettr -stats
never exceeds about 1095 Mbps. With 4 gigabit ports I would expect
more, and according to Sun I should get more. Generally, a single
client manages to send about 65% or so of 1 Gbps. Some clients manage
more throughput, but in total, even with 4 or 5 clients pushing to the
server at the same time, the maximum incoming bandwidth is restricted
somewhere. While testing, there is about 92% idle CPU time, and the
number of syscalls and interrupts/sec shown by prstat -am is never
very high.
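In case it helps anyone reproduce this, the same numbers can be
cross-checked against the raw driver counters rather than nettr, e.g.
(assuming the ce driver exposes the usual rbytes64/obytes64 kstats):

    # per-port byte counters, printed every 5 seconds
    kstat -p 'ce:*:*:rbytes64' 5
    kstat -p 'ce:*:*:obytes64' 5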

Oh yes, for testing I use FTP to push data from /dev/zero on each
client to /dev/null on the server.
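Each client runs something along these lines (the host name, account
and byte count are made up for illustration; the classic ftp client
treats a file name starting with "|" as a command pipeline, which is
one way to feed it /dev/zero):

    # push ~2 GB of zeroes from the client into /dev/null on the server
    ftp -n backupserver <<'EOF'
    user ftptest ftptest
    binary
    put "|dd if=/dev/zero bs=1024k count=2048" /dev/null
    bye
    EOF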

I've tried to explain the setup and problem completely, but please
feel free to ask more questions. Thanks, and much appreciated.
  _Johan
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


