New packet loss issue - x4100 tagged aggr interface

From: Andy Harrison (aharrison@gmail.com)
Date: Fri Apr 27 2007 - 10:23:48 EDT


I'm about to submit this as a Sun case, but I thought I'd post here as
well to see if anyone else has experienced this problem. Any ideas?

------------

The server is an x4100 M2 running Solaris 10 11/06 x86. Patches were
current as of 4/9/2007, including a reconfig reboot at that time.
Yesterday, the server was rebooted and afterwards the packet loss was
noticed immediately. As part of troubleshooting this problem, I've
since brought the patches up to date as of yesterday, performed a
reconfig reboot, and performed a cold boot (with power cables
completely removed).

The problem occurs only when using an aggregated interface: there is
approximately 50% packet loss unless we put the network interface in
promiscuous mode (by leaving the snoop command running).
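
A rough way to put a number on the loss, with and without snoop
running, is to pull the percentage out of the ping summary. This is
only a sketch: loss_pct is a made-up helper name, and it assumes the
summary line reads "N packets transmitted, M packets received, X%
packet loss".

```shell
# loss_pct: hypothetical helper. Reads ping output on stdin and prints
# the integer loss percentage found in the summary line.
loss_pct() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Demo on a canned summary line (not real output from this server).
printf '20 packets transmitted, 10 packets received, 50%% packet loss\n' | loss_pct
```

On the server this would be something like "ping -s <gateway> 56 20 |
loss_pct", run once on its own and once while "snoop -d <aggr device>"
is left running in another window. The device name and packet count
are placeholders.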

This problem does not exist when I delete the aggregate and plumb
either of the e1000g interfaces directly. We have tried different
network cables and different ports on the switch. We also tested a
different firmware version on the switch itself to rule out a bug in
the switch's firmware.

'dladm show-aggr -L' clearly shows the aggregate interface syncing up
immediately. This is confirmed when viewing the interfaces from the
switch side. While pinging the gateway IP from the server, we can
observe the port from the switch side and see the incoming and
outgoing frame counts incrementing, but the server does not receive
the frames unless the network interface is in promiscuous mode.

We have tried configuring the aggregate with LACP mode off, passive,
and active, all with the same results. At no time in our
troubleshooting did we see any network interface errors on either the
switch side or the server side.
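
For anyone wanting to repeat the LACP test, the sequence could be
sketched roughly as below. The block only prints the commands rather
than running them. The aggregation key 1 is an assumption (the key is
not given above), and lacp_mode_cmds is a hypothetical helper name.

```shell
# Dry run: print, do not execute, the commands for cycling LACP modes.
# $1 is the aggregation key; key 1 below is an assumed value.
lacp_mode_cmds() {
    key=$1
    for mode in off passive active; do
        echo "dladm modify-aggr -l $mode $key"   # switch LACP mode
        echo "dladm show-aggr -L $key"           # check LACP state afterwards
    done
    echo "netstat -i"                            # Ierrs/Oerrs per interface
}

lacp_mode_cmds 1
```

Repeating the ping test after each mode change, and comparing the
netstat -i error columns before and after, would reproduce the checks
described above.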

I have three other x4100 servers working perfectly with this
configuration of an aggregated, tagged network interface running LACP
in passive mode. One of these three servers was installed from the
exact same media that I used to install the server currently having
this problem.
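
For reference, tagged interfaces on Solaris 10 are named by folding
the VLAN ID into the instance number (VLAN ID * 1000 + instance). A
small sketch of that naming rule follows. VLAN ID 123 and aggregation
key 1 are placeholders, not the values used on these servers, and
vlan_ifname is a made-up helper name.

```shell
# vlan_ifname: hypothetical helper. Builds the tagged interface name
# from driver name, device instance and VLAN ID.
vlan_ifname() {
    drv=$1
    inst=$2
    vid=$3
    echo "${drv}$(( vid * 1000 + inst ))"
}

# Placeholder values: VLAN 123 on aggregation key 1, and on e1000g0.
vlan_ifname aggr 1 123      # prints aggr123001
vlan_ifname e1000g 0 123    # prints e1000g123000
```

The resulting name is what would be passed to "ifconfig <name> plumb",
whether the VLAN sits on the aggregate or directly on one of the
e1000g ports.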

-- 
Andy Harrison