e6500 and network related problems

From: Robert Milkowski (milek@wp-sa.pl)
Date: Wed Jun 05 2002 - 08:07:24 EDT


Hi.
        I have a Sun E6500 with two I/O boards and one GbE (ge) interface in
each. The system runs Solaris 8, up to date.
These two interfaces are in one IPMP group (the second interface is
standby). In /var/adm/messages I get errors from in.mpathd like

   Cannot meet requested failure detection time of ### .....

really often. Sometimes it even fails over :(

Both interfaces are forced to 1000FDX via /etc/system (the problem still
occurs with autonegotiation) and the switch (Cisco) is set up properly too.

I snooped ICMP packets on ge0 and I can see the outgoing ICMP packets from
my host (in.mpathd) to the router, but sometimes I see no incoming packets
(replies) for a while (a few seconds to a few dozen seconds), and after
that I see all those "lost" incoming packets. It looks like the system
sometimes holds packets somewhere for a moment.
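
One way to put numbers on those pauses is to look for gaps between reply
arrival times. The following is only a sketch: it assumes the arrival times
have already been extracted from the capture, one per line, in seconds
(that extraction step is not shown), and that awk accepts -v (on Solaris 8
that would be nawk). The name find_gaps is made up for illustration.

```sh
# Read arrival times (seconds, one per line) on stdin and print every gap
# longer than the given limit as: <time of last packet before gap> <gap length>
find_gaps() {
    awk -v limit="$1" '
        NR > 1 && $1 - prev > limit { printf "%s %.3f\n", prev, $1 - prev }
        { prev = $1 }
    '
}
```

For example, `find_gaps 5 < reply_times.txt` would list every pause longer
than five seconds (reply_times.txt being a hypothetical file of timestamps).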

Using kstat I found that nocanput was increasing, so I increased sq_max_size
and (mostly) have no problem with it any more. But

kstat -n ge0|grep ierror;kstat -n ge0|grep ge_queue_full_cnt
        ierrors 110776
        ge_queue_full_cnt 110776

As you can see, those two are correlated.
I noticed that almost every time I have problems with incoming packets,
those two are increasing.
I don't see any errors on the Cisco. There are no errors (other than those
from in.mpathd) in /var/adm/messages.
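
To line up the counter jumps with the in.mpathd messages, the two counters
could be sampled periodically and logged with a timestamp whenever they
move. This is a rough sketch, not a tested tool: the function names and the
10-second default interval are made up, and it assumes a shell with $( )
and $(( )) plus an awk that accepts -v (on Solaris 8 that would mean ksh
and nawk rather than /bin/sh and /usr/bin/awk).

```sh
# Pull one named statistic out of "name value" lines (kstat output) on stdin.
kstat_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Print how much a counter grew between two readings (old, new).
counter_delta() {
    echo $(( $2 - $1 ))
}

# Sample ierrors and ge_queue_full_cnt on an interface every few seconds and
# print a timestamped line whenever either counter has increased.
watch_counters() {
    iface=$1
    interval=${2:-10}          # assumed default, adjust as needed
    prev_ierr=$(kstat -n "$iface" | kstat_value ierrors)
    prev_qfull=$(kstat -n "$iface" | kstat_value ge_queue_full_cnt)
    while sleep "$interval"; do
        snap=$(kstat -n "$iface")
        ierr=$(echo "$snap" | kstat_value ierrors)
        qfull=$(echo "$snap" | kstat_value ge_queue_full_cnt)
        d_ierr=$(counter_delta "$prev_ierr" "$ierr")
        d_qfull=$(counter_delta "$prev_qfull" "$qfull")
        if [ "$d_ierr" -ne 0 ] || [ "$d_qfull" -ne 0 ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $iface ierrors +$d_ierr ge_queue_full_cnt +$d_qfull"
        fi
        prev_ierr=$ierr
        prev_qfull=$qfull
    done
}
```

Running `watch_counters ge0 5` in the background and comparing its output
with the timestamps in /var/adm/messages would show whether every in.mpathd
complaint coincides with a jump in both counters.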

The host is not overloaded and network traffic on ge is no more than
20 Mb/s (in+out).

I can't see any correlation with interface load or system load; it
happens even in the morning, when the server is under really light load and
there's no more than 1.5 Mb/s of throughput.

Sometimes it works without problems for 2 or 3 days, sometimes there's a
problem almost every minute.

PS. Of course it happens to all incoming packets, not only ICMP.

PS2. I switched to the hme interfaces on the I/O boards for a few days but
the problem did not vanish (though it occurred definitely less often); there
were ierrors too (and almost no nocanput errors, as sq_max_size was increased).

It looks like some problem with STREAMS queues or something...

Any ideas?

-- 
                                                Robert Milkowski
                                                rmilkowski@wp-sa.pl