SUMMARY: Excessive Collisions

From: alan.nguyen@au.transport.bombardier.com
Date: Thu Mar 06 2003 - 01:16:48 EST


Hi All,

Firstly, I'd like to thank everyone who replied for their useful suggestions.
Most replies suggested turning off auto-negotiation; others said not to worry
about it, as the percentage error rate is too low to matter.
I found one response, from Mark.Deiss@acs-inc.com, invaluable, so I am sharing
it with you below:

Well, on any reasonably active network there will always be some amount of
collisions and retransmissions - it's the nature of the beast. Rather than
looking at an absolute collision count or error retransmission count, most LAN
engineers look at the errors relative to the total amount of traffic: the
number of errors as a percentage of all transmissions.

Now the rub - what is considered an acceptable error rate? As you would
expect, it varies. I have seen numbers ranging from 5% to 10%. In practice,
where a site draws the line is heavily influenced by the point at which the
user community starts complaining that they are dropping connections,
interactive responses are sluggish, etc.
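To make the percentage idea concrete, here is a minimal Python sketch. The
function names are illustrative only, and the 5% and 10% cut-offs are just the
range quoted above, used as hypothetical defaults rather than any standard.

```python
def error_rate_percent(errors: int, total: int) -> float:
    """Return errors as a percentage of total transmissions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


def classify(rate: float, warn: float = 5.0, bad: float = 10.0) -> str:
    """Label a rate; thresholds are hypothetical, from the 5%-10% range above."""
    if rate >= bad:
        return "bad"
    if rate >= warn:
        return "marginal"
    return "acceptable"


if __name__ == "__main__":
    rate = error_rate_percent(3, 200)
    print(f"{rate:.2f}% -> {classify(rate)}")  # 1.50% -> acceptable
```

Each site would tune `warn` and `bad` to the point where its users start to
notice.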

As far as reducing errors, LAN engineers tend towards partitioning the
segments. They review which nodes are the major senders and receivers.
Those nodes may be given a dedicated segment or have their routes further
optimized to hop through less congested segments. Engineers may also identify
"improper" traffic content and modify firewall rules to block it.

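Another approach is to use switches instead of hubs, as this reduces
"unnecessary" cross traffic: a hub repeats every frame to every port, whereas
a switch forwards unicast frames only to the port leading to the destination,
and each switch port is its own collision domain.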

There may also be a NIC in the network that is jabbering (i.e. the hardware
is failing and it is spewing out garbage). And of course, in a DHCP
environment, the server's address pool may be misconfigured so that a
fixed-address system's IP address is also handed out to a DHCP client -
you can guess what happens to traffic involving both systems.
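A quick sanity check for the DHCP case is to compare the server's dynamic
range against the list of fixed addresses. This Python sketch uses the
standard `ipaddress` module; the pool bounds and addresses are hypothetical,
and real values would come from your DHCP server configuration and host
inventory.

```python
import ipaddress
from typing import List


def static_addresses_in_pool(pool_start: str, pool_end: str,
                             static: List[str]) -> List[str]:
    """Return fixed addresses that fall inside the DHCP dynamic range."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    conflicts = []
    for addr in static:
        if start <= ipaddress.ip_address(addr) <= end:
            conflicts.append(addr)  # could be leased to a DHCP client too
    return conflicts


if __name__ == "__main__":
    # Hypothetical pool and fixed hosts
    print(static_addresses_in_pool("192.168.1.100", "192.168.1.199",
                                   ["192.168.1.10", "192.168.1.150"]))
```

Any address this reports should either be moved out of the pool or reserved
on the DHCP server.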

Duplicated IP addresses, incorrect netmasks and bad routing tables can all
reduce network performance. If you have systems with multiple NICs in use,
routing can also get a little tricky - packets received from a client on one
NIC may have their replies to that same client sent out through the other NIC.
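To see why an incorrect netmask hurts: a host decides whether a destination
is on-link (deliver directly, after ARP) or off-link (hand it to the router)
by masking addresses with its netmask. This Python sketch, with made-up
addresses, shows how a /16 mask typed in place of a /24 flips that decision.

```python
import ipaddress


def is_on_link(host_cidr: str, destination: str) -> bool:
    """True if a host with this address/mask treats the destination as local."""
    network = ipaddress.ip_interface(host_cidr).network
    return ipaddress.ip_address(destination) in network


if __name__ == "__main__":
    dest = "10.1.5.7"  # hypothetical host on another subnet
    print(is_on_link("10.1.2.3/24", dest))  # False: sent via the router
    print(is_on_link("10.1.2.3/16", dest))  # True: ARPs for it locally
```

With the wrong /16 mask the host ARPs for 10.1.5.7 on its own segment instead
of using the router, so that traffic fails unless something on the segment
answers for it (for example via proxy ARP).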

Another possible problem is if your NIC is set to auto-negotiate its
settings - quite often this does not work very well with the upstream router,
and your box auto-negotiates itself into inferior settings. Whenever you call
any vendor support group, the first thing they will recommend is to find out
what settings the router uses (e.g. 100 Mbit/s Ethernet, full duplex) and set
the system's NIC to these values, with auto-negotiation turned off. Router
configurations do not change often enough to justify setting any system to
auto-negotiate.
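Why the settings must match on both ends can be modelled roughly as follows.
This is a simplified Python sketch of IEEE 802.3 auto-negotiation behaviour,
assuming both ports support 10/100 Mbit/s in either duplex; the `Port` class
and function names are hypothetical. The key point it encodes: when one end
is forced and the other auto-negotiates, the auto-negotiating end can sense
the speed (parallel detection) but not the duplex, so it falls back to half
duplex.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Mode = Tuple[int, bool]  # (speed in Mbit/s, full duplex?)


@dataclass
class Port:  # hypothetical model of one end of the link
    auto: bool                # auto-negotiation enabled?
    speed: int = 100          # forced speed, ignored when auto is True
    full_duplex: bool = True  # forced duplex, ignored when auto is True


def resolve(a: Port, b: Port) -> Optional[Tuple[Mode, Mode]]:
    """Return the mode each end ends up using, or None if no link comes up.

    Simplified: both ends are assumed to support 10/100, half and full.
    """
    if a.auto and b.auto:
        # Both advertise every mode, so the best common one is chosen.
        return (100, True), (100, True)
    if a.auto:
        # Parallel detection senses b's speed but not its duplex,
        # so the auto-negotiating end falls back to half duplex.
        return (b.speed, False), (b.speed, b.full_duplex)
    if b.auto:
        return (a.speed, a.full_duplex), (a.speed, False)
    if a.speed != b.speed:
        return None  # both forced to different speeds: no link
    return (a.speed, a.full_duplex), (b.speed, b.full_duplex)


def duplex_mismatch(a: Port, b: Port) -> bool:
    """True if the link comes up with the two ends in different duplex modes."""
    result = resolve(a, b)
    return result is not None and result[0][1] != result[1][1]


if __name__ == "__main__":
    router = Port(auto=False, speed=100, full_duplex=True)
    print(duplex_mismatch(router, Port(auto=True)))  # True
    print(duplex_mismatch(router, Port(auto=False, speed=100, full_duplex=True)))  # False
```

In a mismatch, the half-duplex end treats frames arriving while it transmits
as collisions (often late collisions), so a mismatch is one plausible source
of climbing collision counters.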

You may want to investigate MRTG and/or Big Brother/LARRD as a means of
monitoring your network performance over an extended period. You may find
that the error rates spike during very busy, congested periods and are
otherwise quiescent the rest of the time.
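Trending tools of this kind work by sampling counters at fixed intervals and
looking at the differences. A minimal Python sketch of that idea follows. The
sample format (time, blocks sent, collided blocks) is hypothetical, and deltas
are taken modulo 2**32 on the assumption that the counters are 32-bit and wrap
to zero. Several totals in the listing below sit just under 4,294,967,296 and
the seconds counter reads 65535, so if your counters latch at their maximum
instead of wrapping, deltas across that point are meaningless.

```python
from typing import List, Tuple

WRAP = 2 ** 32  # assumes 32-bit counters that wrap to zero

# Hypothetical sample: (time in seconds, blocks sent, collided blocks)
Sample = Tuple[int, int, int]


def counter_delta(old: int, new: int) -> int:
    """Difference between two readings, allowing for one wraparound."""
    return (new - old) % WRAP


def interval_rates(samples: List[Sample]) -> List[Tuple[int, float]]:
    """Collision percentage for each interval between consecutive samples."""
    rates = []
    for (_, sent0, coll0), (t1, sent1, coll1) in zip(samples, samples[1:]):
        sent = counter_delta(sent0, sent1)
        coll = counter_delta(coll0, coll1)
        rates.append((t1, 100.0 * coll / sent if sent else 0.0))
    return rates


def spikes(samples: List[Sample], threshold: float = 5.0) -> List[int]:
    """End times of intervals whose collision rate exceeds the threshold."""
    return [t for t, rate in interval_rates(samples) if rate > threshold]


if __name__ == "__main__":
    data = [(0, 1_000, 10), (300, 2_000, 20), (600, 3_000, 170)]
    print(spikes(data))  # [600]
```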

Sites that run system backups across the network
(Veritas/Legato/rdumps) can really drag down network performance. If your
trending indicates this is a problem, you may want to rearrange the
network backup schedule to spread out the traffic load, or move the traffic
to dedicated segments.

Looking at your aggregate metrics, you have a worst case of about 10 million
multiple-collision blocks against a count of about 4,294 million, i.e. roughly
10 in 4,294 - I would consider this acceptable. (Strictly, that divides a
block count by the bytes-sent count; measured against the 422,647,837 data
blocks sent, the multiple-collision rate is about 2.4%, still below the 5%
figure mentioned above.) Others may weigh in differently. Note, though, the
importance of trending - the error rate would not be acceptable if all the
errors occurred in a very short time frame. For example, suppose the
4294967294 count covers a total window of 2 weeks, but the 9989348 collisions
all occurred between 8:00 and 9:00, when your payroll group is struggling to
get the paychecks processed. Not good.
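Working the posted tu0 counters through the same arithmetic makes the
comparison explicit. This Python sketch copies values from the
netstat -Itu0 -s output in the original message below. Using "data blocks
sent" as the denominator is an assumption: it is a block count like the
collision counters, whereas bytes sent is a different unit (and sits near the
32-bit limit).

```python
# Values copied from the netstat -Itu0 -s output in the original message.
COUNTERS = {
    "bytes_sent": 4294967294,
    "blocks_sent": 422647837,
    "single_collision": 4317983,
    "multiple_collisions": 9989348,
    "send_failures": 49997,
}


def pct(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    c = COUNTERS
    collided = c["single_collision"] + c["multiple_collisions"]
    # Mixing units (blocks / bytes), as in the rough "10 in 4,294" figure:
    print(f"{pct(c['multiple_collisions'], c['bytes_sent']):.2f}%")   # 0.23%
    # Assumed like-for-like denominator: data blocks sent
    print(f"{pct(c['multiple_collisions'], c['blocks_sent']):.2f}%")  # 2.36%
    print(f"{pct(collided, c['blocks_sent']):.2f}%")                  # 3.39%
    print(f"{pct(c['send_failures'], c['blocks_sent']):.3f}%")        # 0.012%
```

Even counting single and multiple collisions together, the block-based rate
stays under the 5% lower bound quoted earlier, which supports the
"acceptable" verdict on the aggregate numbers.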

And lastly, there are some rather obscure settings in your TCP/IP stack that
may be causing problems with transmission quality to select sites. Some
sites find it beneficial to adjust their TCP/IP packet size, based on the
kind of traffic they normally handle. There are also some weird settings
that caused our VMS group massive headaches. Massive. It took months of
working with multiple vendors to track down an old BSD compliance flag that
was causing the packet pointers to become scrambled. Large FTP jobs would die
because the headers would eventually become corrupt and break the
transmission.
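As a worked example of the "packet size" knob: on standard Ethernet the IP
MTU is 1500 bytes, and the TCP payload per segment (the MSS) is what remains
after the IPv4 and TCP headers, 20 bytes each when no options are used. The
Python sketch below only does that subtraction; it is not a Tru64 tuning
procedure, and whether a smaller size helps depends on the path and traffic.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment for a given IP MTU (IPv4, no options)."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if payload <= 0:
        raise ValueError("MTU too small for IPv4 + TCP headers")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 1492, 576):
        print(mtu, mss_for_mtu(mtu))  # MSS: 1460, 1452, 536
```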

-----Original Message-----
From: alan.nguyen@au.transport.bombardier.com
[mailto:alan.nguyen@au.transport.bombardier.com]
Sent: Monday, February 24, 2003 10:42 PM
To: tru64-unix-managers@ornl.gov
Subject: Excessive Collisions

Hi,

When I run netstat -Itu0 -s, it shows quite a number of send failures.
What is the cause, and how can I fix it?
Any help will be appreciated.

tu0 Ethernet counters at Tue Feb 25 13:34:52 2003

       65535 seconds since last zeroed
  4294967263 bytes received
  4294967294 bytes sent
   834506685 data blocks received
   422647837 data blocks sent
  4294967274 multicast bytes received
   315769449 multicast blocks received
   218529925 multicast bytes sent
     1775071 multicast blocks sent
     8838555 blocks sent, initially deferred
     4317983 blocks sent, single collision
     9989348 blocks sent, multiple collisions
       49997 send failures, reasons include:
                Excessive collisions
           0 collision detect check failure
           0 receive failures
           0 unrecognized frame destination
           0 data overruns
           0 system buffer unavailable
           0 user buffer unavailable

Alan.Nguyen@au.transport.bombardier.com



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:09 EDT