SUMMARY: NFS3 server not responding & ee0: Transmit ring is full

From: Kjell Andresen (kjell@dod.no)
Date: Mon Dec 16 2002 - 08:24:11 EST


Kjell Andresen <kjell.andresen@usit.uio.no> writes:

Original question was:

> Our 2 cpu DS20E is quite busy running calculations, but is not swapping.
> Other workstations and servers are now and then claiming
> "NFS3 server hox not responding still trying"
>
> I can't really see any reason for this, but on the other hand
> I'm not sure exactly what the load and swapping are like under
> these circumstances.
> In /var/adm/messages there is a lot of these:
>
> Dec 9 15:43:46 hox vmunix: ee0: Transmit ring is full
> Dec 9 16:42:04 hox vmunix: ee0: Receiver Not Ready interrupt
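
To see whether the ee0 messages cluster around the times the NFS
clients complain, one could tally them per hour. This is a minimal
sketch, assuming the syslog line layout shown above (month, day, time,
host, "vmunix:", "ee0:"); the function name count_ee0_messages is made
up for illustration.

```shell
# Count ee0 driver messages per day, hour and message text.
# Reads syslog-style lines on stdin, e.g. from /var/adm/messages.
count_ee0_messages() {
    awk '/vmunix: ee0:/ {
        split($3, t, ":")              # $3 is HH:MM:SS, t[1] is the hour
        msg = $0
        sub(/^.*ee0: /, "", msg)       # keep only the driver message text
        count[$1 " " $2 " " t[1] "h " msg]++
    }
    END { for (k in count) print count[k], k }' |
    sort -k2,2 -k3,3n -k4,4            # order by month, day, hour
}
```

Usage: count_ee0_messages < /var/adm/messages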

Thanks to:
 "O'Brien, Pat" <pobrien@mitidata.com>
 Paolo Lucente <pl@ba.cnr.it>
 "Rost, Werner" <Werner.Rost@zfboge.com>
 "Gergen, Peter" <petergergen@kpmg.com.au>
 Selden E Ball Jr <SEB@LNS62.LNS.CORNELL.EDU>
 "Johan Brusche" <johan.brusche@skynet.be>

All of whom told me about network problems and the real nature of
autonegotiation.

The case is not completely closed.
So far these suggestions have come in and have been implemented:

 "Johan Brusche" <johan.brusche@skynet.be>
-----------------------------------------------------------------
> You could increase tcbhashsize (from the default 512) to reduce the
> connection block lookup times.
> /sbin/sysconfig -r inet tcbhashsize=2048
>
> To make sure there are no autonegotiation problems, try testing with
> fixed settings (e.g. 100Mb/FDX), with the help of the lan_config command.

Thanks! I'll try that.
Moreover, I've identified one other machine that was erroneously
connected to the network, and this has caused a lot of NFS error messages.
They are reduced but have not come to an end.
That indicates user I/O, I guess.

Fixed full duplex is now set and the switch has been switched; the other
switches connected to the main Gbps switch (world) each have two parallel
100 Mbps connections.
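
The two suggestions above could be applied in one go, roughly as
sketched here. The sysconfig -r line is the one quoted above; the
sysconfig -q queries and the lan_config flags (-i interface, -a 0 for
autonegotiation off, -s speed, -x 1 for full duplex) are assumptions
to check against the local reference pages. The run and
apply_suggestions helpers are hypothetical, and by default nothing is
executed, only printed.

```shell
# Dry-run wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Interface name and values are taken from this thread; adjust as needed.
apply_suggestions() {
    ifname=${1:-ee0}
    run /sbin/sysconfig -q inet tcbhashsize        # value before the change
    run /sbin/sysconfig -r inet tcbhashsize=2048   # as suggested above
    run /sbin/sysconfig -q inet tcbhashsize        # confirm the new value
    # Assumed flags: autonegotiation off, 100 Mb/s, full duplex.
    run lan_config -i "$ifname" -a 0 -s 100 -x 1
}
```

Usage: apply_suggestions ee0 (prints the commands), or
DRY_RUN=0 apply_suggestions ee0 to execute them as root.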

Selden E Ball Jr <SEB@LNS62.LNS.CORNELL.EDU> summarizes:
-----------------------------------------------------------------
It sounds to me like you may have several problems.

The "Receiver Not Ready interrupt" message means that it has run out of
network input buffers and has to allocate more because there still are
network packets arriving.

This suggests that your CPU bound jobs are keeping the network services
from getting enough time. Make sure that the jobs are "niced" and
running at a low priority.

Your symptoms are also consistent with network problems:
Your system's output ring buffer could be full because the I/O
driver can't write packets out to the network fast enough.

This suggests that you might have the common problem of the duplex
setting not being the same on both your host adaptor and
the switch port (assuming you're using a switch and not a hub:
connections to hubs must always be half-duplex).

Alternatively, there may be some other system on the same switch or hub
that is monopolizing it.
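
For the advice above about niced jobs, a quick way to spot CPU-bound
processes that still run at nice 0 might look like this. It is a
sketch only: the function name find_unniced_hogs and the 50 percent
default threshold are made up, and it assumes ps accepts the POSIX
-o pid,nice,pcpu,comm format on the system in question.

```shell
# Print the PIDs of processes that use at least MIN_CPU percent CPU
# and still run at nice 0.
# Expects "pid nice pcpu comm" lines (with a header line) on stdin.
find_unniced_hogs() {
    awk -v min="${1:-50}" '
        NR > 1 && $2 == 0 && $3 + 0 >= min { print $1 }   # NR > 1 skips the header
    '
}
```

Usage: ps -e -o pid,nice,pcpu,comm | find_unniced_hogs 50
Each PID printed can then be lowered with: renice -n 10 -p PID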

-----------------------------------------------------------------
"Gergen, Peter" <petergergen@kpmg.com.au> gave me a couple of commands
to check:

hox /# hwmgr -show comp | grep -e ee0 -e ee1
   12: hox r---- none ee0
hox /# hwmgr -g att -id 12
12:
  name = ee0
  category = network
  sub_category = Ethernet
  model = Intel 82559
  hardware_rev = 8
  firmware_rev =
  MAC_address = 00-50-8B-AD-D5-CC
  MTU_size = 1500
  media_speed = 100
  media_selection = Automatic
  media_type = Unshielded Twisted Pair (UTP)
  loopback_mode = 0
  promiscuous_mode = 0
  full_duplex = 1
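
To compare these attributes with the switch port at a glance, the
output can be boiled down to one line. A minimal sketch, assuming the
"attribute = value" layout shown above; the function name link_summary
is made up for illustration.

```shell
# Summarise link settings from "hwmgr -g att -id N" style output on stdin.
link_summary() {
    awk -F' = ' '
        { gsub(/^[ \t]+/, "", $1) }                 # strip leading indentation
        $1 == "name"            { name = $2 }
        $1 == "media_speed"     { speed = $2 }
        $1 == "media_selection" { sel = $2 }
        $1 == "full_duplex"     { duplex = ($2 == 1 ? "full" : "half") }
        END {
            printf "%s: %s Mb/s, %s duplex, media selection %s\n",
                   name, speed, duplex, sel
        }
    '
}
```

Usage: hwmgr -g att -id 12 | link_summary
Note that media_selection still reads Automatic in the output above,
so it may be worth rechecking after fixed settings are applied.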

Regards,
Kjell Andresen Systems administrator, University of Oslo, Norway
                Center for Information Technology Services and
                Department of Geophysics


