SUMMARY: How to break CLOSE_WAIT

From: Bhavesh Shah (shah.bhavesh@gene.com)
Date: Tue Jan 31 2006 - 12:25:52 EST


Thanks to everyone for their detailed explanations, especially:
Crist Clark
Eric Voisard
Casper Dik
Gordon Johnston
Hutin Bertrand

Explanation:
Crist Clark
------------

CLOSE_WAIT means that the local end of the connection has received
a FIN from the other end, but the OS is waiting for the program at the
local end to actually close its connection.

The problem is that the program running on the local machine is not
closing the socket. It is not a TCP tuning issue. A connection can
(quite correctly) stay in CLOSE_WAIT forever while the program holds
the connection open.

Once the local program closes the socket, the OS can send the FIN to
the remote end which transitions you to LAST_ACK while you wait for
the ACK of the FIN. Once that is received, the connection is finished
and drops from the connection table (if your end is in CLOSE_WAIT
you do _not_ end up in the TIME_WAIT state).
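
The passive-close sequence described above can be sketched as a small
transition table. This is only an illustration: the event names
("recv FIN", "app close", "recv ACK") and the walk() helper are made up
here and do not come from any TCP stack.

```python
# Passive-close path of the TCP state machine, as described above.
# Keys are (current state, event); values are the next state.
# The event labels are illustrative names, not identifiers from any real stack.
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",  # peer closed its side
    ("CLOSE_WAIT", "app close"): "LAST_ACK",    # local program closes, OS sends FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",         # our FIN has been acknowledged
}


def walk(state, events):
    """Apply events in order; an event that does not apply leaves the state unchanged."""
    for event in events:
        state = PASSIVE_CLOSE.get((state, event), state)
    return state
```

Note that nothing except "app close" moves the connection out of
CLOSE_WAIT in this table, which is why no timer can help.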

Eric Voisard
-------------

Afaik, there is no ndd parameter which affects the tcp CLOSE_WAIT duration.
There was "tcp_close_wait_interval" but it has been obsoleted and renamed to
"tcp_time_wait_interval" because in reality it affects the TIME_WAIT timeout
and not the CLOSE_WAIT. So, you can try to change it but I doubt it'll have
any effect since they're different things...

Otoh, from what I know, it's the responsibility of the application (i.e.
not of the OS) to close its socket once the remote computer closes its
side of the TCP communication.
RFC 793 says CLOSE_WAIT is the TCP/IP stack waiting for the local application
to release the socket. So, it hangs because it has received the information
that the remote host has initiated a disconnection and is closing its
socket, but the local application has not closed its own side.

So maybe the solution consists in finding a bug fix for your application...

Or, more dangerously (because they still have the right to send any data
remaining in the queue), kill the processes that hold connections in
CLOSE_WAIT state...

Casper Dik
-----------

CLOSE_WAIT connections indicate an error in the software.

It's a connection which has been torn down by the other end, but your
side of things still has a file descriptor open.
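
One way to avoid leaving the descriptor open is to make the close happen
on every exit path. A minimal sketch in Python, assuming a hypothetical
per-connection handler called handle() (the upper-casing reply is just
placeholder work, not anything from the original application):

```python
import socket


def handle(conn):
    """Handle one connection; the 'with' block closes the descriptor on every exit path."""
    with conn:
        data = conn.recv(4096)
        if not data:
            return None  # peer already closed; leaving the block still closes our side
        conn.sendall(data.upper())  # placeholder work for the illustration
        return data
```

Without the "with" block (or an explicit close() in a finally clause), the
early return would leak the descriptor and leave the connection in
CLOSE_WAIT.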

Gordon Johnston
-------------------
I believe CLOSE_WAIT on the server side of the connection means that the
server has received a FIN from the client, will have acknowledged this
back to the client and then informed the application that it can close
the connection. It is then up to the application to relinquish the
connection once it is satisfied that all the data has been read from the
connection. Once it relinquishes the connection the server will send a
final FIN back to the client and the connection will be fully closed.
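
The "read everything, then relinquish" behaviour described above might
look like the following in an application. This is a sketch, not the
code of any particular server; drain_and_close() and its bufsize
parameter are names invented for the example.

```python
import socket


def drain_and_close(conn, bufsize=4096):
    """Read until the peer's FIN shows up as an empty read, then close our side."""
    chunks = []
    try:
        while True:
            data = conn.recv(bufsize)
            if not data:  # an empty read means the peer has closed its side
                break
            chunks.append(data)
    finally:
        conn.close()  # skipping this is what leaves a TCP socket in CLOSE_WAIT
    return b"".join(chunks)
```

The key point is that the empty read is only a notification; the OS
sends the final FIN only after the application calls close().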

If you are seeing a large number of connections persisting in
CLOSE_WAIT state, it's probably a problem with the app itself. Restarting
it will clear the connections temporarily, but obviously further
investigation will be required to find the cause of the problem.
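
To watch whether the number of such connections keeps growing, the
output of netstat can be counted by state. A rough sketch, assuming the
state is the last column of each connection line; the SAMPLE text below
is made up for illustration and is not real output from any host.

```python
# Made-up sample in the general shape of "netstat -a" TCP output.
SAMPLE = """\
host1.53     peer1.40001   49640  0 49640  0 CLOSE_WAIT
host1.53     peer1.40002   49640  0 49640  0 CLOSE_WAIT
host1.22     peer2.51000   49640  0 49640  0 ESTABLISHED
"""


def count_state(netstat_text, state="CLOSE_WAIT"):
    """Count lines whose last whitespace-separated field equals the given state."""
    total = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == state:
            total += 1
    return total
```

Running this against output captured at intervals shows whether the
count keeps climbing, which would point to a leak in the application.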

Hutin Bertrand
-----------------
Look at:
http://docs.sun.com/app/docs/doc/806-7009

My original post was:

>Hi Gurus,
>When I run netstat -a, I see hundreds of connections in
>CLOSE_WAIT state. This causes my named-xfer, which uses these
>connections, to sleep (as seen with truss -p <process_pid>).
>Is there a timer to set, say after 120 seconds the CLOSE_WAIT
>connections will break so my program can reconnect again?? For example
>the "ndd" command??
>Any help is greatly appreciated.
>Best Regards
>shahb
>_______________________________________________
>sunmanagers mailing list
>sunmanagers@sunmanagers.org
>http://www.sunmanagers.org/mailman/listinfo/sunmanagers



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:38:47 EDT