About netstat hangs...

From: Loic Domaigne (loic@ast.dfs.de)
Date: Wed May 05 2004 - 11:19:50 EDT


Dear Tru64 Managers,

I have noticed that many people have run into the "netstat hangs"
problem, but I haven't found a clear answer regarding that issue.

I ran into that problem myself two weeks ago, and here is a summary of my
investigations (on Tru64 4.0F, but judging by the posts on this list,
this might apply to newer Tru64 releases as well).

If a process writes to a message queue in such a way that it overflows
the queue (for instance, because no receiver is present), then once the
maximum number of outstanding messages is reached (40 on my system),
netstat hangs. BTW, not only netstat hangs, but also programs like lsof.
As soon as the queue is removed, netstat (resp. lsof) works fine again.

I believe this is a Tru64 issue, since the only process that should
eventually be "punished" is the writer (if IPC_NOWAIT isn't passed in
msgflg, the writer should block). Even though a message queue has
kernel persistence, I don't believe that other processes like netstat
should block too...

Below you will find overflowQ.c, a program that does nothing
but overflow a message queue, as well as a description of how to
reproduce the problem (steps.txt).

You might know all of this already... but I felt it might be a
good idea to post this summary.

Cheers,
Loic.

------------------ overflowQ.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>

#define MSGQ_KEY 0x1234
#define PERM 0660

struct mymsg {
  long mtype;
  char mtext[1];
};

int
main(void)
{
  int msg_id; /* message queue id. */
  struct mymsg msg; /* message to send. */
  int nmsg; /* number of messages sent successfully. */

  /*
   * Create the message queue (fails with EEXIST if a queue with
   * MSGQ_KEY is still left over from a previous run).
   */
  if ( (msg_id = msgget (MSGQ_KEY, PERM | IPC_CREAT | IPC_EXCL)) == -1 ) {
    perror ("msgget");
    exit (EXIT_FAILURE);
  }
  /*
   * Now write messages one by one until the queue is full. With
   * IPC_NOWAIT, msgsnd() fails with EAGAIN instead of blocking.
   * Nobody reads the queue, so the messages stay there after we exit.
   */
  msg.mtype = 1;
  msg.mtext[0] = 'A';
  nmsg = 0;
  while ( msgsnd (msg_id, &msg, sizeof(msg.mtext), IPC_NOWAIT) == 0 ) {
    nmsg++;
    printf ("Sent msg #%d\r", nmsg);
    fflush (stdout);
  }

  /*
   * Check errno right after the failing msgsnd(): printf() may
   * change errno, so it must be tested before any other library call.
   */
  if ( errno != EAGAIN ) { /* eh? we didn't overflow? */
    perror ("msgsnd");
    exit (EXIT_FAILURE);
  }

  printf ("Queue full, sent %d messages\n", nmsg);
  exit (EXIT_SUCCESS);
}

------------------ steps.txt

===============================
Steps to reproduce the problem:
===============================

bash$ netstat
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
tcp        0      0  localhost.3922         localhost.2301         TIME_WAIT
tcp        0      0  localhost.3923         localhost.2301         TIME_WAIT
...

bash$ overflowQ
Queue full, sent 40 messages

bash$ netstat [ hangs... ]

bash$ ipcs -qa
T   ID KEY        MODE        OWNER GROUP  CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID    STIME    RTIME    CTIME
q    0 0x41003ec7 --rw------- root  system root    system      0    0  16384     3 13647 14:24:29 14:24:29  8:37:07
q  130 0x1234     --rw-rw---- loic  vdp    loic    vdp        40   40  16384 13662     0 14:25:06 no-entry 14:25:06

Removing the queue solves the problem:

bash$ ipcrm -q 130
bash$ netstat
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
tcp        0      0  localhost.3939         localhost.2301         TIME_WAIT
tcp        0      0  localhost.3940         localhost.2301         TIME_WAIT
...

Note: there is no need to restart kloadsrv.
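In the ipcs output above, the QNUM column (40 for queue 130) is the counter that reaches the limit. A minimal sketch for reading it from a program with msgctl(IPC_STAT), assuming only the standard msqid_ds fields (msg_qnum, msg_qbytes, msg_lspid, msg_lrpid); the helper name `print_queue_status` is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * print_queue_status: hypothetical helper that prints a few of the
 * fields "ipcs -qa" shows (QNUM, QBYTES, LSPID, LRPID) for one queue.
 * Returns the number of queued messages, or -1 if IPC_STAT fails.
 */
long
print_queue_status (int msg_id)
{
  struct msqid_ds ds;

  if ( msgctl (msg_id, IPC_STAT, &ds) == -1 ) {
    perror ("msgctl(IPC_STAT)");
    return -1;
  }
  printf ("queue %d: %lu messages, limit %lu bytes, "
          "last sender pid %ld, last receiver pid %ld\n",
          msg_id,
          (unsigned long) ds.msg_qnum,
          (unsigned long) ds.msg_qbytes,
          (long) ds.msg_lspid,
          (long) ds.msg_lrpid);
  return (long) ds.msg_qnum;
}
```

Run against queue 130 after overflowQ, it should report 40 messages and a last receiver pid of 0, matching the QNUM and LRPID columns above.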
