SUMMARY: fork, tee and a defunct process

From: Eiler, James A. (James.Eiler@alcoa.com)
Date: Wed Jun 04 2003 - 17:39:59 EDT


Received many responses on this issue - THANKS to:

Steve Feehan
Paul LaMadeleine
Martin Adolfsson
Tim Cutts
Dr. Thomas Blinn
Lucien Hercaud
Charles Ballowe
Thomas Sjolshagen
Joerg Bruehe

While several folks explained the details of what was happening and
why the defunct process persisted, it was Lucien Hercaud who suggested
I try using a named pipe instead of an unnamed pipe. And that,
basically, did the trick!

My original message immediately follows, and some of the responses
are appended at the end. (Probably too much data appended at the
end, but in case someone wants to search the archives.....)

THANKS AGAIN!

Jim

-----Original Message-----
From: Eiler, James A. [mailto:James.Eiler@alcoa.com]
Sent: Wednesday, May 28, 2003 11:11 AM
To: tru64-unix-managers@ornl.gov
Subject: fork, tee and a defunct process

Hi all,

I don't believe this is solely a Tru64 problem, but any insight would be
appreciated!

I've got a C program that forks. I run the program from a shell script and
pipe the output to a log file. A simplified version of the C code (hello.c)
is as follows:

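#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* fork(), getpid(), sleep() */

int main( void )
{
   pid_t pid = fork( );

   if( pid < 0 ) {              /* fork failed */
      perror( "fork" );
      return 1;
   }
   if( pid == 0 ) {
      /* Child: inherits stdout (the pipe to tee) and writes forever. */
      printf( "Entering child's code\n" );
      fflush( stdout );

      while( 1 ) {
         printf( "This is a test!\n" );
         fflush( stdout );
         sleep( 5 );
      }
   }
   else {
      /* Parent: exits at once without waiting for the child. */
      printf( "Parent exiting, PID = %d\n", (int) getpid( ) );
      exit( 0 );
   }
}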

The command I'm issuing to get this running is:

# hello | tee hello.log&

Stuff gets logged into the log file as expected, but an unexpected "side
effect" is a defunct process:

# ps -ef | grep defunct
eilerja 34718 34734 0.0 - pts/2 0:00.00 <defunct>

When I look for the parent process, it looks like it's tee:

# ps -ef | grep 34734
eilerja 34718 34734 0.0 - pts/2 0:00.00 <defunct>
eilerja 34734 32901 0.0 11:05:48 pts/2 0:00.00 tee hello.log

Any suggestions on how to eliminate the defunct process?

THANKS!

Jim

-----Reply from Lucien Hercaud-----

The cause is in the SHELL; the way your program is written just makes it show.
In order to achieve the pipe the shell does this:

1. forks a process to run the command on the right side of the pipe == tee
(call this P1)
2. creates a pipe in P1
3. in P1, forks a second process to run the command on the left side of
the pipe == hello (call it P2)
4. in P1, closes the write side of the pipe and makes the read side its
standard input
5. in P2, closes the read side of the pipe and makes the write side its
standard output
6. in P1, execs "tee"
7. in P2, execs "hello"

At this point P1 is the parent process of P2, even though the program "tee"
knows nothing about the program "hello". This is how the pipe is set up.
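The setup described above can be sketched in C roughly as follows. This is a minimal illustration assuming a POSIX system, not the code of any particular shell; `bourne_style_pipeline` is a hypothetical name. The point it shows is that P1 execs the right-hand command while P2 is still its child, so whatever program P1 becomes is the one responsible for reaping P2.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "left | right" so that the right-hand command
 * (P1) is the parent of the left-hand command (P2), as described above.
 * Returns P1's pid to the caller (the "shell"), or -1 if the first fork fails.
 */
pid_t bourne_style_pipeline(char *const left[], char *const right[])
{
    pid_t p1 = fork();                 /* 1. fork P1 for the right-hand side */
    if (p1 != 0)
        return p1;                     /* caller: P1's pid, or -1 on error */

    int fds[2];
    if (pipe(fds) < 0) {               /* 2. create the pipe in P1 */
        perror("pipe");
        _exit(127);
    }

    pid_t p2 = fork();                 /* 3. P1 forks P2 for the left-hand side */
    if (p2 < 0) {
        perror("fork");
        _exit(127);
    }
    if (p2 == 0) {
        close(fds[0]);                 /* 5. P2 closes the read side ... */
        dup2(fds[1], STDOUT_FILENO);   /*    ... and writes into the pipe */
        close(fds[1]);
        execvp(left[0], left);         /* 7. P2 becomes "hello" */
        perror("execvp");
        _exit(127);
    }

    close(fds[1]);                     /* 4. P1 closes the write side ... */
    dup2(fds[0], STDIN_FILENO);        /*    ... and reads from the pipe */
    close(fds[0]);
    execvp(right[0], right);           /* 6. P1 becomes "tee"; P2 is still its child */
    perror("execvp");
    _exit(127);
}
```

For example, passing `{"echo", "hello", NULL}` and `{"cat", NULL}` and then calling `waitpid()` on the returned pid runs `echo hello | cat` this way; the caller only ever reaps P1, never P2.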

Now, here is where the problem comes in: "hello", i.e. P2, forks to create P3
and then exits, but P1 is running "tee", which does not care about children
it does not even know it has. "tee" never calls wait(), because its own code
never forks, so it is simply not written to wait. That is why P2 becomes a
zombie.
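A zombie is a child that has exited but whose exit status its parent has not yet collected. The sketch below, assuming a POSIX system (`zombie_window_demo` is a hypothetical name), makes that window visible: the parent deliberately postpones `waitpid()`, as "tee" never calls it at all, and the child stays in the process table until the parent finally reaps it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that exits at once, then postpone
 * reaping it. While the parent sleeps, the child has exited but its exit
 * status is still held for the parent, so "ps" typically shows <defunct>.
 * Returns the child's exit status once it is reaped, or -1 on error.
 */
int zombie_window_demo(unsigned seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(42);                      /* child: plays hello's exiting process */

    sleep(seconds);                     /* parent: no wait() yet, like "tee" */

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* reaping frees the table slot */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running `ps` from another terminal during the sleep should typically list the child as `<defunct>`, with the sleeping process as its parent.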

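If you want to build this kind of pipeline, fork in the program, and still
avoid zombies, you can try a named pipe, such as

% mknod /tmp/pipe$$ p

% tee < /tmp/pipe$$ hello.log &

% hello >> /tmp/pipe$$

Here the shell itself starts "hello", so the shell, not "tee", is the
parent of hello's exiting process and reaps it.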

-----Reply from the good Dr. Tom-----

I can't tell you why "ps" is reporting that "tee" is the parent of the
defunct process, but I strongly suspect that it's not. "tee" never
forks (or execs) anything, at least not that you can tell from looking
at the source (which I've just done for V5.1A; you gave absolutely no
indication of which version of the system software you are running).

I also don't know why you think the defunct process is a problem. You
do need to understand that it's a process that can't be reaped because
some resource in the kernel is still extant. In this case, I suspect
it's related to the way you're creating your "hello" program's two
forks, and to the way the shell you are using handles reaping child
processes. Until the child of your "hello" program exits and then "tee"
exits, you've (probably) got both the original fork, which is trying to
exit, and the new fork, which mostly sleeps but occasionally wakes up
and writes to the pipe to tee, so tee doesn't exit. Most shells don't
really have a good understanding of this stuff, so until "tee" goes
away the shell is probably not going to reap the exiting process. Until
that process is reaped, it probably looks like it belongs to "tee",
simply because the scheduler needs to keep track of it somehow; "tee"'s
parent is the shell, and the process is ultimately going to get reaped
when "tee" exits and gets reaped.

-----Reply from Martin Adolfsson-----

Calling fork() without either having the parent wait() for the child
process or, if you want to daemonize the process, doing it the "proper" way
(as described in Unix Network Programming Volume 1 [Stevens]) has a rather
nasty tendency to leave your system with zombie/defunct processes, as with
your code.

Without seeing the rest of your code or knowing what the program is
supposed to do, I don't really have any hard suggestions or code changes to
offer, except to use wait() if you can, or to daemonize the processes
properly if you don't need to read or write anything on std(in|out|err).
(But you probably do, since you use tee.)

Hope this made any sense at all. If it didn't or if you have further
questions, write back with more information.

(The defunct process doesn't show up when I run it under the 'bash' shell,
which in itself is rather odd, methinks :))
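When the program that forks is itself the long-lived parent, one common way to follow the "use wait()" advice without blocking on a long-running child is a double fork, sketched below under POSIX assumptions. `spawn_without_zombie` and `example_work` are hypothetical names, and this is not claimed to be the exact procedure from the Stevens book. It would not remove the zombie in Jim's setup, because there the unreaped process is hello's own main process, whose parent is "tee".

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical payload standing in for the real work (e.g. hello's loop). */
void example_work(void)
{
    printf("This is a test!\n");
    fflush(stdout);
}

/*
 * Hypothetical helper: run work() in a grandchild that the caller never
 * has to reap. The intermediate child forks the grandchild and exits at
 * once; the caller reaps the intermediate child right away, and the
 * orphaned grandchild is adopted by init (or, on some systems, a
 * designated subreaper), which reaps it later.
 * Returns 0 on success, -1 on failure.
 */
int spawn_without_zombie(void (*work)(void))
{
    pid_t mid = fork();
    if (mid < 0)
        return -1;

    if (mid == 0) {                        /* intermediate child */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {             /* grandchild: does the real work */
            work();
            _exit(0);
        }
        _exit(0);                          /* intermediate exits immediately */
    }

    int status;
    while (waitpid(mid, &status, 0) < 0) { /* reap it now: no zombie remains */
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The caller's only wait is for the short-lived intermediate process, so it never blocks on the long-running work and never leaves a `<defunct>` entry of its own.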

-----Reply from Charles Ballowe-----

The only thing that makes sense is that tee is holding stdout of the parent
open, as the pipe assigns stdout from your program to stdin of tee. Since
the child is fork()'d from the parent, it has a duplicate copy of the
file descriptors at the time of the fork().

I don't think what you're running into is a problem, per se. The process is
defunct because its stdout stream is being held open and the kernel can't
clean up the structures used by the program until they are no longer in use.
A parent exit()ing and leaving the child hanging like that seems like an
unintended use of the technique: usually the fork()/exit() method is used to
turn something into a daemon, and most daemons don't rely on stdout (often
they have no controlling terminal). Personally, I'd have the program output
to a file and keep a tail -f running on the file to see the output in real
time.
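Charles's suggestion can also be done from inside the program: point stdout at an append-mode log file, so no "| tee" is needed and, typically, the shell itself is the parent that reaps hello's exiting process. This is a minimal sketch assuming POSIX `open()` and `dup2()`; `redirect_stdout_to_log` is a hypothetical helper that hello could call before it forks.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: make this process's stdout append to a log file,
 * so the program can be run without "| tee" and watched with "tail -f".
 * Returns 0 on success, -1 on failure.
 */
int redirect_stdout_to_log(const char *path)
{
    fflush(stdout);                          /* push out anything already buffered */

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fd != STDOUT_FILENO) {
        if (dup2(fd, STDOUT_FILENO) < 0) {   /* fd 1 now refers to the log file */
            perror("dup2");
            close(fd);
            return -1;
        }
        close(fd);                           /* fd 1 keeps the file open */
    }
    return 0;
}
```

Because the child created by a later fork() inherits fd 1, both processes write to the same log file, and `tail -f` on that file shows the output live.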

-----Reply from Joerg Bruehe-----

Hi James!

"Eiler, James A." wrote:
>
> Hi all,
>
> I don't believe this is solely a Tru64 problem, but any insight would be
> appreciated!

IMO, the effect you see is a result of general Unix process-handling
principles, and it is fully correct behavior.

>
> I've got a C program that forks. I run the program from a shell script and
> pipe the output to a log file. A simplified version of the C code (hello.c)
> is as follows:
>
> #include <stdio.h>
> #include <unistd.h>
>
> main( )
> {
>    if( fork( ) == 0 ) {
>       printf( "Entering child's code\n" );
>       fflush( stdout );
>
>       while( 1 ) {
>          printf( "This is a test!\n" );
>          fflush( stdout );
>          sleep( 5 );
>       }

So the child process inherits "stdout" from the parent,
writes to that file in an infinite loop, and lives forever.

>    }
>    else {
>       printf( "Parent exiting, PID = %d\n", getpid( ) );
>       exit( 0 );
>    }

The parent process terminates.

> }
>
> The command I'm issuing to get this running is:
>
> # hello | tee hello.log&

The pipe is the "stdin" for "tee" and also the "stdout" for both
the parent and the child process executing the "hello" program file.

>
> Stuff gets logged into the log file as expected, but an unexpected "side
> effect" is a defunct process:
>
> # ps -ef | grep defunct
> eilerja 34718 34734 0.0 - pts/2 0:00.00 <defunct>

And probably your log file contains a line
   Parent exiting, PID = 34718

>
> When I look for the parent process, it looks like it's tee:
>
> # ps -ef | grep 34734
> eilerja 34718 34734 0.0 - pts/2 0:00.00 <defunct>
> eilerja 34734 32901 0.0 11:05:48 pts/2 0:00.00 tee hello.log
>
> Any suggestions on how to eliminate the defunct process?

You cannot - "tee" as the last command in the pipe is the parent
of all preceding ones (typical shell handling of pipes), and
it will probably not call "wait()" at all, at least not until its
"stdin" is exhausted. This will never happen, due to the infinite
loop in the "hello" child.

So the "hello" parent has terminated, but its parent ("tee") has not
yet fetched the exit code, and hence you have this zombie in the
process table.

As soon as the "hello" child terminates, the pipe is closed, and
"tee" in turn will terminate. Either "tee" has collected the exit code
of its child (34718) by then, or it has not; in that case, "init"
(PID 1) inherits that orphan, collects its exit code, and discards it.
At that moment, the process table slot is freed.
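To make the point about "tee" concrete: the zombie exists only because "tee" never calls wait(). The sketch below is a hypothetical tee-like copy loop (`copy_and_reap` and `write_all` are made-up names, assuming POSIX `read()`, `write()` and `waitpid()`), not how any real tee is written. It copies its input and, after each read, reaps any exited children with `WNOHANG`; in Jim's pipeline it would collect hello's exited parent the next time the child writes, within about 5 seconds.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Hypothetical "reaping tee": copy in_fd to out_fd (and to log_fd when it
 * is >= 0). After each read it collects any exited children with WNOHANG,
 * so an exited pipeline member (like hello's parent) does not stay
 * <defunct> while the copy loop keeps running.
 * Returns the number of children reaped, or -1 on a read/write error.
 */
int copy_and_reap(int in_fd, int out_fd, int log_fd)
{
    char buf[4096];
    int reaped = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                                  /* EOF: all writers closed */
        if (write_all(out_fd, buf, (size_t) n) < 0)
            return -1;
        if (log_fd >= 0 && write_all(log_fd, buf, (size_t) n) < 0)
            return -1;
        while (waitpid(-1, NULL, WNOHANG) > 0)      /* reap without blocking */
            reaped++;
    }

    /* At EOF, block until the remaining children are collected. A real
     * tool might instead just exit and let init adopt them. */
    while (waitpid(-1, NULL, 0) > 0)
        reaped++;
    return reaped;
}
```

The blocking reap at EOF is a simplification for this sketch: if some child were still running without holding the pipe open, the loop would wait for it.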



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:21 EDT