SSH connection stays open after the remote command has finished

From: Sven Vermeulen (sven.vermeulen@siphos.be)
Date: Thu Oct 05 2006 - 04:58:45 EDT


When I run a command remotely using ssh, the command finishes, but
sometimes the SSH connection itself remains open. The command doesn't
launch anything in the background and does not require any interaction (it
is a script that runs a few chgrp and find commands).

ptree no longer shows any child processes (this is on the target
system):

root /root$ ptree 5975
26293 /usr/lib/ssh/sshd
  5968 /usr/lib/ssh/sshd
    5975 /usr/lib/ssh/sshd

PID 5975 is the SSH daemon instance that ran the script. As you can see,
the script itself has finished. I ran truss against this instance:

root /root$ truss -faDe -p 5975
5975: psargs: /usr/lib/ssh/sshd
5975: pollsys(0xFFBFF200, 3, 0x00000000, 0x00000000) (sleeping...)

root /root$ truss -faDev pollsys -p 5975
5975: psargs: /usr/lib/ssh/sshd
5975: pollsys(0xFFBFF200, 3, 0x00000000, 0x00000000) (sleeping...)
5975: fd=4 ev=POLLRDNORM rev=0
5975: fd=5 ev=POLLRDNORM rev=POLLIN
5975: fd=6 ev=POLLRDNORM rev=0x26B8

Some other information (lsof and pfiles):

root /root$ lsof -p 5975 -a -d ^txt
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sshd 5975 wsadmin cwd VDIR 85,10 512 2 /
sshd 5975 wsadmin 0u VCHR 13,2 6815752 /devices/pseudo/mm@0:null
sshd 5975 wsadmin 1u VCHR 13,2 6815752 /devices/pseudo/mm@0:null
sshd 5975 wsadmin 2u VCHR 13,2 6815752 /devices/pseudo/mm@0:null
sshd 5975 wsadmin 3r DOOR 0t0 46 /var/run/name_service_door(door to nscd[401]) (FA:->0x30007644540)
sshd 5975 wsadmin 4u IPv6 0x301ce970740 0t152488 TCP s1006270.servers.kbca.be:22->s1006450.servers.kbca.be:65181 (ESTABLISHED)
sshd 5975 wsadmin 5u FIFO 0x30b27abb6c0 0t0 22092419 (fifofs) PIPE->0x30b27abb750
sshd 5975 wsadmin 6u FIFO 0x30010d16530 0t4 22092417 (fifofs) PIPE->0x30010d164a0
sshd 5975 wsadmin 7u FIFO 0x30b27abb750 0t1 22092419 (fifofs) PIPE->0x30b27abb6c0
sshd 5975 wsadmin 9u unix 105,35429 0t3043 55050244 /devices/pseudo/tl@0:ticots->(socketpair: 0x8a65) (0x302588686e0)

root /root$ /usr/proc/bin/pfiles 5975
5975: /usr/lib/ssh/sshd
  Current rlimit: 4096 file descriptors
   0: S_IFCHR mode:0666 dev:287,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
      /devices/pseudo/mm@0:null
   1: S_IFCHR mode:0666 dev:287,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
      /devices/pseudo/mm@0:null
   2: S_IFCHR mode:0666 dev:287,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
      /devices/pseudo/mm@0:null
   3: S_IFDOOR mode:0444 dev:296,0 ino:46 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[401]
      /var/run/name_service_door
   4: S_IFSOCK mode:0666 dev:293,0 ino:51830 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49640)
        sockname: AF_INET6 ::ffff:10.152.34.25 port: 22
        peername: AF_INET6 ::ffff:10.152.34.26 port: 65181
   5: S_IFIFO mode:0000 dev:295,0 ino:22092419 uid:1500 gid:1500 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   6: S_IFIFO mode:0000 dev:295,0 ino:22092417 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   7: S_IFIFO mode:0000 dev:295,0 ino:22092419 uid:1500 gid:1500 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   9: S_IFSOCK mode:0666 dev:293,0 ino:17219 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX

When I kill -TERM this process and then the parent SSH daemon (the one on
the system where the command was launched; after the child received
SIGTERM it remained in CLOSE_WAIT), everything continues fine. The exit
code is even 0.

This doesn't always happen, but it is reproducible (it occurs every two
days or so).
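Background on the mechanism, for readers: an OpenSSH-derived sshd normally closes the session only after the command has exited and it has read end-of-file on the pipes carrying the command's output, and a pipe reports end-of-file only once every descriptor referring to its write end has been closed. The sketch below is a local illustration in plain sh, not part of the original report: `cat` stands in for sshd, the variable names are made up for the demo, and it assumes a `date` that supports `+%s` (GNU and BSD versions do). Whether a lingering write end is what keeps this particular session open is only an assumption; the ptree output above shows no remaining children.

```shell
#!/bin/sh
# Local illustration only: 'cat' plays the role of sshd reading the
# command's output pipe and waiting for end-of-file.
# Assumes 'date +%s' is supported (GNU/BSD date).

# Case 1: the subshell (the "command") prints a line and exits at once,
# but the background 'sleep' inherited its stdout, i.e. a copy of the
# pipe's write end, so 'cat' cannot see EOF until 'sleep' exits too.
start=$(date +%s)
( echo "command finished"; sleep 2 & ) | cat
wait_leaky=$(( $(date +%s) - start ))
echo "case 1: reader waited ${wait_leaky}s"

# Case 2: same command, but the background job's stdout no longer points
# at the pipe, so the last write end closes when the subshell exits.
start=$(date +%s)
( echo "command finished"; sleep 2 >/dev/null & ) | cat
wait_detached=$(( $(date +%s) - start ))
echo "case 2: reader waited ${wait_detached}s"
```

In the first case the reader waits roughly two seconds even though the "command" itself finished immediately; in the second it returns almost at once.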
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
