Hanging processes

From: Matthias Reichling (reichling@rz.uni-wuerzburg.de)
Date: Tue Feb 11 2003 - 03:29:26 EST


We observe the following problem:

On a server running Tru64 UNIX V5.1A (Rev. 1885) with aggregate patch kit 2
(t64v51ab02as0002) installed, there are many processes which cannot be
killed, even with kill -9:

user001 142757 1 0.0 Nov 16 ?? 0:00.03 bin/coolmail -name fvwm2Coolmail -e bin/pine-manthey
user001 144370 1 0.0 Jan 08 ?? 0:00.03 bin/coolmail -name fvwm2Coolmail -e bin/pine-manthey
user001 149245 1 0.0 Jan 09 ?? 0:00.04 bin/coolmail -name fvwm2Coolmail -e bin/pine-manthey
user001 154268 1 0.0 Nov 17 ?? 0:00.04 bin/coolmail -name fvwm2Coolmail -e bin/pine-manthey
user001 157448 1 0.0 Jan 10 ?? 0:00.04 bin/coolmail -name fvwm2Coolmail -e bin/pine-manthey
...
(about 85 processes)

All processes have PPID 1. coolmail is a mail notification utility compiled
by the user and running without any privileges.

On a second server with the same OS version installed, we have a similar
problem with another user and another program:

user002 13789 1 0.0 Dec 06 ?? 20:00:18 /usr/local/lib/g98a7/g98/l1002.exe 0 Dieazi.chk 1 /tmp/Gau-13789.int 0 /tmp/Gau-13789.rwf 0 /tmp/Gau-13789.d2e 0 /tmp/Gau-13789.scr 0 /tmp/Gau-13265.inp 0 junk.out 0
user002 67391 1 0.0 Dec 13 ?? 20:52:16 /usr/local/lib/g98a7/g98/l1002.exe 0 Dieazi.chk 1 /tmp/Gau-67391.int 0 /tmp/Gau-67391.rwf 0 /tmp/Gau-67391.d2e 0 /tmp/Gau-67391.scr 0 /tmp/Gau-67394.inp 0 junk.out 0
user002 94809 1 0.0 Dec 16 ?? 18:38:05 /usr/local/lib/g98a7/g98/l1002.exe 26214400 Dieazi.chk 1 /tmp/Gau-94809.int 0 /tmp/Gau-94809.rwf 0 /tmp/Gau-94809.d2e 0 /tmp/Gau-94809.scr 0 /tmp/Gau-94839.inp 0 junk.out 0

The TIME value does not increase. A similar job from the same user is
running (at least so far) without problems. The software involved is
Gaussian, again running without any privileges.

There are some jobs with many days of CPU time running on these machines,
so we want to avoid any unnecessary reboots.

How can we kill these jobs without rebooting the machine?
And how can we avoid such hanging jobs in the future?

Regards,

Matthias Reichling
Computing center
University of Wuerzburg
reichling@rz.uni-wuerzburg.de


