SUMMARY: dt-related errorlog fills up FS

From: dominic christopher (dominicfree@yahoo.com)
Date: Sat Jun 07 2003 - 02:12:46 EDT


Sincere thanks to Dr Tom Blinn and Andy Cohen for their help with
this issue.

The Problem:

All of a sudden the /usr file system on an ES40 running Tru64 4.0F
with patch kit 7 reached 101% and kept rising. The server was
crawling and applications slowed down.

An initial investigation revealed that a dt (desktop) process gone
haywire was logging errors to its errorlog file in the .dt
subdirectory.

Nulling this file did not help; the logging stopped only when the
CDE desktop was killed.
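
For anyone who needs to track down a runaway log of this kind, here
is a minimal Python sketch that lists oversized errorlog files under
a tree of home directories. It is only an illustration: the function
name find_large_errorlogs, the 10 MB threshold and the /usr/users
starting point are my assumptions, and it assumes one .dt/errorlog
per user directory.

```python
import os


def find_large_errorlogs(home_root, threshold_bytes=10 * 1024 * 1024):
    """Return (path, size) for each <home_root>/<user>/.dt/errorlog that
    is larger than threshold_bytes, largest first."""
    hits = []
    try:
        users = sorted(os.listdir(home_root))
    except OSError:
        return hits  # home_root is missing or unreadable
    for user in users:
        log_path = os.path.join(home_root, user, ".dt", "errorlog")
        try:
            size = os.stat(log_path).st_size
        except OSError:
            continue  # no CDE log for this entry, or not readable
        if size > threshold_bytes:
            hits.append((log_path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    # /usr/users is an assumed location for the home directories.
    for path, size in find_large_errorlogs("/usr/users"):
        print(f"{size:>12d}  {path}")
```

Run as root so that every user's .dt directory is readable.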

I reproduce Dr Blinn's suggestions below:

The "dt" subsystem is used extensively in CDE. The fact that the log
was in a single user's directory (at least, that's what I *think*
you wrote) says that something in that one user's CDE context was
going wrong. "select" is a standard C library call, and is used in
"polling" for I/O operations so that a program can manage multiple
I/O streams without blocking, e.g., deal with multiple network
links. Error code 22 is

    #define EINVAL 22 /* Invalid argument */

which suggests that some part of the "dt" ("dt" stands for, if I
remember correctly, "desktop", as in "Common Desktop Environment")
subsystem got into trouble, and wound up in an error loop where it
kept trying to call "select" with a bad argument and failed to see
that it was getting an error code from which it never recovered.
This is an ugly bug, but it's a bug, and if you have a support
contract, you should report it.

There were LOTS of processes running on the system, you just did not
know about most of them. If there are users logged in, they can have
MANY processes, most of which are sitting idle most of the time, and
with CDE, the "dt" subsystem is always there.

There is probably no way to disable the logging. It's there so that
you can find and fix such errors. Of course, when it goes wrong, it
can be a problem in itself. If you managed to remove the log file
and replace it with a symlink to /dev/null or with a directory
instead of a simple file (as just two examples), it is likely that
whatever CDE component is trying to append error messages to the log
file would either append to /dev/null (with the symlink) or fail
completely (with the directory). But this is not really a good idea.
Another approach is to move the user directories off of /usr to a
file system that's less likely to cause operational problems if it
fills up completely. That's a good idea in general.

                   Tom
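
To make the error loop Tom describes concrete, here is a minimal
Python sketch of the handling that was apparently missing. It is not
CDE code. The names poll_until_done, wait_for_io, handle_ready and
log are hypothetical, and wait_for_io is only a stand-in for a
select() call. An interrupted call (EINTR) is retried, while any
other error such as EINVAL is logged once and ends the loop.

```python
import errno


def poll_until_done(wait_for_io, handle_ready, log, max_rounds=1000):
    """Event loop around wait_for_io(), a stand-in for select().

    wait_for_io() returns a list of ready items (an empty list means
    "nothing left to do" in this sketch) or raises OSError.
    Returns the number of error messages logged.
    """
    errors_logged = 0
    for _ in range(max_rounds):
        try:
            ready = wait_for_io()
        except OSError as exc:
            if exc.errno == errno.EINTR:
                continue  # interrupted by a signal: retrying is correct
            # EINVAL means the arguments are bad; calling again with the
            # same arguments fails again, so log once and stop.
            name = errno.errorcode.get(exc.errno, "?")
            log(f"select failed: errno {exc.errno} ({name})")
            errors_logged += 1
            break
        if not ready:
            break
        handle_ready(ready)
    return errors_logged
```

A loop that simply retried after every failure would write one log
line per iteration for as long as the process lived, which matches
what we saw in the errorlog file.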

Andy Cohen said:

You might be able to use 'fuser' to determine what process is
writing to that file.
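
A small Python wrapper around Andy's suggestion might look like the
sketch below. It assumes that fuser is on the PATH and that, as POSIX
specifies, it writes the bare process IDs to standard output and
everything else to standard error. The function names are
hypothetical. Note that fuser reports processes that have the file
open, which is not necessarily the same as writing to it.

```python
import re
import subprocess


def parse_fuser_pids(stdout_text):
    """Extract process IDs from the standard output of fuser."""
    return [int(tok) for tok in re.findall(r"\d+", stdout_text)]


def pids_with_file_open(path):
    """Run `fuser path` and return the PIDs it reports (empty if none)."""
    try:
        result = subprocess.run(
            ["fuser", path], capture_output=True, text=True, check=False
        )
    except OSError:
        return []  # fuser is not installed or could not be executed
    return parse_fuser_pids(result.stdout)
```

Running plain fuser on the errorlog file from a root shell gives the
same information.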

Other associates suggested that this situation can arise when the
system date/time is changed in multiuser mode on a CDE system. The
sysadmin informs me that he did indeed change the time on this
system, and also on two more DS20Es which had CDE running, so it
could be a random bug.

 sincere regards & thanks

  Dominic



