Summary: ADVFS Dilemma

From: Brewer, Edward (BREWERE@OD.NIH.GOV)
Date: Wed Nov 13 2002 - 15:32:05 EST


Admins,

I got a couple of responses that echoed a common theme: something
deleted a file that was still open, and the discrepancy between df and
du came from that. Our problem was that neither lsof nor fuser showed an
open file for anything growing in the /usr/local filesystem. This was
troubling.
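
For readers on Linux, where lsof or fuser likewise come up empty, one
fallback is to read the per-process descriptor links directly. The
sketch below is Linux-specific and is an assumption on my part, not
something from the thread: it relies on /proc/<pid>/fd symlinks ending
in " (deleted)" for unlinked files. Tru64 does not expose this layout,
and the function name deleted_open_files is made up for illustration.

```python
import os


def deleted_open_files(pid):
    """Return (fd, target) pairs for descriptors of `pid` whose file was unlinked.

    Assumes a Linux-style /proc, where each /proc/<pid>/fd/<n> is a symlink
    and the kernel appends " (deleted)" to the target of an unlinked file.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.endswith(" (deleted)"):
            found.append((int(entry), target))
    return found
```

Called as deleted_open_files(os.getpid()) it inspects the current
process. Looking at other users' processes generally requires root.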

Here are the responses that I got.

From: Doulgas Brown

We have seen this numerous times on our boxes. It is always a sqlplus
process that doesn't die or is controlled by Oracle. We generally use
Oracle to stop the process, or shut down Oracle.
The df and du commands report different amounts because of the
still-running process.

From: Bruce Sean

Email responses

FWIW, is it possible that the process never updated EOF because it was
in some sort of hard disk write loop? You wouldn't see anything in the
file with tail because it would use the EOF on disk.

I actually have no clue, just the question.

Bruce,

I am confused by your question. Are you wondering if the process just
failed to close the file after an EOF and was just sending everything to
that file??

Lee Brewer

Bruce.
Let's see, I was thinking that the process, while it was running, was
writing madly to disk, BUT not updating EOF, wherever EOF is stored.
Meanwhile, AdvFS was looking at free space table somewhere else. You
kill the process and the file is gone along with all the "temporary"
sectors it has taken up.

Again, I have no real idea about this, just the thought.

From: Pat O'Brien

Yes,
the errant sqlplus process creates a dot.somename file in the directory that
the user was in when launching said process without executing a proper nohup
out command. As to why the tools don't work, we have not figured that out
yet. Our tools broke after upgrading to 5.1 from 4.0f, though we also
upgraded the tool several times to no avail. The tools do work for most
rudimentary things, but during times of crisis like those you describe they
don't go far.

From: Daniel Norris

I've seen this before on other platforms (for this particular problem, I
don't think platform is an issue). Basically, the free space recorded in
the filesystem's superblock does not change until the inode is actually
released. In your case (and the other cases I've had to
debug), a user process still had an fopen on some file, but the file was
deleted. So, "du" didn't show the space being used, but "df" showed
that you had tons of space in use. When the process is killed or exits,
the fopen is cleaned up (probably in the kernel somewhere) and the
superblock is updated to show the appropriate amount of free space.
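
The deleted-but-still-open case described above can be reproduced in a
few lines. This is a minimal sketch, assuming a POSIX system. The helper
names du_bytes and demo_deleted_but_open and the file name runaway.log
are made up for illustration. It shows only the du side of the mismatch
(the name is gone, the data is not). It does not read df, since free
space on a live filesystem changes for unrelated reasons.

```python
import os
import tempfile


def du_bytes(directory):
    """Sum the sizes of files still reachable by name, roughly what du sees."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def demo_deleted_but_open(size=1 << 20):
    """Write `size` bytes to a file, unlink it while open, report what remains."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "runaway.log")  # hypothetical file name
        with open(path, "wb") as handle:
            handle.write(b"x" * size)
            handle.flush()  # push buffered bytes to the kernel
            visible_before = du_bytes(workdir)
            os.unlink(path)  # remove the name; the open handle keeps the inode
            visible_after = du_bytes(workdir)
            info = os.fstat(handle.fileno())
            return {
                "visible_before": visible_before,  # bytes found by walking names
                "visible_after": visible_after,    # nothing left to walk
                "open_size": info.st_size,         # data still held by the inode
                "links": info.st_nlink,            # no directory entries remain
            }
```

After the unlink, walking the directory finds nothing, yet fstat on the
open handle still reports the full size with a link count of zero. The
blocks are returned to the filesystem only when the last handle is
closed, which matches the space reappearing once the process is killed.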

I'm not a filesystem expert, so I doubt that my explanation is
absolutely correct, but it is how I understand the situation. I've seen
similar behavior on Solaris (ufs and vxfs), HPUX (vxfs and hfs) and
Linux (ext2fs). I think that this is just the "way it works" and
there's nothing much you did or can do to keep it from occurring again.

Good luck!


