can massive file erasure damage a disk ?

From: Lucio Chiappetti (lucio@mi.iasf.cnr.it)
Date: Tue May 06 2003 - 08:51:29 EDT


On my Alpha 255 (let's call it machine1) I have a BA353 enclosure with 3
RZ28M disks (let us call them /a /b /c). I know they are quite old.

A while ago I needed to make room on disk /b. So I did a recursive rmdir
of a directory which was taking up a fairly LARGE PERCENTAGE of the disk.
Since this was taking a long time, I was also writing something to disk /b
at the same time. During this procedure I got a number of SCSI errors.
I ended up having maintenance REPLACE disk /b (I was able to recover most
of the stuff after a sequence of painful fsck runs, but the diagnosis was
that the disk itself was broken).

Now I'm having a DIFFERENT problem on disk /c. The curious thing is that
on this disk too I did a MASSIVE ERASURE (a couple of recursive rmdir
which got rid of 90% of what was on the disk).

After that I put new data on disk /c (not much, some 30%), and started
running a program which INTENSIVELY accesses such data VIA NFS (I have to
do it this way because the program runs under a different OS version). So
the program runs on machine2.

The program crashed yesterday evening on machine2 after erratic "NFS
server machine1 not responding" messages. This also caused other problems
(see below).

I restarted the program on machine2 this morning, and again I got some
"NFS server machine1 not responding" messages. In particular the program
crashed (actually this is not a crash, it is a regular error termination
with an error message) because some file "/c/somewhere/some.file" was not
found.

The fact is that the file WAS THERE before the program started. When I did
an ls -l of /c/somewhere (and I did this from machine1 where disk /c is
LOCAL) I got a strange result. It listed all the file NAMES but for some
of them it said "not found" !

I saw SCSI errors, "NFS3 write error 5", an "I/O error" attempting to
umount /c, and there are messages like "Defering I/O (errno 5) for
block(0x18e00, 0x18e 00) on device 8,3074" in syslog.
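
For readers unfamiliar with the numbers in that syslog message, here is a
small sketch in Python (not something that would necessarily run on the
machines described) that pulls out the errno and the device pair and looks
up the errno name. The regular expression is hypothetical and written only
for this one message shape, reading "8,3074" as a major,minor pair is an
assumption, and the errno table is that of the machine running the script
(errno 5 is EIO on the Unix systems I know of).

```python
import errno
import os
import re

# The syslog message quoted above, joined onto one line exactly as posted.
LINE = "Defering I/O (errno 5) for block(0x18e00, 0x18e 00) on device 8,3074"

# Hypothetical pattern: matches only this particular message shape.
PATTERN = re.compile(r"\(errno (\d+)\).*on device (\d+),(\d+)")


def decode(line):
    """Return (errno name, errno text, major, minor), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    code, major, minor = (int(g) for g in m.groups())
    # errno.errorcode maps numbers to symbolic names on the local platform.
    name = errno.errorcode.get(code, "unknown")
    # Assumption: the "8,3074" pair is (major, minor) device numbers.
    return name, os.strerror(code), major, minor


if __name__ == "__main__":
    print(decode(LINE))
```

EIO is the generic "I/O error" code, which agrees with the "I/O error"
reported by umount. It does not by itself say whether the disk, the cable
or the controller is at fault.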

I then did a shutdown of machine1. It told me that filesystem /c was being
marked dirty. I then booted single user, and did an fsck on /c. Curiously
fsck GAVE NO ERRORS. I then did a bcheckrc and rebooted multi user, and
the content of /c/somewhere is ALL THERE.

Now I am not sure whether THE PROBLEM IS A HARDWARE ERROR ON A BAD "/c"
DISK OR EXCESSIVE NFS ACCESS.

COULD THE MASSIVE ERASURES HAVE SPOILED THE DISK ? Should I act
differently when I want to make massive erasures (like moving data
elsewhere and remaking the file system ?)

I suppose I will test putting the data on disk /b (which is new).

PS

I mentioned side problems. Both machine2 and another machine3 run some
crontab jobs (one of them every 5 min) whose executable is on disk /a of
machine1 (i.e. VIA NFS). Apart from that, the crontab job works LOCALLY on
each machine, and sends a mail to machine1 ONLY in case of anomalies.

This morning I found that : a) the mail queues of machine2 and machine3
were cluttered with messages generated by those cron jobs ; b) the process
table of machine1 was full (procmail wasn't keeping up with the rate of
messages); c) on machine2 and machine3 there were SEVERAL INSTANCES of
cron running !

I cleaned up all that (it did happen sometimes in the past, but for a
different reason, our NIS master not responding). Apparently the
unavailability of disk /c from machine1 stopped the entire NFS server also
for disk /a.
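
One common way to keep a periodic job from piling up when an earlier run is
still blocked is a non-blocking lock taken at startup. The following is only
a sketch in Python under stated assumptions: the lock path and the function
names are hypothetical, the lock file would have to live on a LOCAL disk,
and the wrapper itself would have to be local too (in the setup described
above the executable sits on NFS, which this does not cure).

```python
import fcntl
import os

# Hypothetical lock file; it must be on a local disk, not on NFS.
LOCK_PATH = "/tmp/cronjob-example.lock"


def try_lock(path):
    """Return an open fd holding an exclusive lock, or None if it is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail at once instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def run_exclusively(job, path=LOCK_PATH):
    """Run job() unless another run holds the lock; return True if it ran."""
    fd = try_lock(path)
    if fd is None:
        # An earlier run is still alive (for example blocked on NFS): skip.
        return False
    try:
        job()
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

A job started every 5 minutes would call run_exclusively(...) with its real
work as the argument, so a new run exits at once while an older one is still
stuck, instead of adding one more process and one more mail.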

And curiously enough, some of the mail messages in the queue did not have
a Subject line as generated by the cron job, but a header line starting
with Cron: and saying something like "cannot set environment, possible
security problem". I HAD NEVER SEEN SUCH A MESSAGE BEFORE.

BTW machine2 (and 3) are Tru64 V4.0, while machine1 is still DU 3.2.

----------------------------------------------------------------------------
Lucio Chiappetti - IASF/CNR - via Bassini 15 - I-20133 Milano (Italy)
----------------------------------------------------------------------------
L'Italia ripudia la guerra [...] come mezzo di risoluzione delle
controversie internazionali
Italy repudiates war [...] as a way of resolution of international
controversies
                [Art. 11 Constitution of the Italian Republic]
----------------------------------------------------------------------------
For more info : http://www.mi.iasf.cnr.it/~lucio/personal.html
----------------------------------------------------------------------------


