Summary: unexplained advfs I/O errors (fwd)

From: Steven Timm (timm@fnal.gov)
Date: Tue Jul 23 2002 - 14:35:36 EDT


Thanks to the many who replied. A common question was whether the
errors also appear in binary.errlog. Yes, they did.
Dr. Tom Blinn gave the answer that helped the most:

>This probably means the "genvmunix" kernel file in your root
>directory is toast. Since that's a relatively important file
>and probably has not been changed from what's in the V4.0G
>initial kit, I'd recommend you mount the installation CDROM,
>move the existing file aside (say, to genvmunix.bak), then
>copy in the file from the CDROM media. If the files appear
>to be the same size, checksum them both (via cksum) and see
>if the checksums match. The odds are good that you can't
>get a clean checksum on the existing file on disk. Once you
>have the genvmunix replaced with a good file, remove the old
>(backup) copy; since the bad spot on disk has been tagged in
>the disk itself, AdvFS should not continue to use the space
>on the disk proper. But as long as the file is there, it's
>going to try to compress the file.

He was right: even cksum on the file in question produced I/O
errors. Moving the file to another name and copying a new one
in from the CD-ROM got us a good genvmunix again. Removing the
old file then let us run the defragmenter on this volume with
no I/O errors.
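
For anyone repeating this, the steps Tom describes can be sketched as a
small shell function. This is only an illustration, not the exact commands
we typed: the function name restore_from_media is made up, and the CD-ROM
path in the usage comment is an assumption about where the media is mounted.

```shell
# Sketch: move a suspect file aside, copy in a known-good one, verify it.
# Usage (paths are assumptions; adjust to where the CD-ROM is mounted):
#   restore_from_media /genvmunix /mnt/genvmunix && rm /genvmunix.bak
restore_from_media() {
    target=$1    # the suspect file on disk
    good=$2      # the known-good copy from the installation media

    # A read error here (as we saw) suggests the on-disk copy is damaged.
    if cksum "$target" >/dev/null 2>&1; then
        echo "$target reads cleanly"
    else
        echo "$target cannot be read cleanly"
    fi

    # Keep the old file under a new name rather than overwriting it.
    mv "$target" "$target.bak" || return 1
    cp "$good" "$target" || return 1

    # cksum on stdin prints "checksum size", so this compares both.
    new_sum=$(cksum < "$target")
    good_sum=$(cksum < "$good")
    [ "$new_sum" = "$good_sum" ]
}
```

The backup is removed only after the new copy verifies, which matches the
order Tom suggests: replace first, then remove the old file.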

>If things are really bad with the disk (check the binary error
>log as well, you are probably getting disk errors logged there)
>you MIGHT want to take this opportunity to replace the disk.

Tom

Also, alan@nabeth adds that since writes to the / partition are
infrequent, you should not need to defragment it. But everyone
agreed that defragmenting it should work; nothing special about
that partition would prohibit it.

Steve Timm

> I am observing the following errors on a system which is running
> 4.0g. The following appears in /var/adm/messages
>
>
> Jul 22 01:01:22 fnsimu2 vmunix: AdvFS I/O error:
> Jul 22 01:01:22 fnsimu2 vmunix: Domain#Fileset: root_domain#root
> Jul 22 01:01:22 fnsimu2 vmunix: Mounted on: /
> Jul 22 01:01:22 fnsimu2 vmunix: Volume: /dev/rz8a
> Jul 22 01:01:22 fnsimu2 vmunix: Tag: 0x0000008e.8004
> Jul 22 01:01:22 fnsimu2 vmunix: Page: 1155
> Jul 22 01:01:22 fnsimu2 vmunix: Block: 135936
> Jul 22 01:01:22 fnsimu2 vmunix: Block count: 256
> Jul 22 01:01:22 fnsimu2 vmunix: Type of operation: Read
> Jul 22 01:01:22 fnsimu2 vmunix: Error: 5
> Jul 22 01:01:22 fnsimu2 vmunix: To obtain the name of the file on which
> Jul 22 01:01:22 fnsimu2 vmunix: the error occurred, type the command:
> Jul 22 01:01:22 fnsimu2 vmunix: /sbin/advfs/tag2name //.tags/142
>
>
> If I do the above command, it comes out that the file it is having
> the trouble with is genvmunix. This message is happening nightly
> at 01:01 as the defragcron utility runs. It has been going on for at
> least a couple of weeks, maybe longer. But otherwise the system
> appears to be fine and hasn't crashed at all.
>
>
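
A side note on the numbers in the log above: the tag is printed in hex
(0x0000008e.8004), while the suggested tag2name path uses decimal (142).
The conversion can be sketched as below. The helper name tag_to_decimal is
made up for illustration, and it simply ignores the part after the dot.

```shell
# Sketch: turn the hex tag from an AdvFS error message into the decimal
# number that appears in the .tags path.
tag_to_decimal() {
    hex=${1%%.*}            # keep only the part before the dot
    printf '%d\n' "$hex"    # printf accepts a 0x-prefixed hex constant
}

tag_to_decimal 0x0000008e.8004    # prints 142
```

That matches the //.tags/142 path in the kernel message, so the tag and the
suggested command refer to the same file.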
> Error message in defragcron.log:
>
> defragment: Can't move file //genvmunix
> defragment: Error = I/O error
> defragment: Can't defragment domain 'root_domain'
> defragcron: could not defragment domain 'root_domain'
>
> My question--
> Does this mean that we should just reconfigure the defragger
> to not defragment the / domain? Am I trying to defrag
> something that shouldn't be defragged?
>
> Or is this a real hardware fault that needs to be serviced?
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations



