ADVFS I/O errors

From: Mark Schubert (mark.schubert@cogita.com)
Date: Tue Jun 18 2002 - 20:14:04 EDT


Hi all,

We had an issue on an Alpha ES40 cluster running V5.1 (patch kit 3) connected
to an EMC SAN, where AdvFS errors similar to the following were reported.
The errors led to corruption of the affected files, which included a
database file.

Jun 17 08:24:14 pha303 vmunix: AdvFS I/O error:
Jun 17 08:24:14 pha303 vmunix: Domain#Fileset: dom4#fset0
Jun 17 08:24:14 pha303 vmunix: Mounted on: /qaddb
Jun 17 08:24:14 pha303 vmunix: Volume: /dev/disk/dsk8c
Jun 17 08:24:14 pha303 vmunix: Tag: 0x000005c9.8002
Jun 17 08:24:14 pha303 vmunix: Page: 113027
Jun 17 08:24:14 pha303 vmunix: Block: 22535232
Jun 17 08:24:14 pha303 vmunix: Block count: 16
Jun 17 08:24:14 pha303 vmunix: Type of operation: Write
Jun 17 08:24:14 pha303 vmunix: Error: 12
Jun 17 08:24:14 pha303 vmunix: EEI: 0x0
Jun 17 08:24:14 pha303 vmunix: To obtain the name of the file on which
Jun 17 08:24:14 pha303 vmunix: the error occurred, type the command:
Jun 17 08:24:14 pha303 vmunix: /sbin/advfs/tag2name /qaddb/.tags/1481
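
For reference, the decimal 1481 in the tag2name command above is the
hexadecimal tag 0x000005c9 from the "Tag:" line. A minimal Python sketch of
that conversion follows. The helper name tag_to_tags_path is hypothetical, and
ignoring the part after the dot (here 8002) is an assumption based only on
this log excerpt.

```python
def tag_to_tags_path(mount_point: str, tag_field: str) -> str:
    """Build the .tags path for a tag as printed in the AdvFS I/O error log."""
    # The log prints the tag as "<hex tag>.<hex suffix>". Only the part before
    # the dot is used here (assumption drawn from this one log excerpt).
    hex_tag = tag_field.split(".")[0]
    # int() with base 16 accepts the leading "0x" prefix.
    tag_number = int(hex_tag, 16)
    return f"{mount_point.rstrip('/')}/.tags/{tag_number}"


if __name__ == "__main__":
    # Values taken from the "Mounted on:" and "Tag:" lines of the log.
    print(tag_to_tags_path("/qaddb", "0x000005c9.8002"))
```

With the values from the log this yields the same path that the kernel
message suggests passing to /sbin/advfs/tag2name.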

I am particularly interested in the meaning of Error number 12. Does anyone
know what this represents?

No errors were reported in binary.errlog at the time.

NOTE: Defrag was running at the time (don't blame me, I didn't set it up) and
reported the following error:

Mon Jun 17 00:33:22 EST 2002: Beginning defragmentation of dom4.
defragment: Can't move file /qaddb/backup/au/crp/mfgcrpau.bkp
defragment: Error = Not enough space
defragment: Error occurred during pass 7 on volume 1. Continuing...
Mon Jun 17 08:24:59 EST 2002: Finished defragmentation of dom4.

I can assure you that the file domain DID NOT run out of disk space.

Does anyone have any ideas? I suspect that defrag has exposed a bug in
AdvFS. Are there still bugs out there in AdvFS?

Also, does anyone have experience with Unix recording errors from an EMC
SAN? Compaq have said that, because EMC are a third party, hardware
errors are not logged in binary.errlog for such devices. I think it is
essential that errors are logged whenever Unix has a problem with some
hardware. Is there some other utility that can do this on Tru64?

Thanks,
Mark.


