Does AdvFS defragment ENO_MORE_MEMORY error cause metadata corruption?

From: Richard Jackson (rjackson@portal.gmu.edu)
Date: Tue Feb 11 2003 - 09:53:17 EST


Hello,

I have a problem with a v4 AdvFS filesystem after defragment failed by
exhausting its default data-area limit (128MB, set via the sysconfigtab
per-proc-data-size attribute) on an AlphaServer 4100 running Tru64 UNIX
5.1A patch kit #3.

About four weeks ago, defragment produced thousands of errors:
---------------------------------
dsk4c_domain needs defragment, aggregate I/O performance is 93.

Pass 1;
  Volume 1: area at block 57935744 (10272000 blocks): 99% full
  Domain data as of the start of this pass:
    Extents: 525930
    Files w/extents: 437787
    Avg exts per file w/exts: 1.20
    Aggregate I/O perf: 81%
    Free space fragments: 3147
                     <100K    <1M   <10M   >10M
      Free space:      94%     6%     0%     0%
      Fragments:      3098     48      1      0

Pass 2;
  Volume 1: area at block 48117504 ( 9925120 blocks): 95% full
  Domain data as of the start of this pass:
    Extents: 518324
    Files w/extents: 440878
    Avg exts per file w/exts: 1.18
    Aggregate I/O perf: 75%
    Free space fragments: 5734
                     <100K    <1M   <10M   >10M
      Free space:      40%    60%     0%     0%
      Fragments:      3427   2307      0      0

Pass 3;
  Volume 1: area at block 57610880 (10882176 blocks): 98% full
defragment: Can't allocate memory for expanded xtnt desc table
defragment: Can't create extent desc list
defragment: Error = ENO_MORE_MEMORY (-1054)
defragment: Can't allocate memory for expanded xtnt desc table
defragment: Can't create extent desc list
defragment: Error = ENO_MORE_MEMORY (-1054)
...
defragment: Can't create extent desc list
defragment: Error = ENO_MORE_MEMORY (-1054)
defragment: Can't get extent descriptors
defragment: Error = ENO_MORE_MEMORY (-1054)
...
defragment: Can't get extent descriptors
defragment: Error = ENO_MORE_MEMORY (-1054)
  Domain data as of the start of this pass:
    Extents: 476515
    Files w/extents: 442710
    Avg exts per file w/exts: 1.08
    Aggregate I/O perf: 75%
    Free space fragments: 2814
                     <100K    <1M   <10M   >10M
      Free space:       5%    70%     3%    22%
      Fragments:       699   2096     10      9

Pass 4;
defragment: Can't allocate memory for analyzing storage bit map
defragment: Error occurred during pass 4 on volume 1. Continuing...
  Domain data as of the start of this pass:
    Extents: 0
...
---------------------------------
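The "Avg exts per file w/exts" line in each pass appears to be simply Extents divided by Files w/extents. The quick Python check below, which treats that definition as an assumption and uses illustrative variable names, reproduces the reported values from the three complete passes above:

```python
# Per-pass figures copied from the defragment output above.
passes = {
    1: {"extents": 525930, "files_with_extents": 437787, "reported": "1.20"},
    2: {"extents": 518324, "files_with_extents": 440878, "reported": "1.18"},
    3: {"extents": 476515, "files_with_extents": 442710, "reported": "1.08"},
}


def avg_extents_per_file(extents: int, files_with_extents: int) -> float:
    """Assumed definition: total extents divided by files that have extents."""
    return extents / files_with_extents


for n, p in passes.items():
    avg = avg_extents_per_file(p["extents"], p["files_with_extents"])
    print(f"Pass {n}: computed {avg:.2f}, reported {p['reported']}")
```

Between passes 1 and 3 the average falls from 1.20 to 1.08 while the reported aggregate I/O performance falls from 81% to 75%, so the two metrics do not move together here.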

Since then, the aggregate I/O performance has dropped from 93% to 61%.
I have run defragment on this filesystem for more than eight full days
in total, over weekends (starting Friday evening and stopping Monday
morning). I have unmounted and remounted the filesystem without any
noticeable change. I noted that 'defragment -nv dsk4c_domain' runs to
completion, but it uses over 480MB of virtual memory (I used 'ps aux' to
track VSZ and RSS usage). I also ran 'verify -a dsk4c_domain'
successfully.
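The VSZ/RSS measurement can be repeated without eyeballing 'ps aux' output by parsing it by column header. The sketch below uses hypothetical helpers (to_kb, peak_usage), not a Tru64 tool. It assumes the header row contains VSZ and RSS columns, as BSD-style 'ps aux' output does, and that values are plain kilobyte counts, optionally with a K/M/G suffix. The suffix handling is itself an assumption about how large sizes are printed.

```python
from typing import Tuple

# Assumed multipliers for size suffixes, expressed in kilobytes.
_SCALE = {"K": 1, "M": 1024, "G": 1024 * 1024}


def to_kb(field: str) -> int:
    """Convert a ps size field such as '491520' or '480M' to kilobytes.

    Assumption: bare numbers are already in KB; a K/M/G suffix scales them.
    """
    suffix = field[-1].upper()
    if suffix in _SCALE:
        return int(float(field[:-1]) * _SCALE[suffix])
    return int(field)


def peak_usage(ps_output: str, command: str) -> Tuple[int, int]:
    """Return (VSZ, RSS) in KB for the matching row with the largest VSZ.

    Rows match when `command` appears in the last (COMMAND) column.
    Returns (0, 0) if nothing matches.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    vsz_col, rss_col = header.index("VSZ"), header.index("RSS")
    best = (0, 0)
    for line in lines[1:]:
        # The command may contain spaces, so split at most len(header) - 1
        # times to keep it in the final field.
        fields = line.split(None, len(header) - 1)
        if command in fields[-1]:
            usage = (to_kb(fields[vsz_col]), to_kb(fields[rss_col]))
            best = max(best, usage)  # tuple comparison: VSZ first
    return best
```

For example, calling peak_usage(subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout, "defragment") every few minutes while defragment runs would record its largest footprint.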

My concern is that the AdvFS metadata is corrupt and that I must back up
the filesystem, rebuild the AdvFS filesystem, and then restore it. I have
contacted HP support; they believe the filesystem metadata is OK, but
they do not understand why defragment cannot defragment the filesystem.
I can no longer get past 'Pass 1' of defragment on that filesystem.
Before the ENO_MORE_MEMORY error, I could defragment this particular
filesystem within two days.

Has anyone else seen this problem? If the ENO_MORE_MEMORY defragment error
did corrupt the metadata, then steps should be taken to protect our
filesystems: either defragment needs to handle this situation more
gracefully, or sysadmins need to increase per-proc-data-size or otherwise
give defragment more data space (with 'ulimit -d', for example).
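As a sketch of the second option: 'ulimit -d' raises the soft data-segment limit (RLIMIT_DATA), and a non-root process can raise it only as far as the hard limit. Assuming per-proc-data-size supplies the default soft limit and max-per-proc-data-size the hard ceiling, a ceiling that is also 128MB would leave only a sysconfigtab change as a fix. The Python below does the equivalent of raising 'ulimit -d' to its maximum before exec'ing defragment. raise_data_limit and run_defragment are hypothetical helpers, not HP tools.

```python
import os
import resource
from typing import List, Tuple


def raise_data_limit() -> Tuple[int, int]:
    """Raise the soft data-segment limit to the hard limit, like `ulimit -d` at its max.

    An unprivileged process may raise its soft limit up to, but not beyond,
    the hard limit, so this does not need root.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_DATA)
    resource.setrlimit(resource.RLIMIT_DATA, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_DATA)


def run_defragment(args: List[str]) -> None:
    """Hypothetical wrapper: raise the limit, then replace this process with defragment.

    Resource limits survive exec, so defragment starts with the raised limit.
    """
    soft, hard = raise_data_limit()
    print(f"RLIMIT_DATA soft={soft} hard={hard}")
    os.execvp("defragment", ["defragment", *args])
```

For example, run_defragment(["-nv", "dsk4c_domain"]) would repeat the analysis run described above with the larger limit in force.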

The filesystem in question holds over 3 million files because of the way
the application that uses it, WebCT, manages its data.

-----------------------------------
showfdmn dsk4c_domain

               Id              Date Created  LogPgs  Version  Domain Name
3e0dd68b.000e078f  Sat Dec 28 11:51:23 2002    1024        4  dsk4c_domain

  Vol   512-Blks      Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L   72115200   9797936     86%     on    256    256  /dev/disk/dsk4c
-----------------------------------
df -ki /webct

Filesystem          1024-blocks      Used  Available  Capacity    Iused     Ifree  %Iused  Mounted on
dsk4c_domain#webct     36057600  30044670    4898824       86%  3107541  17158007     15%  /webct
-----------------------------------
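The two outputs above agree once their units are reconciled: showfdmn reports 512-byte blocks, while df -k reports 1024-byte blocks. The short Python check below is purely arithmetic on the figures shown. It also reproduces the 86% space usage and the 15% inode usage.

```python
# Figures copied from the showfdmn and df -ki output above.
vol_512_blocks = 72115200
free_512_blocks = 9797936
df_1k_blocks = 36057600
iused, ifree = 3107541, 17158007

# Two 512-byte blocks make one 1024-byte block.
print(vol_512_blocks // 2 == df_1k_blocks)

# showfdmn's "% Used": share of the volume that is not free.
pct_used = 100 * (vol_512_blocks - free_512_blocks) / vol_512_blocks
print(f"{pct_used:.0f}%")

# df's "%Iused": used inodes over total inodes.
pct_iused = 100 * iused / (iused + ifree)
print(f"{pct_iused:.0f}%")
```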

-- 
Regards,						   /~\ The ASCII
Richard Jackson						   \ / Ribbon Campaign
Computer Systems Engineer,				    X  Against HTML
Information Technology Unit, Technology Systems Division   / \ Email! 
Enterprise Servers and Operations Department
George Mason University, Fairfax, Virginia


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:07 EDT