SUMMARY: ADVFS disk full, but not really full!

From: Andrew Raine (Andrew.Raine@mrc-dunn.cam.ac.uk)
Date: Fri Oct 03 2003 - 11:06:42 EDT


<original question below>

Well, as usual, I got quite a few replies over lunch, and all of them
suggested (amongst other things) that I might have got quotas turned on
for that fileset. I didn't think I had, but on closer inspection with
"showfsets" I saw that I did indeed have hard and soft block quotas of
20000000.

Setting them to 0 with "chfsets" returned my system to sanity! I don't
know how they got turned on, but /var/adm/messages shows that the
problems with this domain/fileset started immediately after the cluster
root domain filled up.
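
And the fix, for the record. I'm quoting the option letters from
memory (as I recall, -b is the soft block limit and -B the hard block
limit), so please check them against the chfsets(8) man page before
pasting:

alpha # chfsets -b 0 -B 0 scratch_domain scratch   (verify flags in chfsets(8) first)
alpha # showfsets scratch_domain scratch           (both limits should now read 0)
alpha # df -k /scratch                             (size and free space back to normal)

Setting both limits to 0 removes the cap; in my case df went straight
back to showing the real size of the volume.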

Thanks especially to Brian Staab, David Knight, Charles Ballowe, Bryan
Mills and, for a lot of suggestions for how to diagnose and fix a
corrupted partition, Tom Blinn. This list is truly amazing!

Regards,

Andrew

--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit, 
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk
> Dear Tru64 Managers,
> 
> I wonder if any of you can shed any light on my current problem?
> 
> I have a 2-node cluster (DS20 + ES40 + HSG80, 5.1, PK3) which has, I
> think, got itself confused about an AdvFS domain/fileset:
> 
> The volume, /scratch, appears to be full, and is causing problems when
> processes try to write to it:
> 
> alpha # df -k /scratch
> Filesystem             1024-blocks        Used   Available Capacity  Mounted on
> scratch_domain#scratch    20000000    20000000           0   100%    /scratch
> 
> However, when I look at the space actually used on it I get:
> 
> alpha # du -sk /scratch/* | sort -n
> 0       /scratch/vh
> 1       /scratch/NEO.log
> 8       /scratch/admin
> 8       /scratch/el
> 8       /scratch/root
> 8       /scratch/tm2
> 8       /scratch/tsh
> 16      /scratch/jrg
> 20      /scratch/tmp
> 33      /scratch/atpase
> 80      /scratch/quota.group
> 152     /scratch/quota.user
> 290     /scratch/ar
> 392210  /scratch/rk
> 6188864 /scratch/lf
> 8161222 /scratch/smb
> 8764514 /scratch/backup
> 11386965        /scratch/kunji
> 
> which adds up to 34894407 * 1024 bytes (~33 GB), which is more than
> the 20000000 * 1024-byte blocks (~20 GB) in the df output, isn't it?
> 
> But, my memory is that the /scratch volume is much bigger than either
> 20 or 30 GB:
> 
> alpha # showfdmn scratch_domain
> 
>                Id              Date Created  LogPgs  Version  Domain Name
> 3b3c8f9e.010893e9  Fri Jun 29 15:24:30 2001     512        4  scratch_domain
> 
>   Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name 
>    1L  213291744   143058608     33%     on    256    256  /dev/disk/dsk12c
> 
> which looks like the volume is ~100 GB, and only 33% used (which
> fits with the ~33 GB used figure above)
> 
> Any idea what has happened?  How to fix it?  I've rebooted each of the
> cluster members in turn, but nothing changed.  I'm reluctant to take
> both nodes down simultaneously, as this is an NFS server with several
> active connections.  However, I'd guess that a full reboot might be
> needed?
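
A postscript for anyone reading this in the archives: the numbers in
the question above do line up once the quota is taken into account
(the rounding is mine, using the figures quoted above):

  213291744 *  512 bytes  =  ~102 GB   total volume size (from showfdmn)
   34894407 * 1024 bytes  =   ~33 GB   space actually in use (from du), i.e. ~33% of the volume
   20000000 * 1024 bytes  =   ~19 GB   the fileset block quota, which df was reporting as the size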

