From: Andrew Raine (Andrew.Raine@mrc-dunn.cam.ac.uk)
Date: Fri Oct 03 2003 - 11:06:42 EDT
<original question below>
Well, as usual, I got quite a few replies over lunch, and all of them
suggested (amongst other things) that I might have quotas turned on
for that fileset. I didn't think I had, but on closer inspection with
"showfsets" I saw that I did have hard and soft block quotas of 20000000.
Setting them to 0 with "chfsets" returned my system to sanity! I don't
know how they got turned on, but looking in /var/adm/messages, the
problem with this domain/fileset started immediately after the cluster
root domain filled up.
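For anyone who hits the same symptom, the numbers line up once you account for units: the quota showfsets reports is in 1 KB blocks (the same units as "df -k"), so a limit of 20000000 caps the fileset at roughly 19 GB even though showfdmn (quoted below) shows the underlying volume, counted in 512-byte blocks, is roughly 101 GB. A quick sanity check of that arithmetic in plain shell, using the two figures from the outputs quoted below:

```shell
#!/bin/sh
quota_kblocks=20000000     # hard/soft block quota from showfsets (1 KB blocks)
vol_512blks=213291744      # volume size from showfdmn (512-byte blocks)

# 1 KB blocks: divide by 1024 twice to get GB
echo "quota cap: $((quota_kblocks / 1024 / 1024)) GB"    # 19 GB
# 512-byte blocks: divide by 2 for KB, then by 1024 twice for GB
echo "volume:    $((vol_512blks / 2 / 1024 / 1024)) GB"  # 101 GB
```

So df was telling the truth about the quota, not about the volume.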
Thanks especially to Brian Staab, David Knight, Charles Ballowe, and
Bryan Mills, and to Tom Blinn for a lot of suggestions on how to
diagnose and fix a corrupted partition. This list is truly amazing!
Regards,
Andrew
--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit,
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830  fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk  email: Andrew.Raine@mrc-dunn.cam.ac.uk

> Dear Tru64 Managers,
>
> I wonder if any of you can shed any light on my current problem?
>
> I have a 2-node cluster (DS20 + ES40 + HSG80, 5.1, PK3) which has, I
> think, got itself confused about an AdvFS domain/fileset:
>
> The volume, /scratch, appears to be full, and is causing problems when
> processes try to write to it:
>
> alpha # df -k /scratch
> Filesystem             1024-blocks      Used Available Capacity Mounted on
> scratch_domain#scratch    20000000  20000000         0     100% /scratch
>
> However, when I look at the space actually used on it I get:
>
> alpha # du -sk /scratch/* | sort -n
> 0        /scratch/vh
> 1        /scratch/NEO.log
> 8        /scratch/admin
> 8        /scratch/el
> 8        /scratch/root
> 8        /scratch/tm2
> 8        /scratch/tsh
> 16       /scratch/jrg
> 20       /scratch/tmp
> 33       /scratch/atpase
> 80       /scratch/quota.group
> 152      /scratch/quota.user
> 290      /scratch/ar
> 392210   /scratch/rk
> 6188864  /scratch/lf
> 8161222  /scratch/smb
> 8764514  /scratch/backup
> 11386965 /scratch/kunji
>
> which adds up to 34894407*1024 bytes (~33 GB, which is more than the
> 20000000*1024 bytes (~20 GB) in the df output, isn't it?)
>
> But my memory is that the /scratch volume is much bigger than either
> 20 or 33 GB:
>
> alpha # showfdmn scratch_domain
>
>                Id              Date Created  LogPgs  Version  Domain Name
> 3b3c8f9e.010893e9  Fri Jun 29 15:24:30 2001     512        4  scratch_domain
>
>  Vol   512-Blks       Free  % Used  Cmode  Rblks  Wblks  Vol Name
>   1L  213291744  143058608     33%     on    256    256  /dev/disk/dsk12c
>
> which looks like the volume is ~100 GB and only 33% used (which fits
> with the ~33 GB used figure above).
>
> Any idea what has happened? How to fix it? I've rebooted each of the
> cluster members in turn, but nothing changed. I'm reluctant to take
> both nodes down simultaneously, as this is an NFS server with several
> active connections. However, I'd guess that a full reboot might be
> needed?
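A closing cross-check (my own arithmetic, not from the thread): the du total above (34894407 KB) agrees well with what showfdmn says is actually allocated on the volume (213291744 minus 143058608 free, in 512-byte blocks), so nothing was corrupted; the "full" fileset was purely the quota. In plain shell:

```shell
#!/bin/sh
used_512blks=$((213291744 - 143058608))   # showfdmn 512-Blks minus Free
du_kb=34894407                            # sum of the du -sk column above

echo "showfdmn used: $((used_512blks / 2 / 1024)) MB"   # 34293 MB
echo "du -sk total:  $((du_kb / 1024)) MB"              # 34076 MB
```

The small gap between the two figures is plausibly metadata, the log pages, and rounding in du.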
This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:37 EDT