SUMMARY: too many files

From: Joe Evans (joe.evans@kcl.ac.uk)
Date: Tue May 09 2006 - 11:14:33 EDT


This one is difficult to summarise so I have put all the responses below
for you to look at.

Thanks to everyone that responded.

Joe Evans

*****************************************************************

Dear Managers,

Does anyone know if you can expect performance degradation on a system
if there are folders with a large number of files in them?

In this case I am talking about 32,000 files.

Tru64 5.1b

Thanks,
**********************************************
UFS or AdvFS filesystem??

AdvFS should be ok .. UFS, not sure ..
K

******************************************

Depends on the files and how they've been created. If you've got the
frag file enabled on the fileset and your files are relatively small
(or if they're larger but don't all fall on even 8 KB size boundaries),
the frag file may cause you to see a performance degradation when
you're accessing the files for read/write operations.

"Fastest" way to fix this is to "backup" the data, create a new fileset
with the frag file disabled (see the man page for the mkfset/chfset
commands, believe it's "-o nofrag") and restore into this fileset.
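
Roughly, assuming a domain called data_dmn with a fileset data_fs
mounted on /data (the names are examples only, and the nofrag option
should be verified against the man pages), the sequence would look
something like:

    # back up the existing fileset
    vdump -0 -f /backup/data_fs.vdump /data

    # remove and recreate the fileset with the frag file disabled
    umount /data
    rmfset data_dmn data_fs
    mkfset -o nofrag data_dmn data_fs    # option as suggested above; verify with mkfset(8)
    mount -t advfs data_dmn#data_fs /data

    # restore the data into the new fileset
    vrestore -x -f /backup/data_fs.vdump -D /data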

Also, it could be that you're running into file fragmentation issues
(i.e. the files in the fileset, as well as the frag file itself, are
fragmented and need to be defragmented), which would compound the
performance issues. An addvol/rmvol operation would most likely address
this problem in the most expedient manner (and with the least impact to
your production environment).

Hth,

// Thomas

**********************************************************************

Hi,

I'm pretty sure that on our old 5.1a machines we routinely had folders
with over 100,000 files in them. I don't think we had any significant
problems, except that 'ls' kind of hung up, but it'd come back after a while. :)

Thanks,
Rich.

Rich Fox
**************************************************************************

As is often the case, "It depends." If you're doing things like file
name completion or file "globbing" (which require reading the whole
directory), that will take a lot more time than it would in a small
directory.

If you're looking up a particular file, then AdvFS's directory structure
makes that much quicker than UFS's linear scan, and once looked up,
the file will be in the kernel's name cache, so repeat lookups will be
faster while it remains cached.
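
A rough way to see the difference on a large directory is to time a
glob (which has to read every entry) against a lookup of one known
name (which doesn't); the paths below are examples only:

    # glob: the shell reads the whole directory to expand the pattern
    time ls /data/bigdir/*.log > /dev/null

    # single lookup: one name resolved directly, cached afterwards
    time ls -l /data/bigdir/known_file.log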

If the directories are mounted on other clients via NFS, you can bog
down your whole network reading big directories, so the impact can
extend beyond the system that holds them.

    -Ric Werme

*******************************************************************************

I advise users not to put more than 1000 files in a single directory,
although they rarely listen to me. :-)

Once the number of files in a directory gets large, the main problem is
that metadata operations on the directory entry itself can become
slower, especially if the directory is served out over NFS, and multiple
clients are trying to access it simultaneously.

If the entire filesystem has very large numbers of files, you will find
that backups are extremely slow, because of the sheer number of inodes
which need to be backed up, especially if the files are small.

If the files are very small (less than 8k) then performance will vary
enormously depending on whether the AdvFS fileset has the frag file
turned on. If the frag file is on, performance will be slow for small
files, but disk usage will be efficient. If the frag file is off,
performance will be faster, but it'll waste a lot of disk space, so you
pays yer money and takes yer choice.
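
To get a feel for which case applies, you can count how many files fall
under the 8k frag-file threshold and look at the fileset attributes
with showfsets (whether the frag setting is reported there depends on
the version, so check showfsets(8)); paths and names are examples only:

    # rough count of files smaller than 8 KB (find -size counts 512-byte blocks)
    find /data -type f -size -16 | wc -l

    # list the filesets in the domain and their attributes
    showfsets data_dmn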

The main disadvantage to the user, of course, is that filename globbing
stops working and you get things like "arg list too long" messages.
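
The usual workaround is to feed the names through find and xargs rather
than letting the shell build one enormous argument list, e.g. (the path
and pattern are examples only; note that find recurses into
subdirectories):

    # avoids "arg list too long" by passing names on stdin, not argv
    find /data/bigdir -name '*.tmp' -print | xargs rm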

Tim

********************************************************************************

Hi Joe,

I've found that users with huge numbers of files in their accounts do
experience a degradation when doing things like listing and editing
files. I think it is simply because they have so many files that the
directories themselves grow very large and become slow to update or
process quickly.

Chris

***********************************************************************************

Joe Evans,
Technical Services Manager,
Management Information Systems,
King's College London,
57 Waterloo Road,
London SE1 8WA
Telephone:020 7848 3774
