Re: too many files

From: Hunter, Mark (Mark.Hunter@ANHEUSER-BUSCH.COM)
Date: Wed Jul 30 2003 - 10:52:33 EDT


Yes, it does take a long time under jfs filesystems.
Why?
jfs directories keep file names in a simple, unordered list. Your backup
command must look up each file by name, and each lookup scans, on average,
half the directory before finding the entry - 450,000 entries for your
900,000-file directory. You also have to open each file, most of which will
not be in your in-core inode table.
900,000 files * 450,000 entries scanned per lookup = 405,000,000,000 comparisons
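A quick sketch of that arithmetic (the 900,000 file count comes from your directory; the rest is just the linear-scan cost model):

```python
# Cost model for name lookups in an unordered (jfs-style) directory:
# each lookup scans, on average, half the directory entries.
n_files = 900_000
avg_scan_per_lookup = n_files // 2          # 450,000 entries
total_comparisons = n_files * avg_scan_per_lookup

print(f"{avg_scan_per_lookup:,} entries scanned per lookup on average")
print(f"{total_comparisons:,} total entry comparisons")
```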

On my 660 with 600MHz processors and 15GB RAM, an ls -l on 35,275 files takes
about 42 CPU seconds - after I have forced them into the in-core inode table.
Double that time the first time I run the command.
timex ls -l > /dev/null

real 46.18
user 41.30
sys 2.59
Doing the math - lookup cost grows with the square of the file count, and
(900,000 / 35,275)^2 is about 650 - your ls -l should take at least 650 times
longer, or about 8 hours. Double that if you have not forced everything into
the in-core inode table/cache (which, with that many files, you can't). And
expect at least another 8 hours to restore the files on the other side,
ignoring file sizes.
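The 650x figure follows from that quadratic cost model - the directory is about 25.5 times larger, so total scan work grows by roughly 25.5 squared. A sketch, using the measured 46.18-second real time from above:

```python
# Scale the measured 46.18 s (real time for ls -l on 35,275 warm files)
# up to 900,000 files, assuming total lookup cost grows as n**2.
small_n, big_n = 35_275, 900_000
measured_seconds = 46.18

scale = (big_n / small_n) ** 2              # roughly 650x
estimated_hours = measured_seconds * scale / 3600

print(f"scale factor: {scale:.0f}")
print(f"estimated time: {estimated_hours:.1f} hours")
```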

jfs and standard unix do not handle this many files in a directory well.
jfs2 would be a much better choice, as its directories maintain the file list
in sorted order. Thus, each lookup costs log2(n) comparisons, not n/2.
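To put numbers on that: for 900,000 entries, a sorted directory needs about log2(900,000), roughly 20 comparisons per lookup, instead of 450,000 - a per-lookup speedup of over four orders of magnitude. A sketch:

```python
import math

n = 900_000
linear_cost = n / 2                  # unordered list: average scan length
sorted_cost = math.log2(n)           # sorted directory: binary-search depth

print(f"linear: {linear_cost:,.0f} comparisons per lookup")
print(f"sorted: {sorted_cost:.1f} comparisons per lookup")
print(f"speedup: {linear_cost / sorted_cost:,.0f}x")
```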

On jfs2, I created 35,275 files and did an ls -l. This was on an H50 with
332MHz processors and 1GB RAM.
timex ls -l > /dev/null

real 2.53
user 1.36
sys 1.15

Obviously, jfs2 is way faster with large numbers of files in a directory.

For your problem, backup by inode would be way faster than backup by name.
Nothing is going to improve your restore time on jfs, but if your target can be
made jfs2, you might be ok.
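A sketch of what a by-inode transfer might look like, patterned on the original pipeline. This is an assumption, not a tested recipe - verify the backup/restore flags, the level number, and the target directory (/targetdir here is hypothetical) against your AIX documentation before relying on it:

```shell
# Hypothetical by-inode variant of the original pipeline: a level-0
# filesystem backup avoids the per-name directory lookups entirely.
# /indirectory and remotehost come from the original command; /targetdir
# is a placeholder for the restore location on the remote host.
backup -0 -uf - /indirectory | compress -c | \
  rsh remotehost "(cd /targetdir && uncompress -c | restore -rqf -)"
```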

Mark Hunter

-----Original Message-----
From: Nguyen, Joseph [mailto:JNguyen@WM.COM]
Sent: Tuesday, July 29, 2003 5:35 PM
To: aix-l@Princeton.EDU
Subject: too many files

I have a filesystem that contains 900,000+ files in one directory. I ran the
following command to copy the files to another host; it ran for a couple of
days and then stopped. Even just running the find command takes a long time.

find /indirectory -print | backup -iqf- | compress -c | rsh remotehost
"(uncompress -c | restore -xf- )"

Do you know of any other command that can speed up the copy? We tried backing
up to tape and restoring, and that also takes days.

Joseph



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:17:04 EDT