[HPADM] SUMMARY: Large directories

From: David R Antoch (dantoch@csc.com)
Date: Fri Sep 16 2005 - 08:57:15 EDT


Original post:

>>>>>>
I have a filesystem (VxFS/LVM) that contains a directory with 600,000+
files (avg ~20K each... some are larger). Architecturally, an
application no-no (but that's another issue...). As a side project,
we're evaluating a search tool that will search through the files, and
index them into a database. Now, management does not want to risk
evaluating the search on the production system, so I'm attempting to copy
the entire filesystem (51GB, they wanted to search it all) to a
development system. The target disks are new. Both machines are 11.0
patched to recent (within a few weeks) patch versions.

The copy (I've tried an ssh and a remsh|tar pipe, as well as NFS
find | cpio) gets to a certain point, then starts thrashing the
target disk. iostat and sar show a continuous 1.7MB/sec of disk IO
and 200+ seeks/second (which seems way too high), yet only about one
20K file gets copied every 10 or 15 seconds. The issue is definitely
the directory size: anything written into that directory slows to
a crawl.

I used the VxFS defaults when building the filesystem. I was aware
of tunables like decreasing bytes-per-inode for many small files,
but looking into the VxFS options I really didn't come up with
anything better than the defaults. (An oversight? What am I missing?)

Is there anything I can do for a VxFS filesystem that would give
better performance when writing large numbers of files into one
directory?
>>>>>>
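
(For reference, the tar pipes I tried looked roughly like the
following; hostnames and mount points are illustrative:

 Ex: cd /prodfs && tar cf - . | remsh devhost "cd /devfs && tar xf -"
 Ex: cd /prodfs && tar cf - . | ssh devhost "cd /devfs && tar xf -"

Both stalled the same way once the target directory got big.)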

Thanks to everyone who replied:

Tom Myers
Marc Ahrendt
Bill Hassell
llicR
John Lanier
Shyam Hazari

Bill Hassell confirmed my worst fear :)) with his (abbreviated)
reply, which hits it on the head:

"This is one of the well-documented 'features' of large directories.
It's an application design no-no for a reason--massively large flat
filesystems are a very painful sysadmin issue. And as you've seen,
you have to work around the huge delays and difficulties in searching
through all these files.
...
...
 The VxFS filesystem is optimized out of the
box for large or small files so there isn't anything special to do
on the source system. On the destination, the file creation process
will start crawling as the directory gets really big. Think of it as
a parking lot at the SuperDome...it's easy to find a parking space
when it's empty but it takes a lot of driving around to locate a
spot when there are 20,000 cars parked there.
...
...
.."

In our case we decided to run the index on the production filesystem.
It's not a heavily used filesystem anyway; the app writes files of
20K to a couple of MB fairly infrequently (one or two hundred per
day), but they collect there over time and there's no archival setup
yet. I'm trying to get them to store the files in subdirectories
named by year/month (YYYYMM), which would pretty much alleviate the
problem and can support years and years of files; see the sketch
below. Definitely an application architecture issue.
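
A sketch of what that write path could look like (the base path and
variable names are hypothetical):

 # App-side: bucket each new file into a year/month subdirectory.
 dir=/prodfs/archive/`date +%Y%m`     # e.g. .../200509
 mkdir -p $dir
 mv $newfile $dir/

At one or two hundred files a day, each YYYYMM directory tops out
around 3,000-6,000 entries, which VxFS handles easily.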

However, I'd be interested to find out whether there are other
filesystem types that handle large numbers of files in a single
directory more efficiently...

I'm also including some good info from all the respondents regarding
filesystems and copying:

-use "mkfs -m" to see how the production filesystems were built and
compare to the filesystems you built on the development system.
 Ex: /usr/sbin/mkfs -m /dev/vg01/lvol1
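 e.g., save the output on each host and diff it (device paths are
 examples):
   /usr/sbin/mkfs -m /dev/vg01/lvol1 > /tmp/mkfs.prod
   # ...copy /tmp/mkfs.prod to the dev box, then there:
   /usr/sbin/mkfs -m /dev/vg02/lvol1 > /tmp/mkfs.dev
   diff /tmp/mkfs.prod /tmp/mkfs.dev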

- spread the random-seek load on the development system volume group
by using extent-based striping across all available drives.
 Ex: /usr/sbin/lvcreate -s g -D y ...
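 A fuller hypothetical invocation, sized for the 51GB copy (-D y
 assumes physical volume groups are set up in /etc/lvmpvg; names and
 sizes are examples):
   /usr/sbin/lvcreate -D y -s g -L 53248 -n lvsearch /dev/vg02
   /usr/sbin/newfs -F vxfs /dev/vg02/rlvsearch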

- try rsync as a copy method. It may be more efficient about updating
the directory file as data files are copied into the target system.
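 Ex (host and paths illustrative; note rsync is not part of stock
 11.0, so it has to be installed on both ends):
   rsync -a /prodfs/bigdir/ devhost:/devfs/bigdir/
 A nice side effect: if the copy dies partway through, rerunning the
 same command picks up where it left off instead of starting over.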

- As far as the copy method goes, NFS is the least useful: it has
very high per-file network overhead, and a puny 100Mbit LAN will
severely limit the speed of the transfer.
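 (Back-of-the-envelope: 100Mbit/s is at best ~12MB/s, so 51GB needs
 roughly 70+ minutes even at wire speed; add NFS lookup/attribute
 round trips for 600,000+ small files and the real figure is far
 worse.)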

- find | cpio would be good if you could hook the target disk up to
the production system. cpio's pass mode (-p, with -udlm to overwrite
unconditionally, create directories, link where possible, and keep
mtimes) copies straight from the source tree to the target with no
intermediate archive, but it still pays the per-file directory-insert
cost, so it will not be optimal.
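 Ex, with the target disk mounted locally on the production box
 (mount points illustrative):
   cd /prodfs && find . -depth -print | cpio -pudlm /devfs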

- use dd for a disk-to-disk copy, then fsck the target before
mounting. Keep in mind that dd copies every block (make sure you use
a LARGE block size such as bs=64k) regardless of whether the blocks
hold files and/or directories that are changing, so ideally the copy
takes place with the application shut down. The best choice is dd to
tape, but you could also do it over a network pipe (much slower than
tape). Once the copy has been restored (to a same-sized lvol on the
target) you'll need to run fsck on the rlvol (raw) device, then mount
the new filesystem and you should be ready to go.
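
Putting that together (device names are examples; the lvols must be
the same size, and the filesystem should be unmounted or the app
quiesced first):

 # Raw disk-to-disk copy, then fix up and mount on the target:
 dd if=/dev/vg01/rlvol1 of=/dev/vg02/rlvol1 bs=64k
 fsck -F vxfs -y /dev/vg02/rlvol1
 mount -F vxfs /dev/vg02/lvol1 /devfs

 # Or as a network pipe (much slower):
 dd if=/dev/vg01/rlvol1 bs=64k | remsh devhost "dd of=/dev/vg02/rlvol1 bs=64k"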

Thx again,
Dave


