SUMMARY: Out of Memory Message in Huge amount of Files

From: Raul Sossa S. (RSossa@datadec.co.cr)
Date: Tue Dec 23 2003 - 12:46:15 EST


On 09-Dec-03, Raul Sossa S. wrote:
> Hello Guys!
> We have about 3 million files in one Tru64 UNIX directory.
> The binary files are pictures of people, averaging about 80K in
> size.
> When we do an "ls -al" we're getting an "Out of memory" error message.
> Does anyone know of any kernel parameter, swap setting, or other tuning
> issue that might help us avoid this message and get the real output from
> the operating system shell? What is the maximum number of files we can
> have in a Tru64 UNIX directory?

Answers:
I applied all of these suggestions; thank you very much:
__________________________________________________________________________
From: Alan Rollow - Dr. File System's Home for Wayward Inodes.
[mailto:alan@desdra.cxo.cpqcorp.net]

ls(1) is constrained by the same per-process virtual memory
        limits as any other program. The long ls(1) listing needs
        enough memory for the stat(2) data for each file, so that
        it can get the data and then sort it. For 3 million files,
        this is going to be a lot of data and could easily run the
        process into the datasize virtual memory limit.

        Depending on your shell, there may be a built-in command
        that will let you raise the per-process limits from the
        default to the maximum, or to something in between. For most
        shells this is either "limit" or "ulimit". Your shell's
        manual page may document the command.
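        For example (not from Alan's reply; an illustrative sketch,
        assuming ksh/sh on one side and csh on the other, with 1 GB
        as an example value):

                # sh/ksh: show the current soft data-segment limit (in KB),
                # then raise it for this shell only; the new value must not
                # exceed the configured hard limit.
                ulimit -d
                ulimit -d 1048576

                # csh/tcsh equivalent
                limit datasize
                limit datasize 1024m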

        The per-process limits can be configured with sysconfig(8).
        The sys_attrs_proc(5) manual page documents the parameters.
        The limits come in two values: default and maximum. An
        unmodified system typically has limits of:

        Process Data Space:

                Maximum: 1 GB
                Default: 128 MB

        Process Stack Space:

                Maximum: 32 MB
                Default: 2 MB

        There are also limits for total address space and on some
        versions total system virtual memory. The amount of page
        and swap space you have can also limit virtual memory use,
        so check that as well.
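        To see what a particular system is actually using, the proc
        subsystem can be queried directly (attribute names as
        documented in sys_attrs_proc(5); output will vary):

                # current and maximum per-process data and stack sizes, in bytes
                sysconfig -q proc per_proc_data_size max_per_proc_data_size
                sysconfig -q proc per_proc_stack_size max_per_proc_stack_size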

        On my V4.0G system, the structure used by stat(2) is 80
        bytes in length. 3,000,000 of them is going to take at
        least 228 MB of virtual memory. Clearly, that's more than
        the typical default process data size. If reorganizing the
        data isn't a good option, then you might want to consider
        raising the default process data space size to be large
        enough to run this particular command. See the sysconfig(8)
        manual page for information about changing the parameters.
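        (A quick check of the arithmetic, in ksh:

                echo $((3000000 * 80))            # 240000000 bytes
                echo $((3000000 * 80 / 1048576))  # roughly 228 MB

        and that is before counting the file name strings that ls(1)
        also has to hold.)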

        The system tuning guide may also have some advice. If you
        don't have a paper copy, PDF and HTML versions are on the
        documentation CDROM.

        As for limits on the number of files in a directory, I
        don't recall if there are any enforced limits. There
        are limits which come just from the amount of space to
        store all the metadata for a file; the directory size
        itself can't larger than the file system can hold as a
        file. If there is a limit, it is on the order of 2^31-1
        or 2^32. Since Tru64 UNIX uses 64 bit integers for many
        things if the 32 bit integer limit is relevant to the
        number of files in a directory, the 64 bit integer limit
        is too high to worry about.

        Depending on the file system and version, there may be
        practical limits on the number of files in a directory.
        UFS is well known for not handling large numbers of files
        gracefully. Your 3,000,000 is well beyond the point
        where the UFS limitations become an issue. If you're
        using UFS for this file system, fixing the memory problem
        is only the start.

        Older versions of AdvFS didn't handle large numbers of
        files well either, though they did better than UFS. I
        believe there were metadata changes in V5 that significantly
        help the performance of processing directories with lots
        of files. So, an AdvFS file system created under V5 may not
        have much trouble with so many files.
__________________________________________________________________________
From: Tim Cutts [mailto:tjrc@sanger.ac.uk]

UNIX directory performance is dreadful if there are many entries in a
directory. The practical limit, for performance, in my experience is about
10,000 files, so you have already massively exceeded this! You'll find that
lots of other things may be broken with directories like this
- I wouldn't guarantee that dumps will work properly, for example.

You should reorganise the data into subdirectories, each of which has around
1000 files. I have some perl code to create such a directory structure, if
you are interested.
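
As an illustration only (this is not the Perl mentioned above; the paths and
the two-character bucketing rule are assumptions), a rough shell sketch of
such a reorganisation could look like:

    # Bucket files into subdirectories named after the first two
    # characters of each file name. "pictures" and "pictures.new"
    # are placeholder paths; assumes names contain no whitespace.
    cd pictures || exit 1
    find . -type f | while read f
    do
        name=`basename "$f"`
        bucket=`echo "$name" | cut -c1-2`
        mkdir -p "../pictures.new/$bucket"
        mv "$f" "../pictures.new/$bucket/"
    done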

Tim

__________________________________________________________________________
From: Dr Thomas.Blinn@HP.com [mailto:tpb@doctor.zk3.dec.com]
There is no practical upper limit to the number of files you can put in a
single directory, other than the number of files you can put in a single
file system.

The message you are seeing is coming directly from the "ls" utility. As "ls"
reads in the directory information, it has to sort it, since it's not in any
particular sorted order in the directory itself. So "ls" has to allocate
memory to hold the name strings (as well as the inode numbers and some other
data) and then once it's got all of the names in memory, it sorts the
listing in memory and then displays it.

On most systems, the default memory size limits are low enough that in a
directory with HUGE numbers of files, "ls" simply runs out of memory while
building the listing.

There are a couple of ways to work around this. One is to use the "find"
utility with its "-ls" option, sort the output externally with the "sort"
utility, and then use "awk" to format the sorted data if you want something
more sophisticated than what "find -ls" gives you.
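
Something along those lines might look like this (a sketch, not Tom's exact
commands; /pictures is a placeholder, and the field numbers assume the usual
eleven-field "find -ls" output with the size in field 7 and the name in
field 11):

    # unsorted long listing, sorted externally by file name
    find /pictures -ls | sort -k 11 > /tmp/pictures.listing

    # optional: trim the sorted data with awk, e.g. size and name only
    awk '{ print $7, $11 }' /tmp/pictures.listing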

Another is to relax the memory limits. You can do this for every process on
the system by default (which is risky if you don't have the memory and swap
resources to support memory hogs), or you can do it for the users who need
to use "ls". In any case, what you need to look at is things like the
maximum address space and the maximum data size (that is, heap size)
parameters. You can learn more from the system tuning guide or the
reference pages for the various system attributes. You do the tuning in
/etc/sysconfigtab.

If you have the max values already large enough, you can use the "limit"
command (which varies by shell) to adjust the limits for the particular
users. On my V4.0G system, I routinely bump up the limits for address space
and data space in shell scripts where I need to use "ls" in directories with
large numbers of files, and since it's wrapped in a shell script, I don't
otherwise change the limits for my normal processing.
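
A wrapper of that sort might look something like this (a sketch, not Tom's
script; the path and limit values are placeholders, and the system-wide
maximums must already be large enough for the ulimit calls to succeed):

    #!/usr/bin/ksh
    # Raise the soft limits for this script only, then run the listing.
    ulimit -d 1048576      # data segment, in KB (about 1 GB here)
    ulimit -v 2097152      # address space, in KB, where the shell supports -v
    ls -al /pictures > /tmp/pictures.ls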

Hope this helps you understand your options.

Tom

__________________________________________________________________________
From: tru64-unix-managers-owner@ornl.gov
[mailto:tru64-unix-managers-owner@ornl.gov]

You can change the proc subsystem attribute
max_per_proc_data_size.
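
For example, an /etc/sysconfigtab stanza along these lines (illustrative
values, in bytes; the change takes effect at the next reboot):

    proc:
        per_proc_data_size = 1073741824
        max_per_proc_data_size = 4294967296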

Nasır YILMAZ


