RE: many process access one file problem. . .

From: Sharpe, John (John.Sharpe@vuinteractive.com)
Date: Fri Oct 11 2002 - 16:30:34 EDT


Thanks to James M., Paul C., Frank S., Tristan B., and Luc S. for the
answers. We set noatime on the filesystem, already had 'set
rlim_fd_cur=4096' in /etc/system, and tried keeping mirror copies of the
file (without much luck getting users to choose anything other than the
default). It seems to be a bug and/or a kernel parameter limitation, and
we are talking to Sun about the patch mentioned below.

-------
You're hitting a maximum number of open file handles per file. I'm not sure
if this is a tunable parameter or not. We ran into it a few years ago on our
print server -- we had all the printers using /dev/null and once the server
got to a certain print load, some prints would just stop. We fixed it by
using mknod to create more null devices, one for each printer actually.
Hasn't been a problem since.

Your solution will probably be similar if it can't be tuned by a
parameter: you'll need to replicate the file. I don't think symbolic
links will work, and I'm sure hard links won't; you need the opens to go
to different inodes.
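
For reference, a sketch of the extra-null-device trick (the major/minor
numbers below are typical for Solaris but vary by release; check yours
first with ls -lL):

  # Check the device numbers of the real null device (often 13,2):
  ls -lL /dev/null

  # Create an additional null device with the same major/minor numbers:
  mknod /dev/null2 c 13 2
  chmod 666 /dev/null2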
-------
If you can live without file access times, mount the filesystem with
the noatime option. Updating the access time is the biggest
bottleneck on simultaneous file read accesses.
  If the filesystem is NFS mounted, you might also look into upping
the number of NFS client processes.
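
For example, a vfstab entry along these lines (the VxFS device paths
here are placeholders):

  # /etc/vfstab: mount the data filesystem with noatime
  /dev/vx/dsk/datadg/datavol /dev/vx/rdsk/datadg/datavol /data vxfs 1 yes noatime

or, to apply it without a reboot (assuming the filesystem supports
remounting):

  mount -F vxfs -o remount,noatime /data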
-------
...pay very close attention to your mutex contention rates (mpstat will
show you).

If the process is using fcntl locks on the file, there is a known
scalability bug in Solaris 8 which will knock you out pretty well above
a couple of hundred concurrent lockers (there's a T-patch available; the
official patch is expected in January).
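
For example, sampling at five-second intervals (smtx is the column to
watch; it counts spins on kernel mutexes per second):

  # Sustained smtx values in the hundreds per CPU suggest heavy
  # kernel lock contention:
  mpstat 5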
-------
(IN /etc/system)
*
* File descriptors
*
* kernel default is 256; 1024 is the
* maximum advisable value for keeping the
* system and applications stable.
set rlim_fd_cur=1024
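
Note that /etc/system changes take effect only at the next reboot. One
way to confirm the limit afterwards (the pid is a placeholder):

  # Soft descriptor limit of the current shell:
  ulimit -n

  # Or of a running daemon, by process id:
  plimit <pid>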
-------

-----Original Message-----
From: Sharpe, John
Sent: Thursday, October 10, 2002 4:16 PM
To: sunmanagers@sunmanagers.org
Subject: many process access one file problem. . .

We are experiencing problems with hundreds of processes trying to read the
same file.
The processes trying to read the file are Apache and/or ncftpd daemons.

From our analysis, it seems the problem lies within the filesystem or a
kernel parameter/limitation.

Once the number of processes trying to read the file goes over a certain
threshold, all new processes that try to open that file hang. While it
is in this state, a 'cp' of the file from the command line hangs as
well. Access to other files on the same filesystem, by any of the
mentioned processes, works normally.
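
One way to narrow this down while it is in that state (the file path
below is only an example) is to truss the hanging 'cp' to see which
system call it blocks in, and to list the processes holding the file
open:

  # Show the system call the copy blocks in:
  truss -f cp /data/thefile /tmp/thefile

  # Show which processes currently have the file open:
  fuser /data/thefile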

The machine is running a load average of between 1 and 2. 'sar' reports
low iowait and plenty of idle CPU time. Machine specs are:

E450 with 4 x 400MHz CPUs, 4GB of memory, Solaris 8 with Veritas (VxFS)
on the filesystem in question.

We have found no information from Sun so far. I will summarize for the
list.

John Sharpe
Unix Systems
Vivendi Universal Games
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


