NFS Best Practices

From: Parkin Frank - fparki (fparki@acxiom.co.uk)
Date: Wed Jun 18 2003 - 09:13:20 EDT


We are frequently experiencing problems with NFS. Users' file transfers
time out and we see messages like the following:

Jun 18 12:22:34 server1 vmunix: NFS3 RFS3_WRITE failed for server server2: RPC: Timed out
Jun 18 12:22:34 server1 vmunix: NFS3 write error 60 on host server2
Jun 18 13:45:49 server1 vmunix: NFS3 server server2 not responding still trying

We have numerous Alpha servers (v5.1a PK4), each of which acts as both NFS
client and server with numerous mounts (the majority use gigabit Ethernet).

The current nfsd and nfsiod settings are:

root 156305 1 0.0 12:38:33 ?? 0:00.01 /usr/sbin/nfsiod 24
root 156320 1 0.0 12:38:33 ?? 0:00.01 /usr/sbin/nfsd -t32 -u32

The usual mounting options are:

server1:/dir1 /dir2 nfs rw,bg,soft,tcp

Has anyone experienced similar problems? We have applied a gigabit Ethernet
patch from HP but this hasn't improved the situation.

Are there any NFS best practice documents out there?

I have also stumbled across the following extract from an old thread (1995?)
on NFS problems (see below). Is this information still valid such that
multiple NFS mounts from different servers should be mounted in the
following way...

/nfs/server1/dir1
/nfs/server1/dir2
/nfs/server2/dir1 etc.

Many thanks
Frank

        The prevailing wisdom, summarised by Bernhard Schneck:

                  Thou Shalt Not Mounteth into the root directory.

        but more exactly:

          Rule 1. The parent directory of the nfs mount pt. shall not be root.

          Rule 2. An nfs mount pt. shall not share its parent directory with
                  another nfs mount pt.
                  -passable exception: if the nfs mounts are from the same
                   nfs server, it probably will not be problematic.
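
The two rules above can be checked mechanically against a mount table. What
follows is only a minimal sketch in Python, not part of the original thread:
the function name check_mount_layout and the (server, mount point) input
format are assumptions made for illustration, and the mount table is passed
in by hand rather than read from /etc/fstab.

```python
import posixpath
from collections import defaultdict


def check_mount_layout(mounts):
    """Return warnings for a list of (server, mount_point) pairs.

    Hypothetical helper: it only inspects the path strings, it does not
    look at the real mount table.
    """
    warnings = []
    servers_by_parent = defaultdict(set)
    for server, mount_point in mounts:
        parent = posixpath.dirname(posixpath.normpath(mount_point))
        # Rule 1: the parent of an NFS mount point shall not be root.
        if parent == "/":
            warnings.append(f"Rule 1: {mount_point} is mounted directly under /")
        servers_by_parent[parent].add(server)
    # Rule 2: siblings are tolerated only when they come from one server.
    for parent, servers in sorted(servers_by_parent.items()):
        if len(servers) > 1:
            names = ", ".join(sorted(servers))
            warnings.append(f"Rule 2: {parent} holds mounts from {names}")
    return warnings
```

For example, check_mount_layout([("server1", "/dir2")]) reports a Rule 1
warning, because /dir2 sits directly under the root directory, while a
per-server layout such as /nfs/server1/dir1, /nfs/server1/dir2 and
/nfs/server2/dir1 produces no warnings.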

        The most complete explanation and solution are from Paul David Fardy
        below, but I also include selected others for relevant insight and
        comment:

        =====
        From: Paul David Fardy <pdf@xxxxxxxxxxxxxxxxx>

        This is a common problem with NFS. Many programs (the best example
        being /bin/pwd) search up the directory tree to find the full pathname
        for the current directory. On the way up, a program can encounter
        a remote filesystem root directory and hang if the host is
        unavailable. The process often gets into an uninterruptible state.

        Take an example with 3 systems--Athos, Porthos, and Aramis--sharing
        user files. Athos mounts the following file systems.

                /u1@xxxxxxx on /nfs/u1
                /u2@xxxxxxx on /nfs/u2
                /u3@xxxxxx on /nfs/u3

        When a user Bob runs "pwd" from /nfs/u1/bob, pwd does the following
        search.

        A. i-node = stat(".")            # I don't know my name, but I do know my number.
        B. chdir .. and scan for i-node  # Directory must be in parent somewhere
        C. found "bob" matching i-node   # I now know my path ends in "bob",
        D. i-node = stat(".")            # but I still don't know my parent's name.
        E. chdir .. and scan for i-node  #
        F. found "u1" matching i-node    # I now know my path ends in "u1/bob".
        G. i-node = stat(".")            #
        H. chdir .. and scan for i-node  #
        I. found "nfs" matching i-node   # I now know my path ends in "nfs/u1/bob"
        J. i-node = stat(".")            #
        K. chdir .. and scan for i-node  #
        L. found "." matching i-node     # Must be the root directory, we're done.
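
The stat / "chdir .." / scan loop listed above can be sketched in a few lines
of Python. This is an illustration only, not the actual /bin/pwd source: the
function name pwd_by_walking_up is made up, it builds "../.." paths instead of
calling chdir, it compares device and i-node together (i-node numbers alone
are only unique within one filesystem), and it detects the root by checking
that ".." is the same directory as "." rather than by finding a "." entry.

```python
import os


def pwd_by_walking_up(start="."):
    """Rebuild the absolute path of `start` by walking up through '..'."""
    names = []
    here = start
    while True:
        me = os.stat(here)                 # "I know my number, not my name"
        parent = os.path.join(here, "..")
        up = os.stat(parent)
        if (me.st_dev, me.st_ino) == (up.st_dev, up.st_ino):
            break                          # '..' is ourselves: this is root
        for name in os.listdir(parent):    # scan the parent directory
            try:
                # Each lstat may contact a file server if `name` is an NFS
                # mount point; this is where a dead server causes the hang.
                entry = os.lstat(os.path.join(parent, name))
            except OSError:
                continue
            if (entry.st_dev, entry.st_ino) == (me.st_dev, me.st_ino):
                names.append(name)         # found our own name in the parent
                break
        else:
            raise FileNotFoundError(f"no entry for {here!r} in its parent")
        here = parent
    return "/" + "/".join(reversed(names))
```

Calling pwd_by_walking_up(".") should give the same answer as os.getcwd() on a
healthy system. The point of the sketch is the inner loop: every sibling in
every ancestor directory gets examined, whether or not it has anything to do
with the path being rebuilt.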

        When it works, pwd prints "/nfs/u1/bob". But NFS hanging can occur
        if Aramis is unreachable. If we look back at those scans with full
        knowledge (as opposed to pwd's view), we know the following.

                Step A stats /nfs/u1/bob
                Step B scans /nfs/u1
                Step E scans /nfs
                Step H scans /

        The problem is that Step E scans /nfs and in the process could
        attempt to access /nfs/u3 (the ordering of the /nfs directory is
        based on timing). If Aramis is unreachable, then pwd will hang.

        The Solution: (that dreaded word) Segregation

        There are two models that clear this NFS problem. The one I
        suggested earlier separates every mount point.

                /u1@xxxxxxx on /nfs/u1/nfs
                /u2@xxxxxxx on /nfs/u2/nfs
                /u3@xxxxxx on /nfs/u3/nfs

        In this model, no two NFS mount points share a common parent.
        Another model we've used in the past separates mount points based
        on the serving host.

                /u1@xxxxxxx on /nfs/porthos/u1
                /u2@xxxxxxx on /nfs/porthos/u2
                /u3@xxxxxx on /nfs/aramis/u3

        In this model, u1 and u2 share a common parent, but they're both
        served from the same host. If u2 were going to hang, then it's not
        likely that u1 is reachable in the first place.

        In either scheme, a program like pwd run from any network filesystem
        would have to go up to /nfs then down a directory to encounter
        another remote file system. Very few programs do this; no program
        should.
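
To compare the layouts, one can model which file servers a pwd-style walk may
contact from a given working directory. The sketch below is a simplification
under stated assumptions: servers_touched is a hypothetical helper, the mount
table is a plain dict of mount point to server name, and the walk is assumed
to stat every entry of every ancestor directory. The server names follow the
per-host layout above (u1 and u2 on porthos, u3 on aramis).

```python
import posixpath


def servers_touched(cwd, mounts):
    """Servers whose mount points sit in a directory scanned on the way up.

    cwd must be an absolute path; mounts maps mount point -> server name.
    """
    touched = set()
    here = posixpath.normpath(cwd)
    while here != "/":
        here = posixpath.dirname(here)     # the directory being scanned
        for mount_point, server in mounts.items():
            if posixpath.dirname(mount_point) == here:
                touched.add(server)        # scanning `here` stats this mount
    return touched


flat = {"/nfs/u1": "porthos", "/nfs/u2": "porthos", "/nfs/u3": "aramis"}
per_mount = {"/nfs/u1/nfs": "porthos", "/nfs/u2/nfs": "porthos",
             "/nfs/u3/nfs": "aramis"}
per_host = {"/nfs/porthos/u1": "porthos", "/nfs/porthos/u2": "porthos",
            "/nfs/aramis/u3": "aramis"}
```

With the flat layout, servers_touched("/nfs/u1/bob", flat) includes aramis, so
an outage there can hang a user whose files live on porthos. With either
segregated layout, the same query from /nfs/u1/nfs/bob or /nfs/porthos/u1/bob
returns only porthos.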

        I prefer the first model because
                a) it generates shorter paths (df is less likely to wrap
                   and more likely to fit on a screen)
        and
                b) the second model is redundant (when you move a disk you
                   have to change the mount directories and the symbolic
                   links that hide the mount points along with the entry in
                   /etc/fstab).




This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:23 EDT