Dump(1M) Performance over Networks

CONTENTS

  • INTRODUCTION
  • DESCRIPTION OF TESTS
  • RESULTS
  • REAL LIFE

INTRODUCTION

It was taking over 17 hours to back up a 3.2GB filesystem to a tape drive on a remote host using dump(1M). This led me to conduct a systematic study of dump(1M) performance over networks under a variety of parameters.

Acting on the results shown here, I modified my backup script to use a blocksize of 80KB rather than the default blocksize of 10KB. The time required dropped to under 4 hours -- a speedup factor of almost 5X. This was less than the 8X improvement predicted by my controlled tests, but still significant. I compare the test results with our actual experience in the REAL LIFE section below.

Although these tests were performed using dump(1M), the results probably translate to bru(1), tar(1) or other backup programs.

DESCRIPTION OF TESTS

Equipment

The tests were run on glutamine, an SGI Indigo2 Extreme with a 200 MHz R4400 CPU. The filesystem was located on a Seagate ST15230N 4GB Hawk disk drive (5400 RPM, 3.5"). Methionine, an SGI Challenge-M with a 150 MHz R4400, was used as the remote host. Both hosts are on the same ethernet segment, both are attached to the same FDDI hub, and both run IRIX 5.3.

The tape drive used was an Exabyte 8505 8mm drive from Transitional Technology; it was the only device on the external SCSI bus of glutamine. The drive was used in high density mode with data compression on.

Method

A simple shell script was executed on glutamine from the root account (because dump(1M) requires root privileges). There was no other activity on glutamine during the tests; activity on methionine and on the ethernet and FDDI networks is presumed to have been light. This assumption is borne out by the results.

Initially, a directory in a filesystem on glutamine was filled with 27.8MB of mixed files, as measured by the df(1) command. Most of the space was occupied by a few large files. These files are compressible by 30% to 50% using compress(1); we therefore assume that a similar compression could be obtained by the tape drive.

First, the entire filesystem was dumped to /dev/null. This loads the filesystem cache with as much of the data as it will hold; otherwise, that caching would occur during the first test pass, artificially lengthening its run time in comparison with subsequent passes. A standard command was then repeated in the script:

    timex dump 0bCf $blocksize $tape /filesystem

The blocksize ranged from 10KB to 128KB; tape specified the following output devices (a sketch of the complete test loop appears after the list):

  • /dev/null -- The null device on the local host.
  • /dev/rmt/tps1d6nsv -- Tape drive on the local host, by way of the variable block driver.
  • /dev/rmt/tpd1d6ns -- Tape drive on the local host, by way of the fixed block driver.
  • methionine-e:/dev/null -- Null device on remote host via ethernet.
  • methionine-f:/dev/null -- Null device on remote host via FDDI.
  • methionine-e:/dev/rmt/tps1d6nsv -- Remote tape drive via ethernet.
  • methionine-f:/dev/rmt/tps1d6nsv -- Remote tape drive via FDDI.
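
A minimal sketch of such a test loop is shown below. The original script is not reproduced here, so the variable names and loop structure are illustrative; the devices are those listed above and the blocksizes are those that appear in the results tables.

    #!/bin/sh
    # Illustrative reconstruction of the test loop -- not the original script.
    FILESYSTEM=/filesystem

    for tape in /dev/null /dev/rmt/tps1d6nsv /dev/rmt/tpd1d6ns \
                methionine-e:/dev/null methionine-f:/dev/null \
                methionine-e:/dev/rmt/tps1d6nsv methionine-f:/dev/rmt/tps1d6nsv
    do
        for blocksize in 10 20 32 64 80 128
        do
            echo "=== device=$tape blocksize=${blocksize}KB"
            timex dump 0bCf $blocksize $tape $FILESYSTEM
        done
    done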
Following these tests, the contents of the directory were copied to each of three other directories, increasing the contents of the filesystem to 111.2MB. The same series of tests was then repeated.

The log file from this script provided the real run time (via the timex(1) command) for each of these tests. Comparing the results for the two filesystem sizes, we can calculate the differential data rate (the increase in size divided by the increase in time). We can then calculate the fixed overhead involved in setup and cleanup by taking the total time for the larger filesystem and subtracting the size of that filesystem divided by the differential data rate.
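
As a concrete illustration of this arithmetic, the following sketch computes both quantities; the run times shown are placeholders, not measurements from these tests.

    #!/bin/sh
    # Sketch of the differential-rate and overhead calculation described above.
    SIZE_SMALL=28467      # 27.8MB expressed in KB
    SIZE_LARGE=113869     # 111.2MB expressed in KB
    T_SMALL=120           # placeholder "real" time (sec) for the small run
    T_LARGE=300           # placeholder "real" time (sec) for the large run

    # differential data rate = increase in size / increase in time (KB/sec)
    RATE=`echo "($SIZE_LARGE - $SIZE_SMALL) / ($T_LARGE - $T_SMALL)" | bc`

    # overhead = total time for the large run - (its size / differential rate)
    OVERHEAD=`echo "$T_LARGE - ($SIZE_LARGE / $RATE)" | bc`

    echo "differential rate: $RATE KB/sec   overhead: $OVERHEAD sec"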

RESULTS

Overhead

Table 1 shows the calculated overhead for different devices and blocksizes. In all cases this includes the time dump(1M) spends scanning the filesystem (or at least the portion of that time that is independent of filesystem size). If output is to a tape device, it also includes tape positioning, startup and rewind time.

TABLE 1 -- Overhead (sec)

    Blocksize   local   local   local   ether   FDDI   ether   FDDI
    (KB)        null    tpnsv   tpns    null    null   tpnsv   tpnsv
    -----------------------------------------------------------------
     10          28      111     102      8      29     342     104
     20          27       94     101     29      30     103      98
     32          28       98      97     33      30     139      98
     64          28       98      97     25      29     101      98
     80          28       98      98     35      28     107      98
    128          28       97      97     27      29     105      98

Differential Data Rates

Table 2 shows the differential data rate for the various output devices over a range of blocksizes. The differential data rate is the average rate after overhead (startup, rewind, etc.) is eliminated.

TABLE 2 -- Differential KB/sec

    Blocksize   local   local   local   ether   FDDI   ether   FDDI
    (KB)        null    tpnsv   tpns    null    null   tpnsv   tpnsv
    -----------------------------------------------------------------
     10         3791      49     146     351    1463     84     390
     20         4170     662     695     560    2317    336     600
     32         4389     772     758     583    2690    428     689
     64         4633     772     758     549    2979    353     772
     80         4633     772     765     627    3089    399     772
    128         4633     765     758     604    3336    428     772
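
As an example of how the two tables combine, a predicted dump time is the overhead from Table 1 plus the filesystem size divided by the differential rate from Table 2. For the 111.2MB (about 114,000 KB) test filesystem dumped to the remote tape over ethernet with an 80KB blocksize, this works out to roughly 107 + 114000/399, or about 390 seconds.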

Raw Data

The raw results for all of the individual test passes are available separately.

REAL LIFE

TABLE 3 -- Real Life Results

    Filesystem   Size (MB)   Blocksize (KB)   Transport   Speed (KB/s)
    -------------------------------------------------------------------
    A1              3211          10           ether            52
    A1              3403          80           ether           249
    A2              1528          80           FDDI            682
    A3               818          80           FDDI            690
    B1              1324          80           FDDI            690
    B2              3664          80           FDDI            694
    B3              3759          80           ether           267
    C1              3970          10           local           453
    C1              3971          10           local           542
    C1              4098          10           local           584
Table 3 shows actual dump speeds for several of our filesystems. Output is to a Cipher C860 6GB DLT tape drive with a nominal peak data rate of 800 KB/s. Filesystems A1, A2 and A3 are located on 2GB 3600 RPM 5.25" disk drives. B1, B2 and B3 are on 4GB 5400 RPM 3.5" disk drives. C1 is on a RAID-5 system with a CMD controller and five 2GB 3600 RPM 5.25" disk drives.

For A1, increasing the blocksize from 10KB to 80KB raised the data rate from 52 KB/s to 249 KB/s. This falls short of the 400 KB/s rate seen in the test results. The reasons may include:

Real-life FDDI dumps were about 10% slower than predicted by the test results. About half of this difference is explained if we assume that the DLT drive's overhead is the same as the 8mm drive's (about 100 seconds).
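
As an illustration of that estimate, using the 80KB FDDI-to-tape figures from Tables 1 and 2 (about 98 seconds of overhead and a differential rate of about 772 KB/s): for a filesystem the size of B1 (1324MB, or roughly 1,356,000 KB), adding the 98 seconds of overhead to the 1756 seconds of transfer time implied by the differential rate lowers the effective rate from 772 KB/s to about 730 KB/s -- roughly half of the gap down to the 690 KB/s actually observed.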

The several dumps of the local filesystem C1 vary considerably. They are all below both the nominal maximum for this DLT tape drive (800 KB/s) and the FDDI results. The probable explanation is that C1 is one of several filesystems on a RAID-5 system; that the RAID system is itself slow; and that both C1 and another RAID filesystem are the most actively used, resulting in poor performance due to disk bottlenecks.


This work was performed by Art Perlo at the Center for Structural Biology at Yale University in February, 1996. Please direct questions and comments to perlo@csb.yale.edu.

Document revised on Friday, 15-Mar-1996 16:08:49 EST