Ufsdump to remote tape device

From: Darren Brechman-Toussaint (Darren.Toussaint@SecurityMail.com.au)
Date: Fri Mar 18 2005 - 00:32:21 EST


Hi all,

I have a server that has been running backups of local disks to a tape
drive on a remote system. These backups have been running for a very
long time, but for the last 4 days one particular filesystem has failed
every time the script tries to back it up to the tape.

The backup script has about 20 different local mount points that get
backed up with both ufsdump and vxdump and all other filesystem backups
continue to work without error. The filesystem that fails to back up is
the local root filesystem. I have also successfully backed up the root
filesystem to a file on the local system without any errors.

Below is the output from ufsdump. After many attempts, the backup seems
to fail at random points: sometimes very soon after it starts dumping
files, and sometimes after 70% of the backup has completed. Each time it
fails with "Lost connection to remote host."

  DUMP: Writing 32 Kilobyte records
  DUMP: Date of this level 0 dump: Fri Mar 18 02:52:55 2005
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/vx/rdsk/rootvol (asterix:/) to monitor:/dev/rmt/0un.
  DUMP: Mapping (Pass I) [regular files]
  DUMP: Mapping (Pass II) [directories]
  DUMP: Estimated 8894354 blocks (4342.95MB).
  DUMP: Dumping (Pass III) [directories]
  DUMP: Dumping (Pass IV) [regular files]
  DUMP: 8.66% done, finished in 1:45
  DUMP: 30.76% done, finished in 0:45
  DUMP: 53.92% done, finished in 0:25
  DUMP: 77.03% done, finished in 0:11
  DUMP: Lost connection to remote host.
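
Because the failure point varies from run to run, it may help to record how far each attempt got. The following is only a sketch, assuming the ufsdump output has been captured as text; the function name `last_progress` is hypothetical and not part of ufsdump:

```python
import re
from typing import Optional, Tuple

# Matches progress lines such as "  DUMP: 77.03% done, finished in 0:11".
PROGRESS = re.compile(r"DUMP:\s+(\d+(?:\.\d+)?)% done")


def last_progress(log: str) -> Tuple[Optional[float], bool]:
    """Return (last reported percentage, whether the connection was lost)."""
    percents = [float(m.group(1)) for m in PROGRESS.finditer(log)]
    lost = "Lost connection to remote host" in log
    return (percents[-1] if percents else None, lost)


# Sample text copied from the ufsdump output above.
sample = """\
  DUMP: Dumping (Pass IV) [regular files]
  DUMP: 8.66% done, finished in 1:45
  DUMP: 30.76% done, finished in 0:45
  DUMP: 53.92% done, finished in 0:25
  DUMP: 77.03% done, finished in 0:11
  DUMP: Lost connection to remote host.
"""

print(last_progress(sample))  # (77.03, True)
```

Collecting this value over several attempts would show whether the failures cluster around a particular point in the dump or are spread evenly.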

I ran truss against the /etc/rmt process on the remote system. When the
backup stops, rmt is always waiting to read from STDIN; see the last 10
lines of truss output below:

19227: setcontext(0xEFFFFBD0)
19227: read(0, " W", 1) = 1
19227: read(0, " 3", 1) = 1
19227: read(0, " 2", 1) = 1
19227: read(0, " 7", 1) = 1
19227: read(0, " 6", 1) = 1
19227: read(0, " 8", 1) = 1
19227: read(0, "\n", 1) = 1
19227: read(0, "9010\01C @\0\0\09210 14".., 32768) = 27740
19227: read(0, 0x0002A6EC, 5028) (sleeping...)
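
The trace is consistent with an rmt write request: the client sends `W`, a decimal byte count, and a newline, and rmt then reads that many bytes of data from standard input. A minimal Python sketch of that framing shows where the 5028 in the final read comes from. The helper names are illustrative only, and the values are taken from the trace above:

```python
def parse_write_header(header: bytes) -> int:
    """Parse an rmt write request header such as b"W32768\\n"; return the byte count."""
    if not header.startswith(b"W") or not header.endswith(b"\n"):
        raise ValueError("not an rmt write request header")
    return int(header[1:-1].decode("ascii"))


def remaining_after_partial_read(expected: int, received: int) -> int:
    """Bytes still to be read before the full record has arrived."""
    if received > expected:
        raise ValueError("received more bytes than the header announced")
    return expected - received


# Header bytes as read one at a time in the trace: W, 3, 2, 7, 6, 8, newline.
expected = parse_write_header(b"W32768\n")

# The first data read returned 27740 of the announced bytes.
remaining = remaining_after_partial_read(expected, 27740)

print(expected, remaining)  # 32768 5028
```

On this reading, rmt had received the header and 27740 bytes of a 32 KB record and was blocked waiting for the remaining 5028 bytes when the transfer stalled.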

When ufsdump reports "Lost connection to remote host", it continues
running and must be killed with kill -9. The /etc/rmt processes on the
remote host must also be killed with kill -9.

System details:
System running backup:
- Solaris 8
- root filesystem is a Veritas volume manager mirror. (have successfully
backed up another Veritas
- ufs and vxfs filesystems

Remote system details:
- Solaris 9 with latest patches as of Wed 16th Mar
- writing to local DLT7000

Can anybody give me any clues on what might be happening or how to fix
it?

Thanks

Darren Brechman-Toussaint
Unix Administrator

SecurityMail Pty Ltd
36 Northlink Pl, Virginia QLD 4014
t (07) 3866 8444
f (07) 3866 8400
m 0439 866 844
e Darren.Toussaint@securitymail.com.au
w http://www.securitymail.com.au


