From: Rob Windsor (windsor@warthog.com)
Date: Thu Feb 22 2007 - 12:47:24 EST
I received many responses, some pointed at tools (which is what I was
looking for, honestly), but most had a common theme to them. :)
Original Post:
> We need to sync 10TB of data in small files from one North American
> coast to the other.
>
> Our tentative plans are to sneakernet the data and then use some form of
> sync to catch up the delta.
>
> Aside from bandwidth constraints, we found that rsync quickly craps out
> with large numbers of files.
>
> What tools have you used to do this?
Most popular question:
> Does all 10TB of it change daily?
No. The data comes in two flavors:
* Oracle DBF files (yes, changes daily), less than a TB here
* Small static files, the files themselves don't change, their count
simply increases
- side note: These files are about 16 subdirs deep and heavily
scattered (er.. I mean.. "distributed")
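Since the static files are only ever added, never modified, the delta for that part of the tree comes down to "paths on the source that the destination doesn't have yet". Here is a minimal sh sketch of that idea. It assumes filenames contain no newlines and that both sides can be listed with find. The function names, /data and desthost are hypothetical placeholders, not anything from the thread:

```shell
#!/bin/sh
# Sketch: for files that are only ever added (never modified), the delta is
# "paths on the source that the destination lacks". Assumes no newlines in
# filenames. Function names are hypothetical.

# make_manifest DIR: every regular file under DIR, relative to DIR,
# sorted in the C locale so comm(1) can compare two manifests.
make_manifest() {
    (cd "$1" && find . -type f -print) | sed 's|^\./||' | LC_ALL=C sort
}

# missing_files SRC_LIST DST_LIST: lines present only in SRC_LIST.
missing_files() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example use (desthost and /data are placeholders):
#   make_manifest /data > src.lst
#   ssh desthost 'cd /data && find . -type f -print' |
#       sed 's|^\./||' | LC_ALL=C sort > dst.lst
#   missing_files src.lst dst.lst > new.lst
#   rsync -a --files-from=new.lst /data/ desthost:/data/
```

find, sort and comm stream through the tree (sort spills to temporary files on large input), so only the much shorter list of new files ever ends up in rsync's in-memory file list.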
Other common questions/comments:
> You didn't specify how rsync craps out, but I'm guessing
I forget the specifics, but it was basically "out of memory" due to the
number of files and subdirs it has to dig through.
> what version of rsync you're using
2.6.8 (looking at 2.6.9 now to see if it addresses any of the problems
we've had)
> but you can often throw ram at the issue.
Not in this case, unfortunately.
> In addition, you can fire off rsync on a subtree so it has less work to
> do.
That's certainly a consideration. It won't be easy (cf. "about 16
subdirs deep" above).
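Even with a deep tree, the split can be scripted instead of done by hand. You loop over the top-level directories and start one rsync per directory, so no single run has to hold the whole file list. What follows is a rough sh sketch. It assumes the top level fans out reasonably evenly; if it doesn't, the same loop can be pointed one or two levels down. RSYNC, sync_by_subtree and the paths are hypothetical, and the glob does not pick up top-level dot-directories:

```shell
#!/bin/sh
# Sketch: one rsync per top-level subdirectory, so each run builds a much
# smaller file list than a single rsync over the whole tree.
# RSYNC can be overridden (e.g. RSYNC=echo to just print the commands).
RSYNC=${RSYNC:-rsync}

# sync_by_subtree SRC DEST: SRC is a local directory, DEST an rsync
# destination such as desthost:/data. Returns non-zero if any run failed.
sync_by_subtree() {
    src=$1 dest=$2 rc=0
    for d in "$src"/*/; do
        [ -d "$d" ] || continue     # glob matched nothing
        name=$(basename "$d")
        $RSYNC -a "$src/$name/" "$dest/$name/" || rc=1
    done
    # Plain files sitting directly in SRC: exclude every directory.
    $RSYNC -a --exclude='*/' "$src/" "$dest/" || rc=1
    return $rc
}

# Example use (placeholder paths):
#   sync_by_subtree /data desthost:/data
```

The runs are sequential here. Backgrounding them would run several at once, at the cost of more memory and bandwidth in use at the same time.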
Then there were these:
> (Deborah Santomauro) Have you tried "rdist"?
and
> (Anthony D'Atri) rdist 6 from www.magnicomp.com with SSH as the transport works great for managing files.
Holy cow, now that's oldschool love! I'll look into that.
Brad Morrison mentioned:
> I think cpio has a flag to skip files with equal or newer mod dates,
Yeah, we've also considered something like:
"rsync -av `find . -newer <somefile> -print` dest:/path"
just to limit the volume of files that rsync has to consider.
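The backtick form works for small deltas. With enough new files, though, the expanded command line can exceed the system's argument-length limit (ARG_MAX), and filenames containing spaces get split. rsync's --files-from=- option instead reads the list on stdin. Below is a minimal sh sketch of the stamp-file approach. The function names and paths are hypothetical. It also assumes new files arrive with current mtimes; files restored with their original timestamps (e.g. by cp -p or tar) would slip past -newer:

```shell
#!/bin/sh
# Sketch: record a baseline timestamp before the bulk copy, then later list
# only files modified after it and hand that list to rsync on stdin.
# Function names are hypothetical.

# mark_baseline STAMP: create/refresh STAMP with the current time.
mark_baseline() {
    touch "$1"
}

# changed_since STAMP DIR: regular files under DIR modified after STAMP,
# relative to DIR. Keep STAMP outside DIR so it never lists itself.
changed_since() {
    stamp=$1
    case $stamp in
        /*) ;;
        *) stamp=$PWD/$stamp ;;   # make relative paths survive the cd
    esac
    (cd "$2" && find . -type f -newer "$stamp" -print) | sed 's|^\./||'
}

# Example use (placeholder paths and host):
#   mark_baseline /var/tmp/sync.stamp     # just before shipping the disks
#   ...sneakernet the data, load it at the far end...
#   changed_since /var/tmp/sync.stamp /data |
#       rsync -a --files-from=- /data/ desthost:/data/
```

Because the static files are only ever added, a plain mtime test suits them well. The Oracle DBF files change in place and would show up in the list every day.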
There was mention of NetApp, zfs, VxFS/VxVM, which aren't options in
this situation. As much as I tried to get us onto zfs, it wasn't available
at the time we upgraded the DB/file servers to Sol10.
Hutin Bertrand mentioned an app called "aide", which is an intrusion
detection tool (think tripwire) that you can use to spot files/subdirs
that have changed. Interesting find.
(http://sourceforge.net/projects/aide)
Gedaliah Wolosh pointed me at http://www.openafs.org
Karl Rossing mentioned http://opensolaris.org/os/project/avs/
AFS is quite an endeavor; we're not quite prepared to go that route.
AVS looks interesting, we might be able to do something with that, if
heavy rsync-frobulation doesn't work out.
Thanks all!
Rob++
--
Internet: windsor@warthog.com                                 __o
Life: Rob@Carrollton.Texas.USA.Earth                        _`\<,_
                                                           (_)/ (_)
"They couldn't hit an elephant at this distance."
                              -- Major General John Sedgwick
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers