Solaris 10, logging UFS, lots o' deletes, and statvfs64 hanging ...

From: Donahue, Adam (Adam.Donahue@kbcfp.com)
Date: Wed Dec 20 2006 - 10:08:29 EST


Folks,

We're experiencing an issue in which statvfs64 calls appear to hang upon
access to logging UFS filesystems under certain conditions.

Background:

We mount four large UFS filesystems on a Solaris 10 (x86) host, each
mounted with the logging option:

    donahuea@ubbsx1.nyc:/u/donahuea> uname -a
    SunOS deputysx.nyc.kbcfp.com 5.10 Generic_118855-19 i86pc i386 i86pc
    donahuea@ubbsx1.nyc:/u/donahuea> df -k | grep /fs/data/ubb
    /dev/dsk/c5t204500A0B8264E0Ed0s0
                         1057387721 877587437 169226407    84%    /fs/data/ubb-hd1
    /dev/dsk/c5t204500A0B8264E0Ed3s0
                         1057387721 888318801 158495043    85%    /fs/data/ubb-hd4
    /dev/dsk/c5t204500A0B8264E0Ed1s0
                         1057387721 835818923 210994921    80%    /fs/data/ubb-hd3
    /dev/dsk/c5t204500A0B8264E0Ed2s0
                         1057387721 804117910 242695934    77%    /fs/data/ubb-hd2

The filesystems house some of our site-wide backups. They're shared,
mounted via NFS by several other servers (automounts), and written to by
the backup clients on those servers.

We use a date/host.dumpobject.level naming structure to distinguish
backups from one another, and backups eventually end up in a compressed
or archived format ready for flushing to tape. (I won't get into the
gritty details, but this is done either by the client at the time of
backup or by a nightly process that scans for completed but still
uncompressed backups and compresses them.) Note that due to the nature
of our environment, which includes large filesystems and several Oracle
and Sybase databases, these compressed/archived files can be quite
large: tens to hundreds of gigabytes each.

Once a backup has been flushed, and assuming it meets a series of other
criteria (such as not being the most recent full backup), it's a
candidate for deletion. The deletions are handled en masse by another
process, run via cron, that sweeps through the filesystems and deletes
all the candidate files.
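
For illustration only, the candidate selection described above could be
sketched roughly as follows. This is not the actual cron job: the Backup
record, its field names, the "level 0 means full backup" convention, and
the deletion_candidates function are all hypothetical, and only two of
the criteria (flushed to tape, not the most recent full backup) are
modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    # All fields are assumptions made for this sketch.
    host: str
    dumpobject: str
    level: int     # assumed convention: 0 = full backup
    date: str      # ISO date string, so string order is chronological
    flushed: bool  # True once the backup has been written to tape


def deletion_candidates(backups):
    """Return flushed backups that are not the newest full backup
    of their (host, dumpobject) pair."""
    newest_full = {}
    for b in backups:
        if b.level == 0:
            key = (b.host, b.dumpobject)
            if key not in newest_full or b.date > newest_full[key].date:
                newest_full[key] = b
    return [
        b for b in backups
        if b.flushed and newest_full.get((b.host, b.dumpobject)) is not b
    ]
```

Any real sweep would apply the remaining site-specific criteria before
unlinking anything.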

And this is where we get the problem. We recently upgraded to Solaris
10 and brought the storage drivers, tool patches, etc. up to the most
recent versions. Since then, immediately following this large series of
deletes, the system seems to hang. Sometimes we can't get on at all
(rsh, console login, etc. all just hang). Sometimes we can get on, but
when we run, say, a df on those filesystems, or even an ls -l on their
files, the command just hangs. A truss reveals that it hangs in the
statvfs64 call.
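
A minimal sketch of a probe that measures how long a statvfs call takes
on each filesystem, without blocking the caller indefinitely. It relies
on Python's os.statvfs, which wraps the platform's statvfs call; the
timed_statvfs name and the 5-second timeout are assumptions made for
this example. Note that a thread stuck inside the kernel cannot be
cancelled, so the probe only reports the stall; it does not clear it.

```python
import os
import threading
import time


def timed_statvfs(path, timeout=5.0):
    """Run os.statvfs(path) in a helper thread.

    Returns (elapsed_seconds, result). result is None if the call did
    not finish within `timeout` seconds or raised OSError.
    """
    box = {}

    def worker():
        try:
            box["result"] = os.statvfs(path)
        except OSError as exc:
            box["error"] = exc

    start = time.monotonic()
    # Daemon thread: a stuck call will not keep the interpreter alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    elapsed = time.monotonic() - start
    return elapsed, box.get("result")


if __name__ == "__main__":
    # Mount points taken from the df output in this post.
    for mount in ("/fs/data/ubb-hd1", "/fs/data/ubb-hd2",
                  "/fs/data/ubb-hd3", "/fs/data/ubb-hd4"):
        elapsed, result = timed_statvfs(mount)
        if result is None:
            print(f"{mount}: no answer after {elapsed:.1f}s")
        else:
            print(f"{mount}: statvfs returned in {elapsed:.3f}s")
```

Running this from cron shortly before and after the deletion sweep
would show whether the stall lines up with the deletes.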

We've seen some notes on the OpenSolaris forums indicating this may have
to do with a bug in the way these deletes are reflected in the
superblocks of UFS when it's in logging mode. Basically, statvfs64 can
take a long time when large files have been deleted and those deletes
are still being processed via the logging mechanism. We think this may
have something to do with our issue, as these hangs happen right after
our deletion process has nuked roughly 1TB of data or more. But we
can't seem to find a fix, or any more specifics on this issue.

Has anyone else experienced this, or seen a related issue, and does
anyone know a workaround and/or fix?

We're likely going to disable logging on these filesystems to prevent
the issue, but are checking a couple other things first.

Thanks for reading.

Adam

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

