Very odd ufs symlink hanging (kinda long)

From: Lance Tost (lance.tost@respironics.com)
Date: Wed Jan 14 2004 - 14:17:27 EST


For the past several days, we've been experiencing very odd behavior on
one of our filesystems. It is a ufs filesystem on a Veritas volume
(mirrored). Each plex is made up of raid-5 logical volumes that reside on
T3Bs. The box is a SunFire 4800. We are using logging on all
filesystems.

On Friday, someone created a subdirectory in our /oracle/PRD filesystem.
The subdirectory was 920_64. He complained that it took a long time to
create the directory but since it eventually finished, we didn't think
much of it (assumed a network hiccup or something).

All weekend long, I got complaints from the SAP and DB admins about
commands hanging on this box... commands like ls, pwd, ln, mkdir, etc. So
I would log in and try these commands out somewhere in /oracle/PRD (not
necessarily the 920_64 subdir though). Everything looked fine to me.

Finally, on Monday, the SAP guy gave me a command to try. He needed a
symlink created: cd /oracle/PRD/920_64; ln -s /oracle/PRD/sapreorg sapreorg.
This symlink took about 10 seconds to create, which didn't seem healthy.
After doing more symlinks, I noticed it took this long (or longer) pretty
consistently.

I checked /var/adm/messages, the console, the domain log, the platform
log, and the messages.t3 file on our storage service processor. There was
nothing suspicious in any logs. vxprint showed all volumes as healthy.
We have NFS mounted directories, but this was not one of them (plus, no
nfs timeout messages logged anywhere).

This seemed to get worse... at one point yesterday, a symlink took over 5
minutes to complete. During this time, su's to another user and telnets
to the box hung right after "Last logged in...". Also, creating symlinks
anywhere under /oracle/PRD would hang *if* one was currently hanging in
the 920_64 subdir. Otherwise, I noticed, symlinks in /oracle/PRD but
outside of /oracle/PRD/920_64 worked fine -- I never saw a slow one.
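
For anyone who wants to reproduce this kind of comparison, here is a minimal
sketch of how symlink creation could be timed per directory. It is not what
was actually run on the box; Python itself, the helper names time_symlinks
and compare_directories, and the scratch link names are all assumptions made
for illustration.

```python
import os
import tempfile
import time


def time_symlinks(directory, target, count=5):
    """Create and remove `count` scratch symlinks in `directory`.

    Returns a list with the creation time of each symlink, in seconds.
    The link names are hypothetical scratch names, removed after each run.
    """
    durations = []
    for i in range(count):
        link_path = os.path.join(
            directory, f".symlink_timing_{os.getpid()}_{i}"
        )
        start = time.perf_counter()
        os.symlink(target, link_path)  # the operation that was hanging
        durations.append(time.perf_counter() - start)
        os.unlink(link_path)  # clean up so the directory is left unchanged
    return durations


def compare_directories(directories, target, count=5):
    """Return {directory: slowest symlink creation time in seconds}."""
    return {d: max(time_symlinks(d, target, count)) for d in directories}


if __name__ == "__main__":
    # Demonstration on two temporary directories. On the affected box the
    # list would instead hold the suspect directory and a sibling directory.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for d, worst in compare_directories([a, b], "/tmp").items():
            print(f"{d}: slowest symlink took {worst:.4f}s")
```

Reporting the slowest time rather than the average is a deliberate choice
here, since the problem showed up as occasional very long hangs.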

Before I noticed it was tied to a single directory, Sun suggested that
rebooting would solve it (I must have been out sick the day we upgraded to
Microsoft Solaris). Got up at 4AM today, rebooted. It did not solve it.
I saw the slow behavior even with SAP and Oracle shut down on the box, so
load was not a factor.

Today, I created a new /920_64 filesystem on some disks in a D1000. I
moved /oracle/PRD/920_64/* to /920_64... symlinks still hung in
/oracle/PRD/920_64. So I removed that directory and recreated it. Since
then, I haven't had a symlink take longer than half a second anywhere on
the box (including that directory).

I've searched SunSolve and Google but haven't come across this issue. Has
anyone ever seen anything like this? Is it possible that somehow the
mkdir corrupted something in the filesystem structure? If so, am I still
at risk for more corruption? During bootup this morning, it didn't do a
complete fsck of the fs. All I saw was "/dev/vx/rdsk/erp-dg/oraclePRDVol:
is logging." Would it be wise to get downtime to force a full fsck of
this filesystem? We've been planning to redo our filesystems anyway
(split them up differently) so we're thinking we should just not touch it
and hope for the best until we can cut over to the new layout.

BTW, regarding the filesystem layout... we're changing it because (thanks
to a consultant) this /oracle/PRD filesystem is 600G (!!!). We would
much prefer to break it up into filesystems of 32GB or so. Could the
sheer size be contributing to the strange behavior?

Thanks in advance for any insight into this...

-- 
Lance Tost, Systems Engineer
lance.tost@respironics.com
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

