Warning: Do not use DISM on Solaris 8 or Solaris 9 pre-update 2

From: Lumpkin, Buddy (Buddy.Lumpkin@nordstrom.com)
Date: Thu Jan 29 2004 - 03:50:58 EST


FYI

We recently had a large Data Warehouse system running Oracle 9i (9.2.0.4)
brought to its knees for no apparent reason (in terms of the workload on the
system). vmstat and mpstat showed that roughly 70% to 90% of CPU time was
charged as system time, and the smtx field in mpstat was in the tens of
thousands. lockstat -H showed a kernel function as a point of contention, so I
plugged that function name into SunSolve and found an InfoDoc that described
our poor performance and said the problem was caused by using the DISM feature.
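As a minimal sketch of the vmstat/mpstat triage described above, the Python below scans captured mpstat output and flags CPUs whose smtx count and sys percentage are both high. The sample rows and the thresholds (smtx_limit, sys_limit) are invented for illustration and are not from the system described here; the function name busy_mutex_cpus is hypothetical.

```python
# Hypothetical sketch: flag CPUs in captured Solaris mpstat output whose
# smtx (spins on mutexes) and sys% both exceed illustrative thresholds.

# Invented sample in mpstat's column layout, for illustration only.
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr  smtx  srw syscl  usr sys  wt idl
  0   12   0  310   520  210  900  120   80 21000    5  1500   10  85   0   5
  1    8   0  250   410  150  850  100   75 18500    3  1400   12  80   0   8
  2    3   0   20   300  100  400   20   10   150    1   900   40  10   0  50
"""


def busy_mutex_cpus(mpstat_text, smtx_limit=10000, sys_limit=50):
    """Return (cpu, smtx, sys) tuples for CPUs over both thresholds.

    Columns are located by header name, so the code does not depend on
    the exact column positions as long as the names match.
    """
    flagged = []
    header = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # mpstat repeats the header every interval; re-read it each time.
            header = {name: i for i, name in enumerate(fields)}
            continue
        if header is None:
            continue
        smtx = int(fields[header["smtx"]])
        sys_pct = int(fields[header["sys"]])
        if smtx > smtx_limit and sys_pct > sys_limit:
            flagged.append((int(fields[header["CPU"]]), smtx, sys_pct))
    return flagged


print(busy_mutex_cpus(SAMPLE))
```

Keying on the header names rather than fixed offsets keeps the sketch tolerant of whitespace differences between captures.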

The InfoDoc goes on to recommend that DISM (Dynamic Intimate Shared Memory) not
be used at all with Solaris 8, or with Solaris 9 prior to update 2.

I could not find this information mentioned anywhere else, so I wanted to
share it on Sun Managers. I surely hope that people are being warned about
this bug, since it can be such a silent performance killer. Not only are CPU
cycles eaten up spinning on a CPU (see man mpstat and read about the smtx
field), but it looks to me like extra locking overhead in the kernel is
happening as well: if a thread spinning on a mutex does not get the lock, it
has to sleep, be woken up and try again, so IMHO it spends time sleeping as
well as spinning. To me, that essentially means time spent "not performing
work".
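To make the spin-then-sleep cost concrete, here is a toy Python model of an adaptive lock. It makes a bounded number of non-blocking acquire attempts and then falls back to a blocking acquire. This is an illustration under simplifying assumptions, not the Solaris kernel's adaptive mutex, which decides whether to spin based on whether the lock owner is currently running on another CPU. The class name, the counters and the max_spins parameter are hypothetical.

```python
import threading
import time


class SpinThenBlockLock:
    """Toy adaptive lock: spin briefly, then block (sleep) on the lock.

    Counters are updated without synchronization; this is fine for the
    single-contender demo below but not for heavy concurrent use.
    """

    def __init__(self, max_spins=1000):
        self._lock = threading.Lock()
        self.max_spins = max_spins
        self.contended = 0        # missed on the first try (like mpstat smtx)
        self.spin_iterations = 0  # CPU time burned without doing useful work
        self.blocked = 0          # gave up spinning and went to sleep

    def acquire(self):
        if self._lock.acquire(blocking=False):
            return                      # uncontended fast path
        self.contended += 1
        for _ in range(self.max_spins):
            if self._lock.acquire(blocking=False):
                return                  # got the lock while spinning
            self.spin_iterations += 1
        self.blocked += 1
        self._lock.acquire()            # sleep until the holder releases it

    def release(self):
        self._lock.release()


# Demo: the main thread holds the lock and a helper releases it later, so
# the second acquire spins max_spins times and then blocks.
lock = SpinThenBlockLock(max_spins=100)
lock.acquire()


def holder():
    time.sleep(0.2)
    lock.release()


t = threading.Thread(target=holder)
t.start()
lock.acquire()
t.join()
lock.release()
print(lock.contended, lock.spin_iterations, lock.blocked)
```

The demo shows both costs described above: spin iterations that consume CPU without acquiring the lock, followed by a sleep until the holder releases it.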

I sent the message below to a couple of our local Sun engineers, to Sun
BluePrints, and to one of the authors of a BluePrint published this month,
because the BluePrint essentially encourages using DISM on Solaris 8 or 9 with
no mention of this bug. I have received no response.

Has anyone else run into this bug? To what extent has it affected you?

A quick way to check whether Oracle 9i is using DISM is: ps -ef | grep dism.
If you see ora_dism_<instance name>, then Oracle 9i is using DISM rather than
ISM. Not only is there a bug, but I also found that DISM is said to be at
least 10% slower than ISM on Solaris 8, since it does not use a 4MB page size
as ISM does.
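The ps check above can be scripted. This sketch assumes only what is stated above, namely that the DISM helper appears in ps -ef output as ora_dism_<instance name>. The function names and the sample ps lines are made up for illustration.

```python
import re
import subprocess

# Assumption from the text above: the DISM process is named
# ora_dism_<instance name>; capture whatever follows the prefix.
DISM_PROC = re.compile(r"\bora_dism_(\S+)")


def dism_instances(ps_output):
    """Return sorted instance names that have an ora_dism_ process."""
    return sorted({m.group(1) for m in DISM_PROC.finditer(ps_output)})


def check_local_host():
    """Run `ps -ef` on this host and scan its output (Python 3.7+)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True,
                            text=True, check=True)
    return dism_instances(result.stdout)


# Invented sample lines in `ps -ef` layout, for illustration only.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
  oracle  1234     1  0   Jan 22 ?        0:05 ora_pmon_DW01
  oracle  2345     1  0   Jan 22 ?        0:00 ora_dism_DW01
  oracle  3456  3001  0 09:15:02 pts/1    0:00 grep dism
"""

print(dism_instances(SAMPLE_PS))
```

Unlike a bare grep dism, the regex requires the ora_dism_ prefix, so the grep process itself is not reported.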

I looked at our other systems running Solaris 9 and found a couple of others
that showed signs of this bug (to a much lesser extent), and DISM was enabled
there as well.

Regards,

--Buddy

-----Original Message-----
From: Lumpkin, Buddy
Sent: Thursday, January 22, 2004 9:38 PM
To: Sun_BluePrints@sun.com
Cc: kristien.hens@belgium.sun.com [Lumpkin, Buddy: extra recipients left off]
Subject: Suggestions / Questions

FYI

We have a large Data Warehouse system running Oracle 9i here that was running
extremely slowly. Upon inspection, vmstat output showed an extremely high
proportion of CPU time charged as "system", that is, time spent in the kernel.
I found this disturbing, so I followed vmstat with mpstat and found an
extremely high number of spins on mutex locks (the smtx field in mpstat). To
try to isolate why adaptive mutexes were spinning on the CPU, consuming
valuable CPU time and then failing to acquire the mutex, I decided to run
lockstat. I found this:

-------------------------------------------------------------------------------

lockstat -H sleep 5 | more
lockstat: warning: ran out of data records (use -n for more)

Adaptive mutex hold: 3400706 events in 5.006 seconds (679371 events/sec)

 Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
529006  16%  16% 1.00      643 0x30023fbc890          segspt_softunlock+0xf0
 15021   0%  16% 1.00     2153 ph_mutex+0x1b0         page_find+0x94
 14953   0%  16% 1.00      281 anonhash_lock+0x148    anon_getpage+0xb4
 14932   0%  17% 1.00     1892 ph_mutex+0x50          page_find+0x94
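
As a worked check of the lockstat numbers above, a row's indv column is its Count divided by the total event count. For the top row, 529006 / 3400706 is about 15.6%, which lockstat rounds to 16%. The sketch below parses the four rows shown and sums the share per caller function, ignoring the +0x offsets. The helper names are hypothetical, and because only the top rows are included, the shares cover just those entries.

```python
# The four data rows from the lockstat -H output above.
LOCKSTAT_ROWS = """\
529006  16%  16% 1.00      643 0x30023fbc890          segspt_softunlock+0xf0
 15021   0%  16% 1.00     2153 ph_mutex+0x1b0         page_find+0x94
 14953   0%  16% 1.00      281 anonhash_lock+0x148    anon_getpage+0xb4
 14932   0%  17% 1.00     1892 ph_mutex+0x50          page_find+0x94
"""
# Total from the "Adaptive mutex hold: 3400706 events" summary line.
TOTAL_EVENTS = 3400706


def parse_rows(text):
    """Parse 7-column lockstat rows into dicts and skip anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        count, _indv, _cuml, _rcnt, nsec, lock, caller = fields
        rows.append({"count": int(count), "nsec": int(nsec),
                     "lock": lock, "caller": caller})
    return rows


def share_by_function(rows, total):
    """Percent of all hold events attributed to each caller function."""
    counts = {}
    for row in rows:
        func = row["caller"].split("+")[0]  # drop the +0x offset
        counts[func] = counts.get(func, 0) + row["count"]
    return {func: 100.0 * c / total for func, c in counts.items()}


shares = share_by_function(parse_rows(LOCKSTAT_ROWS), TOTAL_EVENTS)
for func, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{func:20s} {pct:5.1f}%")
```

Grouping by function rather than by individual lock address is why the two page_find rows combine into a single entry.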

The "unlock" portion of softunlock sounded like a mutex-related kernel
function name to me, and I knew that segspt was a shared memory segment
driver. Since this is a large Oracle DB server, I figured this was probably
the culprit. I compared the lockstat -H output with that of other large
Oracle database systems that were behaving normally, thinking that maybe an
Oracle system always spends a lot of time waiting on this call because of
SGA-related locking (after all, it is a mutex-related call), but I didn't find
it to be a source of contention on any of the other systems.

I searched SunSolve for "segspt_softunlock" and came up with this InfoDoc:
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=finfodoc%2F72952

The document explains that there is a known bug that causes high system CPU
time and extremely poor performance with Oracle 9i, and that as a result DISM
should NOT be used at all on Solaris 8 until the bug is fixed. I checked the
bug: it is not fixed, and it has been known since June 2002. Related bug ID
4672677 states that the bug is reproducible.

I checked our system and, indeed, our DBA had dynamically increased the size
of the SGA just yesterday, and DISM was in fact active on the system.

After identifying the bug that is affecting us, I assured the Oracle DBA who
was troubleshooting the horrible performance with me that, since this bug has
been around since mid-2002, there must be several documents out there,
including Oracle 9i documents, that identify this issue and correctly warn
DBAs and system administrators not to use the DISM feature set on Solaris 8,
and I set out to prove it. I was quite disappointed to find that a Sun
BluePrint released in January 2004, titled "Dynamic Reconfiguration and
Oracle 9i Dynamically Resizable SGA", encourages users of Solaris 8 or higher
to use the DISM features with Oracle 9i and makes no mention of this bug.

Here's the BluePrint:
http://www.sun.com/solutions/blueprints/0104/817-5209.pdf

It's actually a great BluePrint; it just doesn't call out what appears to be a
pretty nasty bug, one that could prove very difficult to track down and cause
a lot of heartache.

Please make sure the authors (I'm attempting to CC one of them; hopefully the
email address is correct) know about this bug and about the very clear
statement in the InfoDoc above that the DISM feature should not be used with
Solaris 8 at all, or please correct the InfoDoc if this is actually a rare
bug. One or the other would suffice.

In the meantime, we will follow the recommendation in the InfoDoc mentioned
above and make sure that DISM is not in use on any of our Oracle database
systems running a Solaris release earlier than Solaris 9 update 2.

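The policy above can be expressed as a small audit rule. This is a sketch under stated assumptions. The host inventory, the Host fields and the way DISM activity is detected (for example, the ora_dism_ process check) are hypothetical. Treating releases after Solaris 9 as allowed is also an assumption, since the InfoDoc as summarized here covers only Solaris 8 and 9.

```python
from typing import List, NamedTuple


class Host(NamedTuple):
    name: str
    solaris: int        # major release, e.g. 8 or 9
    update: int         # update number; 0 for the initial release
    dism_active: bool   # e.g. an ora_dism_<instance> process was found


def dism_allowed(solaris: int, update: int) -> bool:
    """Rule as summarized from the InfoDoc: no DISM on Solaris 8, nor on
    Solaris 9 before update 2."""
    if solaris < 9:
        return False
    if solaris == 9:
        return update >= 2
    # Assumption: treat releases after Solaris 9 as allowed; the InfoDoc,
    # as summarized here, does not cover them.
    return True


def violations(hosts: List[Host]) -> List[str]:
    """Names of hosts running DISM on a release where it should not be used."""
    return [h.name for h in hosts
            if h.dism_active and not dism_allowed(h.solaris, h.update)]


# Hypothetical inventory, for illustration only.
inventory = [
    Host("dw01", 8, 0, True),
    Host("oltp01", 9, 1, True),
    Host("oltp02", 9, 4, True),
    Host("rpt01", 8, 0, False),
]
print(violations(inventory))
```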
Regards,
--Buddy
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

