SUMMARY: Q: [Oracle 8i e450 UFS] Solaris8->9 upgrade: Performance Boost?- Real World Experience ?

From: Tim Chipman (chipman@ecopiabio.com)
Date: Wed Nov 05 2003 - 14:06:51 EST


Sorry for the absurdly late summary. "Better Late Than Never[?]" I hope.
MANY thanks to those who replied [in no particular order]: Sebastien
Daubigne, Alex Avriette, Buddy Lumpkin, Neil Guiogue, JV.

The (relatively) quick "executive-type-summary" follows; then a few
verbatim quotes which are especially relevant. See [way below] for the
original posting.

-> Alas, nobody has done "exactly what I am asking about" (a Sol8->9
upgrade of an e450 Oracle server, evaluating/noting performance changes
from the upgrade and also from UFS parameter tuning, in particular
"forcedirectio").

-> It appears that Solaris8 Update3 and later have all contained this
feature, "Concurrent DirectIO", which (apparently) is of greatest
potential benefit to environments experiencing "parallel write-intensive
database access". I.e., not just Solaris9 has this, but all Solaris8
since MU3. One response indicates that these features are actually
present in Solaris8 since the 01/01 release (as suggested at the URL
http://solaris.java.sun.com/articles/performance.html ).
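
If you're not sure which update a given box is running, the release
string shows it. The output below is only illustrative and will differ
per machine; anything 01/01 or later should already have the concurrent
DirectIO code:

   # cat /etc/release
                     Solaris 8 10/01 s28s_u6wos_08a SPARC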

-> one response indicated that after enabling DirectIO on their system
following an upgrade to Solaris9, performance "appeared to increase",
but this wasn't a hard benchmark. [see below for exact text/comment]

->one response suggests that systems experiencing "Tons of IO" [service
times of 500-1000ms per request] are those which benefit most from this
sort of tuning, and would especially benefit if migrated towards storing
data on a character device [Raw device, Quick I/O, Direct I/O, etc..]
rather than on a block device. [see response2 full text below, a bit
longer/more details]

-> one response suggests the upgrade is almost certainly not a bad
thing and will potentially give some small boost "in general", but that
hoping for magical boosts via directIO mount options is improbable. If
magic is required, maybe an investment in VxFS is in order, which is
suggested to be "ALWAYS faster than UFS".

-> Finally, don't forget to use the "noatime" and "logging" mount
options [I already was] since these should be of some benefit (a sample
vfstab entry is shown just below). Plus, of course, examine how the
oracle data is stored / distributed across controllers/disks and, if
possible, try to optimize that. Plus, of course, app-level optimizations
(query tuning, etc.) are often the best way to enhance performance,
since legacy-legacy-obsolete workarounds are a surefire way to degrade
performance (i.e., the "why are we doing things this way anyhow?"
syndrome :-)
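
For reference, those UFS options all go in the mount-options field of
/etc/vfstab (or on an ad-hoc mount command line). The entry below is
purely illustrative: the device names and mount point are made up, and
"forcedirectio" should only be added after testing against your own
workload:

  #device to mount    device to fsck      mount point  FS   fsck  boot  options
  /dev/dsk/c1t1d0s0   /dev/rdsk/c1t1d0s0  /u01         ufs  2     yes   logging,noatime,forcedirectio

The same thing done by hand:

  # mount -F ufs -o logging,noatime,forcedirectio /dev/dsk/c1t1d0s0 /u01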

All this, finally, to say: I haven't yet upgraded the box to Solaris9,
but I plan to do so sometime in the not-too-distant future. We
previously tested the directio mount option (and given this box is the
Solaris8 10/01 release, which is later than 01/01, it suggests I've
already got all the directio magic on board) and found it caused a drop
in performance / increase in CPU load / increase in I/O waits. Clearly,
in light of this, I can't expect Sol9 DirectIO to make things go faster,
but I'll probably test it again just to be certain.

Many thanks again to everyone for their replies. I hope this info gets
archived and is of use to someone else, somewhere/somewhen.

--Tim Chipman

========================================================================
===Response One- re: Sol8->9 upgrade, subsequent DirectIO enable==

I can tell you that the Sun Blueprints book, _Tuning Databases for the
Solaris Platform_ references directio specifically. We run our Oracle 9
database here on directio disks, and I have been very pleased with the
performance. ...[mini-snip]....... However, my personal tests (I have
"wellness" tests for the database which I run nightly) seem to indicate
that the database is faster than it was without directio. However, you
can actually mount filesystems with directio on Solaris 8. I cannot
confirm that the directio option in 9 is faster than 8, we weren't using
it in 8. When I made the switch from 8 to 9, I got our new raid set up,
and made sure Oracle was happy with that. It was. I then switched on
directio, and it was even happier -- and the DBA commented that something
had "changed" and the database seemed faster. He no doubt is plotting
a way to slow it down for me.

========================================================================
==Response Two - re character-vs-block devices & theoretical issues.==
On a system that is heavily I/O bound, you might see some fairly high
gains, maybe 20-30%, but I would only predict an 80% gain if the system
was totally on its knees because of thrashing in the VM. Here's how
this works:

Any "filesystem" I/O in Solaris uses the page fault mechanism either
directly via mappings obtained via mmap() which causes an as_fault()
(page fault) to occur, or indirectly thru the segmap driver for reads
and writes which also causes an as_fault() to occur.

Even if a file is opened with O_SYNC (which is what Oracle does, BTW),
on cooked filesystems a write call will block until the write succeeds,
but all of the data will still end up in memory as each page is faulted
in via the page fault mechanism.

The problem is that since all I/O is forced through physical memory,
there comes a point where the VM system is literally your bottleneck.

In Solaris 2.6 this was worse because once free memory dropped to
lotsfree (or cachefree, with priority paging enabled), the scanner was
the only method for evicting old pages to make room for new ones.

In Solaris 8, they implemented a cyclical page cache so that once
segmap is full with 256MB worth of mappings, it takes the pages
referenced by the oldest mappings and puts them on the cache list. This
solved a memory leak that was present in Solaris 2.6, but it doesn't
solve the problem that the VM system wasn't designed to handle tons of
disk I/O to the filesystem.
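
One easy way to see whether the scanner is part of the picture on a
given box is the "sr" (scan rate) column of vmstat. The numbers below
are made up and only the column matters; a sustained non-zero "sr"
means the scanner is having to work to evict pages:

  $ vmstat 5
   kthr      memory            page            disk          faults      cpu
   r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
   0 0 0 812344 23456   5  31  8  0  0  0  0  1  2  0  0  412 1534  601 14  6 80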

Using a character device (raw device, Quick I/O, Direct I/O, etc.)
alleviates this because any I/O to a character device vectors directly
into the device driver for that device. The device does not have to
adhere to the vfs/vnode abstraction interface that was put in place for
all block I/O (each filesystem has an ops structure that implements
read, write, ioctl, etc. for that filesystem). For a character device,
reads and writes are funneled directly to the disk they represent and
completely bypass the page fault mechanism.
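
Concretely, on Solaris the same slice is visible both ways: the
/dev/dsk/* entry is the block ("cooked") device a filesystem gets
mounted on, and the matching /dev/rdsk/* entry is the raw character
device. The device name and numbers below are made up; the point is the
leading "b" versus "c" in the mode field:

  $ ls -lL /dev/dsk/c1t1d0s0 /dev/rdsk/c1t1d0s0
  brw-r-----   1 root  sys   32,  8 Nov  5  2003 /dev/dsk/c1t1d0s0
  crw-r-----   1 root  sys   32,  8 Nov  5  2003 /dev/rdsk/c1t1d0s0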

So that's theoretical and you asked about real world experience, right?

I can tell you first hand that we have experienced significant
performance gains by going to Quick I/O and Solaris 8. It's hard to
quantify which part of the gain is the Kernel Asynchronous I/O and which
is bypassing the VM system, but I can tell you, there's a solid 30% gain
to be had on a system that's doing tons of I/O.

What's tons of I/O?

Watch iostat output, in particular the average service time. It's not
the most accurate metric, because it reflects the total amount of time
for the I/O to be serviced, including the amount of time the I/O was
queued up in the device driver, but if you're getting service times that
start to approach .5 to 1 second (500ms - 1000ms), you are heavily I/O
bound and will notice an improvement switching to a character device and
optimizing the storage end as well. Keep in mind that if the I/O sizes
are extremely large, as during large table scans, it will be normal to
have higher-than-average service times, but 500 - 1000 is still too much.
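
If you're not already doing so, the extended form of iostat is the one
that shows per-device service times. The figures below are invented,
but a device with asvc_t (active service time, in milliseconds)
climbing into the hundreds is the "tons of I/O" case described above:

  $ iostat -xn 5
                      extended device statistics
      r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
     18.0   14.0 1440.0 1120.0  0.0 19.8    0.0  620.3   0 100 c1t1d0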

We use Quick I/O on all of our large Oracle database systems. There have
been times when files were added and the DBA forgot to convert them to
Quick I/O. Even when only 9 files were not Quick I/O, we noticed a
difference after converting them.

========================================================================
===ORIGINAL POSTING==

Hi All,

I've done a bit of digging but cannot find anything other than "Sun
promo material", and ideally, I'm hoping to get some "real world
experience from the trenches" which is relevant.

We've got an Oracle8i database running on a "Solaris8 e450"
(4x400MHz / 2GB RAM / Solaris8 10/02 fully patched; "important data"
{WRT performance} is on a T3 and less critical data is on an A1000,
using vanilla logging UFS filesystems for the oracle data slices).

Reading the solaris9 release & feature notes, I am particularly tempted
by the "UFS Concurrent Direct I/O" features that claim to provide (in
certain circumstances) "87% performance increase over directIO alone"
[this stat quoted from the URL
http://wwws.sun.com/software/solaris/sparc/solaris9_features_scalability.html
  ]

However, being familiar with the reality that "certain circumstances"
often refers to .. conditions that will not ever be relevant to me ... I
thought I would try to get any feedback from real-world deployments
which have migrated Oracle8 DataBases from Solaris8 to Solaris9 while
staying on the same "not terribly current Sparc Hardware", and if there
were indeed any performance increases (decreases?) observed.

Ultimately, I realize the only way to be *certain* is to jump in and try
it myself -- but prior to this I thought it would be prudent to hear
(ideally) at least a couple of success stories to help sway the
reluctant, stodgy "things work just fine as they are right now, thank
you very much" side of myself.

Many thanks, and as always, a summary will follow.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


