
SUMMARY: Disk array configuration



First, apologies for not summarizing sooner.  I was out of town for
most of this week.

Thanks to the following people who responded:

mshon@sunrock.East.Sun.COM (Michael J. Shon {*Prof Services} Sun Rochester)
springer@aitsun500.tcinc.com (Jerry Springer)
seanm@sybase.com (Sean McInerney)

My original posting was:


|
|I am getting ready to install a SPARC 20 which will be running a
|large Oracle database, stored on a Sparc Storage Array with a total
|of 27 2.1 GB disk drives.  My primary concern is to get the best
|performance out of the system.  Can anyone give me some suggestions
|as to how I should configure the array?
|
|Are there any good
|references on the subject?


mshon@sunrock.East.Sun.COM (Michael J. Shon {*Prof Services} Sun Rochester)
wrote:

----  begin included message  ----------------

Raw devices *are* faster, but not by much, and are harder to manage,
and harder to fill efficiently.
I will assume that most or all of your database is on UFS filesystems.

Warning: assume that between filesystem overhead and database metadata
like logs and such, you need to configure at least twice as much disk as data,
i.e. for a "10Gb database" you should have at least 20Gb of disk.
Some of my other suggestions which come later will increase this 
even further. If you find that you do not have enough disk space to
implement these suggestions, then do not be surprised if you do not achieve
the performance that you are hoping for. More disks will help.

Also, small disks will help - use the smallest drives that you 
reasonably can, and get more of them.
60 1Gb drives will give you twice as much performance potential as
30 2Gb drives. Don't even think about those 9Gb drives unless you are
just setting up a big archive that no one will ever search.

Within reason, you want as many disk arms working for you as possible.
Because of this, you want to use striping and/or mirroring.
Do NOT use RAID 5 unless the data is very much read-mostly; it will 
kill you on writes.
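
[My note: the write penalty comes from standard RAID arithmetic, not anything
SSA-specific:]

    RAID 5 small write:  read old data + read old parity +
                         write new data + write new parity  = 4 I/Os
    Mirrored write:      one write to each submirror        = 2 I/Os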

Use Solstice DiskSuite (or the Veritas Volume Manager) to set up your
mirroring and striping; for performance, do not use the disks individually.
[Unless you want to use Oracle's striping capabilities. Personally, I think
the database is complicated enough, and it's nice to separate the
disk volume management and tuning from the data management and table tuning.]
Set up stripes and mirrors so that things are in multiples of your
database block size (default 2K), the filesystem block size (default 8K),
and the filesystem cluster size (56K).
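
[My note: a hypothetical Solstice DiskSuite sketch of such a layout; the
metadevice and slice names are placeholders, and 56K is just one interlace
that is a multiple of the 2K and 8K block sizes and matches the default
cluster size.]

    # Six-way stripe, 56K interlace, spread across controllers:
    metainit d10 1 6 c1t0d0s0 c1t1d0s0 c2t0d0s0 c2t1d0s0 c3t0d0s0 c3t1d0s0 -i 56k
    # A second six-way stripe on different disks, then mirror the two:
    metainit d11 1 6 c1t2d0s0 c1t3d0s0 c2t2d0s0 c2t3d0s0 c3t2d0s0 c3t3d0s0 -i 56k
    metainit d20 -m d10
    metattach d20 d11
    # Build the UFS filesystem on the mirror:
    newfs /dev/md/rdsk/d20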


Put your data on the first part of the disk if you can afford to waste space.
It is noticeably faster than the second part. [It also shortens the average seek.]
Use the second part for backups or archives.

Mirroring not only can keep your data safe and your database up,
it can improve read performance because you have two (or three)
sources for the data. 
You can set the policy for which one(s) to read and when.
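
[My note: with DiskSuite the mirror read policy is set with metaparam; d20 is
the hypothetical mirror from the sketch above.]

    # roundrobin (the default) alternates reads between submirrors;
    # geometric divides reads by disk region to shorten seeks:
    metaparam -r geometric d20
    metastat d20    # shows the current read/write options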



Keep in mind that (unless your accesses are highly sequential)
most of the time is spent seeking the heads, not transferring the data,
so you should probably work harder at reducing seeks than at
increasing the transfer rate [especially for small blocksizes].
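
[My note: rough numbers, purely for illustration and not measured on this
hardware:]

    average seek + rotational delay      ~ 12 ms
    transfer of one 2K block at ~5 MB/s  ~ 0.4 ms
    => well over 90% of a small random I/O is head positioning, not transfer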
 

For fast sequential data transfer, use a stripe interlace smaller 
than the blocksize used to access the data. 
Oracle likes to use 2K blocks, but you do not get much data for all the
time you waste if you have to seek the heads to get it.
If you can afford to waste some space in the tables, use 4K, or better yet 8K,
which nicely matches the default block size of the filesystem.
If you put your tables in UFS filesystems,
then UFS tries to do sequential reads and writes in 56Kb chunks (you can
change this number by tuning maxcontig in the filesystem), and in any case
will access at least 8K at a time (you can tune this too).
Using a 4K interlace will always result in at least two drives
transferring in parallel.
This is a Good Thing for the filesystems which hold things that are accessed
sequentially, like the logs and rollback files. Your throughput is often
constrained by how fast you can post each transaction.
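
[My note: a sketch of what that might look like for a log filesystem; the
devices, the 8K interlace, and the maxcontig value are examples, not
recommendations from Michael's message.]

    # Small interlace so each 56K sequential cluster spans several drives:
    metainit d30 1 4 c4t0d0s0 c4t1d0s0 c5t0d0s0 c5t1d0s0 -i 8k
    newfs /dev/md/rdsk/d30
    # maxcontig is counted in 8K filesystem blocks; the default of 7 gives
    # 56K clusters.  Raise it (e.g. to 16, i.e. 128K) for larger chunks:
    tunefs -a 16 /dev/md/rdsk/d30
    # On the Oracle side, an 8K block in init.ora matches the UFS block size:
    #   db_block_size = 8192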

However, for random access patterns (which databases tend to have), use
a large stripe width (like the default of one cylinder) so that different
disks are used to get to different parts of the data.
If the requests are random, the disks necessary to satisfy them will
be random, and the overall load will balance reasonably well. 
Your overall throughput is generally quite good this way, 
although any single transfer will only be done by one disk, 
which limits data transfer speed.
Unless you have more CPU power available than you will ever need, you 
probably don't want more than about 6 disks in any one stripe; after that
it starts to cost too much CPU to manage the stripes.
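
[My note: for the randomly accessed table data, the same metainit form with a
much larger interlace; 512K here is just an arbitrary "large" value for
illustration, not the one-cylinder default Michael refers to.]

    # Wide interlace: each small random read is served by a single drive,
    # but different requests land on different drives and balance the load.
    metainit d40 1 6 c1t4d0s0 c1t5d0s0 c2t4d0s0 c2t5d0s0 c3t4d0s0 c3t5d0s0 -i 512k
    newfs /dev/md/rdsk/d40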

I have heard of one person who set up a single stripe of 30 disks
(an entire SSA) for a single filesystem and put the whole database into it.
He did get extremely good disk performance, with very good load
balancing across all of the disks. However, there are sticky problems with
backup and recovery of a monster like that, and it did burn at least 
one whole cpu just to run the disks.



|Are there any good 
|references on the subject?

It is not specifically about the SSA, but there is a lot of good information
in Adrian Cockcroft's Sun Performance Tuning book about disk systems in general.

	Sun Performance and Tuning: SPARC and Solaris  by Adrian Cockcroft, 
	published by SunSoft Press/PTR Prentice Hall, ISBN 0-13-149642-3.

----- end included message  ---

My note: 

I looked at Adrian Cockcroft's book at Borders.  It appears to be a good
general reference, but contains only one or two paragraphs on the subject
of disk arrays.

------ begin included message ----

From: springer@aitsun500.tcinc.com (Jerry Springer)


Anyway, the best thing to do is spread the data across as many drives as
possible. If too much data is on one drive, you lose performance waiting for
the heads to get around the platter. Create your volumes (assuming you are
using Veritas) by spreading them across at least 3 drives. Don't make them too
small, or the slices created on each drive will be too small and will actually
make things slower.
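
[My note: a minimal vxassist sketch of Jerry's advice, assuming VxVM with the
default rootdg disk group; the volume name, size, and device path are
placeholders.]

    # Three-column stripe so the volume is spread across three drives:
    vxassist -g rootdg make oradata01 4g layout=stripe ncol=3
    # Build the filesystem on the volume (path may differ on older VxVM):
    newfs /dev/vx/rdsk/rootdg/oradata01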



From: seanm@sybase.com (Sean McInerney)

 By all means use vxva, but do not use the basic-ops to create the
 stripe/raid.  Use advanced-ops.  The basic-ops path lays out the
 disks in a completely stupid manner.  Partition them with one
 partition containing the remaining blocks.  Set the interleave/stripe unit
 size to the track size of the disk.  You can get this from the
 prtvtoc command.  Or, if you are very sure of your average read/write
 size, set it to that, but usually the track size is the most efficient.
 Then create a plex containing all the disks.
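
[My note: the sectors/track figure Sean mentions is printed in the prtvtoc
header; the numbers below are purely illustrative, not from his array.]

    prtvtoc /dev/rdsk/c1t0d0s2
    # * Dimensions:
    # *     512 bytes/sector
    # *      80 sectors/track      <- hypothetical figure
    #
    # 80 sectors x 512 bytes = a 40K stripe unit in this example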

 But here is the key.  When you begin to select the
 disks with the middle mouse button, choose the first disk on each
 controller (there are 6 controllers in the box, 2 in each tray),
 then choose (highlight) the second disk on each controller and
 continue this pattern until all the disks are selected.  Then
 go to advanced-ops > plex > create, then advanced-ops > volume > create,
 then basic-ops > file system > create > striped.  This is for a RAID 0 stripe.
 You must leave one disk available for logging if you use
 RAID 5.

 I used basic-ops > create > raid5 and then did a mkfile 100m on the volume
 (18 1GB disks in this particular unit; we have three totaling 80Gb)
 and it took around 35-40 seconds.  Then I reconfigured the volume using
 the above procedure, and the same lame mkfile test took 15 seconds.
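
[My note: if you want to repeat Sean's quick-and-dirty test, something like
the following works; the mount point is a placeholder.]

    timex mkfile 100m /oradata/testfile
    rm /oradata/testfile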

 Bottom line is that you can get lost and confused (and possibly
 make it worse) using advanced-ops, but the pain is worth it.

 ....Sean
 
----- end of included messages  ------

Peter Schauss
ps4330@okc01.rb.jccbi.gov
Gull Electronic Systems Division
Parker Hannifin Corporation
Smithtown, NY