[HPADM] SUMMARY: Questions about a huge filesystem (one terabyte = 1000 GB)

From: JY.Torres (john_yves_torres@yahoo.com)
Date: Tue Aug 30 2005 - 05:45:45 EDT


Hi all, thank you for your answers.
__________________________________________________________________________
Subject: [HPADM] Questions about a huge filesystem (one terabyte = 1000 GB)

Hi admins!
I need your advice regarding a huge file system dedicated to all the Oracle files.
The history:
One year ago, an RP7410 was installed to host a tiny database (20 GB) located on an XP1024 array. Soon afterwards, HP increased the size of the database VG to 200 GB.
But a few days ago, an architect asked HP to apply a new sizing (1200 GB!).
As you know, the HP engineers were stuck with the limits set the first time, when they built the VG structures: just 16384 PEs for the VG. They tried the magical tool (vgmodify) to enlarge those limits, without result; the tool did not work.
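The post does not say which PE size the VG was built with, but the ceiling is plain arithmetic: VG capacity = max PEs x PE size. A minimal sketch with illustrative PE sizes (a 16 MB PE, which would have been enough for the existing 200 GB, caps the VG at 256 GB, well short of 1200 GB):

```shell
# VG ceiling implied by a 16384-PE limit, for a few assumed PE sizes.
MAX_PE=16384
for PE_SIZE_MB in 4 16 64; do
  echo "PE size ${PE_SIZE_MB} MB -> VG ceiling $(( MAX_PE * PE_SIZE_MB / 1024 )) GB"
done
```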
 
So, in one LV they simply created a big 1000 GB file system (VxFS) for all the .dbf files, redo logs and archive logs, with no possibility of extending it, and 200 GB in another LV. The total size of the data is expected to reach only 600 GB.
 
Questions :
============
 - Are there any known problems with using huge file systems?
 - Does the fsck command still work in case of inode failure? (I know there is an I/O cache somewhere, but if some inodes are corrupted, is it possible to recover the situation?)
 - Does the fsadm command still work?
 
 - What arguments could I use to ask them to rebuild the server from scratch, in a proper way?
___________________________________________________________________________
Answer from Dan Sucker:
Most databases can use smaller file systems: although they need all the space, it can usually be split up into chunks. The main reason to use small file systems is to allow
parallel file-system-based backups.
 
If you have only one large file system, you will probably have one large directory.
Take Omniback as an example: it uses a filesystem-based backup and runs as many
as 32 streams at one time to a backup medium (tape). The utility can be customized to use directory trees, but you will need to maintain the backup configuration to reflect new/temporary/removed trees.
 
I have always requested that the file systems be at least the "total expected size" divided into 5 to 10 file systems; in addition, all small files should be in separate file systems. Keeping Oracle_home, the archive directory, and the temporary files used by ../log and ../out (in Oracle Applications) in their own file systems means that the backup, and especially the restore, is localized to the data needed. If the archive area is in the same fs as other data, you will find the free space getting smaller and smaller on a daily basis.
 
In summary: one file system is probably possible, but look at the other requirements too; for some reason, no one thinks about backup/restore until the database has crashed.
___________________________________________________________________________
From Bill Hassel:
When massive increases in databases occur, you must plan for them.
I have never heard of the vgmodify tool working correctly and
I would certainly not use it on a production system.
To answer your questions:
> - Are there any known problems about using huge file systems ?
Not really. Of course, you now have to figure out how to back up
all that data.
> - Is the command fsck still working , in case of inode failure (I know there
is an I/O cache somewhere, but if some inodes are corrupted, is it possible to
recover the situation ??
Absolutely no difference between 100 megs and several terabytes.
> - Is the command fsadm still working ?
Yes.
>- What are the arguments I could use if I would ask them, to rebuild from zero
the server in a proper way ?
There's no need to rebuild the server: just wipe out all of the Oracle
data areas, remove the volume group(s) and then rebuild the VG with the
new disk LUNs. NOTE: If you specify all the LUNs at the time you create
the VG, the extent size will be chosen correctly.

Now I am assuming that there won't be millions of files, just a few
very large files. Of course, you start with a fully patched system
(June 2005 patch set).
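A hypothetical sketch of that rebuild (all device paths, the VG name, and the sizes are invented for illustration; on HP-UX, vgcreate -s sets the PE size in MB, -e the max PEs per PV, and -p the max PVs):

```shell
# Sketch only -- placeholder LUN paths; adapt to the real XP1024 devices.
pvcreate /dev/rdsk/c10t0d0
pvcreate /dev/rdsk/c10t0d1

# LVM needs the group device node before vgcreate:
mkdir /dev/vgora
mknod /dev/vgora/group c 64 0x010000

# 32 MB extents, up to 65535 PEs per PV, up to 16 PVs:
# the VG can then grow to 32 MB x 65535 x 16, far beyond 1200 GB.
vgcreate -s 32 -e 65535 -p 16 /dev/vgora /dev/dsk/c10t0d0 /dev/dsk/c10t0d1

# One LV per Oracle area rather than a single monolith:
lvcreate -L 614400 -n lvdata /dev/vgora      # 600 GB for datafiles
newfs -F vxfs -o largefiles /dev/vgora/rlvdata
```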
______________________________________________________________________
From Jeff Cleverley <jeff_cleverley@agilent.com>:
If you use 11.11, you can exceed the 1 TB limit of 11.00. I think they
raised it to something ridiculous like 2 petabytes. I've kept ours to 1 TB,
mostly for backup purposes. We don't have file systems that large for
Oracle, just regular NFS servers, but I have ~15 file systems of 1 TB
and have worked with them for several years with no problems.
> - Is the command fsck still working , in case of inode failure (I
> know there is an I/O cache somewhere, but if some inodes are
> corrupted, is it possible to recover the situation ??
fsck still works and you may be able to recover. As with any type of
crash or corruption, your mileage may vary.
> - Is the command fsadm still working ?
If you have OnlineJFS, fsadm will allow expansion/reduction of file
systems and defragmentation, if those are the parts you're asking about.
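On HP-UX with OnlineJFS the online grow is a two-step operation; a hypothetical example (the LV name, mount point, and sizes are invented):

```shell
# Grow the LV first (lvextend -L takes the new size in MB) ...
lvextend -L 204800 /dev/vgora/lvdata          # LV now 200 GB
# ... then grow the mounted VxFS; fsadm -b takes the new size in 1 KB blocks:
fsadm -F vxfs -b 209715200 /oracle/data
# Defragmentation, the other fsadm job mentioned above (-e extents, -d directories):
fsadm -F vxfs -e -d /oracle/data
```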
> - What are the arguments I could use if I would ask them, to rebuild
> from zero the server in a proper way ?
The group that does the databases splits the file systems so that the
data is in separate file systems from the logs, to try to spread the
load. Part of this depends on the performance of your disks.

Jeff
_______________________________________________________________

From James J. Perry <jjperry@water.com>:

Actually, large file systems are not a problem if implemented properly. I have 2 TB file systems (the max size of a VxFS filesystem). Of course, I am using Veritas Volume Manager rather than LVM. It allows on-the-fly resizing, disk adds, etc.

 The main problem I see here is that you have all of the data, indexes, and logs in the same filesystem. It will kill performance. The best bet would be to rebuild to ensure separation of data, logs, and indexes per Oracle's best practices. This way each can be on a specific device.

 For your second question… While Veritas VxFS will check for filesystem corruption after a crash, if you have a failed disk, all of the datafiles on that volume will most likely be corrupt. When you run fsck, Veritas will mark the inode bad and then relocate the file to the lost+found directory. The problem is that the file does not keep the same name. There is some magic that can be performed, but in my experience it just does not work. This is not so much a problem with large filesystems as with the way filesystems handle problems.

Yes, the fsadm command still works.

 Again, all of the problems I think you will encounter are problems with the VG itself. LVM is not as robust as VxVM, and that is why VxVM is now being bundled in some capacity with HP-UX 11i and above. What I have experienced is that the size grows exponentially, and you hit not only the 16384 max PE issue but also the 255 max LV issue. What I have had to do in the past was to set a high max PE per PV with a PE size of 4, 8, or 16 MB. I will note that on HP-UX 10.20 the max PE per PV can be as high as 65536, the max PV count is 255, and the max PE size is 256 MB. What I suspect is that your VG is not set up properly. Do a vgdisplay and check.

Here is an example of a VG with the maximum values for max PE and max PV:

--- Volume groups ---
VG Name                     /dev/vgdb
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      255
Open LV                     255
Max PV                      256
Cur PV                      44
Act PV                      44
Max PE per PV               65536
VGDA                        88
PE Size (Mbytes)            16
Total PE                    94864
Alloc PE                    94444
Free PE                     420
Total PVG                   0
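As a sanity check, the ceilings implied by that output follow from the PE figures (all numbers taken from the vgdisplay above):

```shell
# Capacity arithmetic for the vgdisplay values shown above.
PE_SIZE_MB=16
MAX_PE_PER_PV=65536
TOTAL_PE=94864
echo "Per-PV ceiling: $(( MAX_PE_PER_PV * PE_SIZE_MB / 1024 )) GB"   # 65536 x 16 MB = 1 TB per PV
echo "Current VG size: $(( TOTAL_PE * PE_SIZE_MB / 1024 )) GB"       # 94864 x 16 MB, about 1.45 TB
```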

 I think it is a setup issue.

As for the breakdown of the filesystems, it does not matter. I have seen most people fill to 1 TB and then just create another.

______________________________________________________________________

From Michael VanDorick - PA <Michael_VanDorick@GMACM.COM>:
I would hope that you are not using one big file system for any Oracle DB. The best practice is a file system structure that separates the DB into different areas. Expansion is done by adding more file systems, not by extending a single file system. A single file system is also a backup nightmare.
Our standard is as follows:

/data/oracle/SID/
    admin
    archive (if archive logging is in use)
    data1, 2, 3, … (usually the size of a whole LUN in large configurations)
    index1, 2, 3, … (usually the size of a whole LUN in large configurations)
    redo1 (redo logs, best w/ FS options delaylog,nodatainlog,mincache=direct,convosync=direct)
    redo2
    systems
    tools
    users
    temp1
    rlbk1 (different LUN than redo1)
    rlbk2 (different LUN than redo2)
    export
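Wired into /etc/fstab, the redo-log mount options above would look something like this (the device and LV names are invented for illustration; SID is the placeholder from the layout above):

```shell
# Hypothetical /etc/fstab entries; mincache=direct and convosync=direct
# bypass the buffer cache, which suits Oracle's own redo buffering.
/dev/vgora/lvredo1  /data/oracle/SID/redo1  vxfs  delaylog,nodatainlog,mincache=direct,convosync=direct  0 2
/dev/vgora/lvredo2  /data/oracle/SID/redo2  vxfs  delaylog,nodatainlog,mincache=direct,convosync=direct  0 2
```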

______________________________________________________________________________

May the force be w/ you !

Cordialement, Kind regards, Yours sincerely
+33[0]-613-477-747 Private Fax : 1-425-740-1864
JY.Torres -- Systems Consultant in Unix production environments --
____________________________________________________________________
                

--
             ---> Please post QUESTIONS and SUMMARIES only!! <---
        To subscribe/unsubscribe to this list, contact majordomo@dutchworks.nl
       Name: hpux-admin@dutchworks.nl     Owner: owner-hpux-admin@dutchworks.nl
 
 Archives:  ftp.dutchworks.nl:/pub/digests/hpux-admin       (FTP, browse only)
            http://www.dutchworks.nl/htbin/hpsysadmin   (Web, browse & search)


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 11:02:49 EDT