[ SUMMARY ] Oracle DBF file corruption / ADVFS

From: David.Knight@clubcorp.com
Date: Fri Oct 24 2003 - 15:56:49 EDT


Thanks to Henk Kalle, Dr. Alan Rollow, Martin Vasas and Bryan Williams
for your help and suggestions. Below is a list of responses/suggestions
in no particular order...

_________________________________________________________________

If there was a hardware problem, I would expect to see disk errors in
the binary error log or domain panics in the messages file. There are
several Oracle patches for data file corruption. Check with Oracle for
your version.
_________________________________________________________________

There is an Oracle Unix utility called dbverify which can be used to
check the datafiles.

__________________________________________________________________
There is an init<sid>.ora parameter to fix this problem.
 
Edit the init<sid>.ora file: _tru64_directio_disabled=true
Restart the database.
Recover database.
 

__________________________________________________________________

Oversimplifying a bit (or a lot), the two general causes of
                 this type of data corruption are:

                 o Software bug, usually dealing with process interlocks
                    that should prevent two processes from trying to write
                    the same data at the same time.

                 o Undetected hardware errors.

                 About all you can do on the software is ensure that all
                 the most recent patches are installed, in the hope that
                 the relevant vendors have found the software problem and fixed
                 it. Some software systems, such as a database, may have
                 features that can be enabled to double check their own
                 writes and data to help detect and correct errors. I
                 don't have a clue if Oracle has such a feature.
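
As a generic illustration of software double checking its own writes,
here is a minimal sketch that appends a CRC-32 to each block and checks
it on read. This is not Oracle's mechanism; the function names seal()
and verify() and the 4-byte trailer layout are assumptions made up for
the example.

```python
import zlib

CRC_LEN = 4  # assumed trailer size: one 32-bit CRC per block


def seal(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 appended (hypothetical block layout)."""
    return payload + zlib.crc32(payload).to_bytes(CRC_LEN, "big")


def verify(block: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the stored one."""
    payload, stored = block[:-CRC_LEN], block[-CRC_LEN:]
    return zlib.crc32(payload).to_bytes(CRC_LEN, "big") == stored


block = seal(b"row data")
# Simulate corruption: flip the lowest bit of the first byte.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]

print("clean block ok:    ", verify(block))
print("corrupted block ok:", verify(corrupted))
```

A check like this only detects the damage at read time; correcting it
still needs a good copy of the data from somewhere else.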

                 The hardware side is even harder, since the cause of the
                 problem was an "undetected" error. Each point in the
                 typical data path has checks that ensure the data going
                 across it is the expected data. Some data paths are
                 parity protected, because they're sufficiently reliable
                 that a double bit error (undetectable) is too rare to
                 worry about. Others are ECC protected, so they can at
                 least detect multi-bit errors and correct single bit
                 ones. Some subsystems support extra levels of checking
                 such as reading the data after being written and then
                 checking that against the original.
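
To make the parity point concrete, the sketch below computes a single
even-parity bit over a buffer and then flips one bit and two bits. It is
a toy model of a parity-protected path, not any real bus or controller;
the helper names and the sample buffer are assumptions for illustration.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the number of set bits in data is odd, else 0."""
    return sum(bin(b).count("1") for b in data) % 2


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


original = b"Oracle block"          # sample data, 96 bits long
stored_parity = parity_bit(original)

single = flip_bit(original, 3)      # one bit wrong
double = flip_bit(single, 42)       # a second, different bit wrong

# Parity changes when an odd number of bits flip, so one error is caught.
single_detected = parity_bit(single) != stored_parity
# Two flips cancel out in the parity bit, so the error goes unnoticed.
double_detected = parity_bit(double) != stored_parity
# Reading the data back and comparing with the original catches both.
read_back_detected = double != original

print("single-bit error detected by parity:", single_detected)
print("double-bit error detected by parity:", double_detected)
print("double-bit error detected by read-back compare:", read_back_detected)
```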

                 The rate of undetected data errors is designed to be
                 very low for each part of the system. But, it is a
                 game of chance; even with a low probability, one of
                 them is going to happen somewhere. Move enough data and
                 you're bound to see one eventually. In the total
                 universe of data being moved, your problem may have
                 been undetected data corruption that got noticed by
                 the part of Oracle that checks the format and content
                 of this file.
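
The "game of chance" can be put into numbers. If each transfer has an
independent probability p of an undetected error, the chance of at least
one such error in n transfers is 1 - (1 - p)^n. The per-transfer rate
used below is an invented figure chosen only to show the shape of the
curve, not a measured value for any real hardware.

```python
import math


def prob_at_least_one(p_per_transfer: float, transfers: int) -> float:
    """P(at least one undetected error) = 1 - (1 - p)^n, computed stably."""
    # log1p/expm1 avoid losing precision when p is tiny.
    return -math.expm1(transfers * math.log1p(-p_per_transfer))


P_UNDETECTED = 1e-12  # hypothetical undetected-error rate per transfer

for transfers in (10**6, 10**9, 10**12, 10**13):
    chance = prob_at_least_one(P_UNDETECTED, transfers)
    print(f"{transfers:>16,d} transfers: {chance:.6f}")
```

With these assumed numbers the chance is negligible for a million
transfers and close to certain once the transfer count is well past 1/p.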

                 If you have appropriate support services, you probably
                 want to bring the data corruption to the attention of
                 all the vendors involved. How the data was wrong can
                 offer a clue as to what sort of corruption it was. Data that
                 looks appropriate for another part of the same file, or
                 part of a different file, is quite different from a
                 couple of bits being swapped in a single byte of data
                 somewhere.
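
If a known-good copy of the file is available (from a backup, say), the
distinction above can be checked mechanically. The following is a rough
sketch, assuming the file has already been split into equal-sized
blocks; classify() and its threshold of 4 bits are arbitrary choices
for illustration, not a diagnostic anyone has validated.

```python
from typing import List

BIT_ERROR_LIMIT = 4  # assumed cutoff for calling something a "bit-level" error


def differing_bits(good: bytes, bad: bytes) -> int:
    """Count the bit positions at which two equal-length blocks differ."""
    if len(good) != len(bad):
        raise ValueError("blocks must be the same length")
    return sum(bin(g ^ b).count("1") for g, b in zip(good, bad))


def classify(good_blocks: List[bytes], index: int, bad: bytes) -> str:
    """Guess what kind of corruption turned good_blocks[index] into bad."""
    if bad == good_blocks[index]:
        return "identical"
    # Does the bad block look like data that belongs elsewhere in the file?
    for other, block in enumerate(good_blocks):
        if other != index and block == bad:
            return f"misplaced: matches block {other}"
    flipped = differing_bits(good_blocks[index], bad)
    if flipped <= BIT_ERROR_LIMIT:
        return f"bit-level: {flipped} bit(s) differ"
    return f"unknown: {flipped} bits differ"


blocks = [b"AAAA", b"BBBB", b"CCCC"]  # toy stand-ins for real data blocks
print(classify(blocks, 0, b"BBBB"))   # content of another block written here
print(classify(blocks, 0, b"AAAC"))   # a single flipped bit in the last byte
```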

__________________________________________________________________

----- Forwarded by David Knight/CLUBCORP/US on 10/24/2003 02:51 PM -----

David Knight
10/24/2003 11:04 AM

 
        To: tru64-unix-managers@ornl.gov
        cc:
        Subject: Oracle DBF file corruption / ADVFS

Hello Managers,
        We recently experienced corruption in an Oracle dbf file and I am
trying to ensure that it is not due to any Unix/hardware issue. I have no
errors in my messages/sys.log files. O/S Version: Tru64 5.1 / TruCluster
5 (ADVFS). Any recommendations for ensuring that this is not a HW/OS issue,
or ways to check the file at the O/S level, etc. would be much appreciated.

Thanks,
David


