Re: This is a long shot question: Causes of corrupted file systems and Oracle

From: Green, Simon (Simon.Green@EU.ALTRIA.COM)
Date: Wed Aug 06 2003 - 05:22:20 EDT


Well, you've got three wait processes and a vi, so there was nothing else
going on! I think this sounds like an ideal opportunity to blame a DBA for
all your problems. :-)

Seriously, it does seem quite likely. I would want to check exactly what
this person was doing and whether they'd done it before. If it was a
non-standard procedure, try to reproduce it on a test system. Of course, it
might have been a rare combination of the vi and some particular Oracle
activity.
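
One thing that is easy to check on a test box is the difference between
emptying a log in place and deleting it while another process still has it
open. Below is a rough Python sketch of that check, assuming a POSIX file
system. The file name and function names are made up for illustration, the
"writer" handle only stands in for a running instance, and none of this says
what actually happened on the crashed system.

```python
import os
import tempfile


def purge_in_place(path):
    """Empty the file but keep the same inode."""
    with open(path, "w"):
        pass  # opening with "w" truncates the file to zero length


def delete_and_recreate(path):
    """Remove the directory entry, then create a new, different file."""
    os.remove(path)
    with open(path, "w"):
        pass


def writer_still_reaches_path(action):
    """Return True if a writer that opened the log before `action` ran
    is still writing to the file that the path names afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "alert_test.log")  # hypothetical name
        writer = open(path, "a")  # stands in for the process holding the log
        writer.write("old entry\n")
        writer.flush()

        action(path)  # the "purge" under test

        writer.write("new entry\n")
        writer.flush()
        same_inode = os.fstat(writer.fileno()).st_ino == os.stat(path).st_ino
        with open(path) as f:
            visible = f.read()
        writer.close()
        return same_inode and "new entry" in visible


if __name__ == "__main__":
    print("truncate in place:", writer_still_reaches_path(purge_in_place))
    print("delete and recreate:", writer_still_reaches_path(delete_and_recreate))
```

With the in-place truncate, the writer's later entries show up under the
original path. After a delete, the writer keeps writing to the unlinked file
and nothing new appears under that path until the writer reopens it.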

Simon Green
Altria ITSC Europe Ltd

AIX-L Archive at http://marc.theaimsgroup.com/?l=aix-l&r=1&w=2
AIX FAQ at http://www.faqs.org/faqs/aix-faq/

N.B. Unsolicited email from vendors will not be appreciated.

> -----Original Message-----
> From: John F Riordan [mailto:jriorda2@CSC.COM]
> Sent: 05 August 2003 20:40
> To: aix-l@Princeton.EDU
> Subject: This is a long shot question: Causes of corrupted file systems and Oracle
>
>
> Hi all,
>
> I have a 7026-6H1 4 processors 6GB Ram
> Storage from EMC 36, 8GB luns.
> AIX 5.1 ML-01
> Oracle 8.1.7
> Sybase 12.5
>
> Today the machine locked up, for the first time in the three years we have
> had it. I booted the system, as I had a clean system dump. As the system
> came up, my Oracle file system did not mount because it was corrupted.
> I ran fsck and the file system was fine. Once Oracle and Sybase had started,
> I worked with IBM as they analyzed the dump file. I was told the system
> crashed due to a corrupted file system, and that file system was our Oracle
> application directory.
> I am trying to find what could have caused the corrupted file system.
>
> My errpt goes back to Sep 2002 and there are no errors for "Hardware",
> "Full file systems" or "jfs_log file size".
>
> I started looking at the Oracle instance logs ("adhoc", "bdump", etc.). The
> only thing I noticed was that the alert log for one instance was smaller
> than the others. When I looked at it, the log started when the system
> came back up; the only data in it was from today, after the crash. All
> the other alert logs go back to the beginning of July. I noticed that one
> of the Oracle DBAs was logged in when the system crashed. At the time of
> the crash he was purging the alert log file that is now smaller. I thought
> he might have deleted the file instead of purging it. The Oracle instance
> was still up and running while he was in this file.
> I noticed that the "kdb.out" file generated by IBM contains a list of each
> CPU with PSLOT, PID, etc. For one of the CPUs, the PROC_NAME is "vi":
>
> (1) > status
> CPU  TID    TSLOT  PID   PSLOT  PROC_NAME
>   0  205        2  204       2  wait
>   1  1ABB9    427  7044    112  vi
>   2  409        4  408       4  wait
>   3  50B        5  50A       5  wait
>
> Again, I know this is a long shot, but was wondering if anyone had a
> thought as to what might have happened.


