SF480R freezes when importing a large table into Oracle DB

From: sysop@cybergrants.com
Date: Wed Mar 17 2004 - 17:04:40 EST


Hello everyone!

We had a horrible thing happen, and as of now we are clueless about its
origin.

We have two Sun Fire V480R servers connected to two D2 arrays using Dual
Ultra3 SCSI LVD HBAs (X6758A). Each D2 is configured as a single bus and
is multi-initiated between the two V480Rs.

Each server has:
2x 900 MHz CPUs
4GB RAM
OBP 4.7.5
Solaris 8 2/02, Kernel 108528-27 + platform-specific patches
SDS 4.2.1 + SDS patch 108693-19

Oracle parameters for shared memory:
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmseg=10
set shmsys:shminfo_shmmni=100
set semsys:seminfo_semmns=1000
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=450
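
As a reading aid for the settings above, here is a minimal Python sketch that parses `set module:variable=value` lines and compares shmmax with the installed RAM. It is not part of the original setup: the function name `parse_etc_system` is hypothetical, and it assumes that "4GB RAM" means 4 GiB.

```python
import re

# Matches lines of the form "set module:variable=value".
SET_LINE = re.compile(r"^\s*set\s+(\w+):(\w+)\s*=\s*(\S+)\s*$")


def parse_etc_system(text):
    """Return {(module, variable): int_value} for every 'set' line in text."""
    settings = {}
    for line in text.splitlines():
        # Lines starting with '*' are treated as comments and skipped.
        if line.lstrip().startswith("*"):
            continue
        match = SET_LINE.match(line)
        if match:
            module, variable, value = match.groups()
            # Base 0 accepts plain decimal as well as 0x-prefixed hex.
            settings[(module, variable)] = int(value, 0)
    return settings


# The values exactly as posted above.
POSTED = """\
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmseg=10
set shmsys:shminfo_shmmni=100
set semsys:seminfo_semmns=1000
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=450
"""

settings = parse_etc_system(POSTED)
shmmax = settings[("shmsys", "shminfo_shmmax")]
ram_bytes = 4 * 1024**3  # assumption: "4GB RAM" means 4 GiB

print(shmmax == 2**32 - 1)   # True: shmmax is the largest 32-bit value
print(ram_bytes - shmmax)    # 1: the cap is one byte short of 4 GiB
```

In other words, the shmmax cap is effectively the size of physical memory. This only restates the numbers already posted and does not by itself point to a cause.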

Each server has a diskset allocated to it with a total of 12 disks per
diskset (6 disks in each D2), using RAID 0+1. File systems are UFS on
soft partitions.

The first server runs Oracle 9.0.1.4 in primary mode; the second runs a
standby database. The servers are connected by a point-to-point
1000 Mbps link on the ce1 port. This link is used for Oracle archiving.

The boxes went into production about a month ago and everything seemed
to be working flawlessly. The nightmare happened when we tried to import
a large table into Oracle (approx. 5.7 million rows). At some point
(about an hour or so) during the import the server froze and stopped
responding. When I connected via RSC, the only thing I was able to do
was to power it off and on again: it didn't respond to a console 'break'
or even XIR. Of course, a hard boot had its consequences: the Oracle
database got corrupted beyond recovery, and we had to restore from a
backup... Since then we have attempted the import two more times, with
different import parameters, but the same disaster happened each time.

There are no messages of any kind in the OS or Oracle logs, and no
failed components reported by prtdiag. We were closely monitoring the
system when we ran the import, using vmstat, prstat, mpstat, and iostat,
and saw absolutely no indication of memory shortage, CPU overload, or
I/O problems. The system runs flawlessly and at some point simply dies.
The dump file itself is good: we were able to import this table into
another 9.0.1.4 database on a different server (E220R).
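
Because nothing reaches the logs before the freeze, one option is to timestamp the output of the monitoring tools mentioned above and force it to disk line by line, so the last samples before the hang survive a hard reset. The following Python sketch illustrates the idea. The function name `log_with_timestamps` and the log path in the usage note are hypothetical, and it assumes a Python 3 interpreter is available on the host.

```python
import os
import subprocess
import time


def log_with_timestamps(command, log_path, max_lines=None):
    """Run command and append each output line, timestamped, to log_path.

    Returns the number of lines written. If max_lines is given, the
    command is terminated after that many lines.
    """
    count = 0
    with open(log_path, "a") as log:
        proc = subprocess.Popen(
            command, stdout=subprocess.PIPE, universal_newlines=True
        )
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {line.rstrip()}\n")
            # Flush Python's buffer and ask the OS to write the data out,
            # so the sample is on disk before the next one arrives.
            log.flush()
            os.fsync(log.fileno())
            count += 1
            if max_lines is not None and count >= max_lines:
                proc.terminate()
                break
        proc.stdout.close()
        proc.wait()
    return count
```

A call such as `log_with_timestamps(["vmstat", "5"], "/var/tmp/vmstat.log")` would run until interrupted. Writing the log to a disk outside the affected diskset, or to another host, is a design choice worth considering, since the shared storage may itself be involved in the hang.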

Oracle has provided no insights so far. What can go wrong? Can this be
the result of a CPU spin bug, or of resource consumption that simply
went unnoticed? I understand that Oracle is a multithreaded app and uses
shared memory, but can it crash the system in such an abrupt way?
If anyone can shed some light on this, that would be great... Will
summarize.

Thanks!
Max.


