E6500 pauses on load

From: Rick von Richter (rickv@mwh.com)
Date: Wed Jun 16 2004 - 19:21:45 EDT


INFO:
E6500 - 28 CPUs - 28 GB
Solaris 2.6
Oracle 7.3.4.5

What we are seeing is an intermittent pause of all network traffic from
this box. The pause goes anywhere from 5-30 seconds. Customers are
connecting to it via an Oracle GUI client on Windows. What they see is
an ORA-3113 error which is basically "got disconnected from server".
This is a load-based problem which seems to manifest itself when the
network load gets to around 4 Mbit. The box was first connected through a
Gig-E card (ge), then we tried an HME card, and it happens on both. It is
connected to a Cisco 6509 switch. The load average on the box is around 17,
the kernel load is around 15, and iowait is around 5, so by those numbers
the box is apparently running fine. We have done multiple
snoops on this server and we noticed this "DEAD" gap frequently where no
packets are transmitted from this server. Our network guys have
confirmed this on their own packet traces. This is a very busy server
with around 4000 connections. It is running Oracle and does OLTP to the
tune of about 3500 TPM. We have Sun, Cisco, and Oracle looking at this
but no luck yet. One thing of note is that I am getting a high number
of nocanputs on the NIC (seen with 'netstat -k ge0' or 'netstat -k
hme0'). We have sq_max_size set in the /etc/system file to 100. I've
heard a few opinions on this, but I haven't gotten a solid answer on
what it should be for our situation.
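To see whether the pauses line up with jumps in that counter, something like the following sampling loop could be run alongside the snoops (a sketch only; the ge0 interface name and one-minute interval are assumptions, substitute hme0 or a tighter interval as appropriate):

```shell
# Sketch: sample the nocanput counter periodically so that sudden
# jumps can be correlated with the observed dead gaps in the snoop
# traces.  Assumes interface ge0; adjust to hme0 if needed.
for i in 1 2 3 4 5 6 7 8 9 10; do
    date
    netstat -k ge0 | grep -i nocanput
    sleep 60
done
```

A steadily climbing nocanput count during a pause would point at the streams queues backing up rather than at the switch.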
I don't know what data you gurus might need to help look at this. I am
running Sun's guds script every hour, so I have that data. I have
included the /etc/system file as a starting point to help you. If you
need any more info, please ask and I'll provide it.
This is our MAIN production box so it's imperative that we find a solution.

Thanks for your help gurus.

/etc/system
---------------
* increase number of pseudo ttys
set pt_cnt=512
* end ttys

** ----------------------------------
** Semaphores/Shared Memory for ORACLE
** ----------------------------------
** shmmax = max size per shared memory segment
*set shmsys:shminfo_shmmax = 4094967296
** shmmin = min size per shared memory segment
*set shmsys:shminfo_shmmin = 32
** shmmni = max num shared memory identifiers
** raised by JM for DBW upgrade (original value=40)
*set semsys:seminfo_semmni = 200
** shmseg = max num shared memory segments per process
*set shmsys:shminfo_shmseg = 128
** semmsl = max num semaphores per semaphore set (=semopm)
**set semsys:seminfo_semmsl = 2048
*set semsys:seminfo_semmsl = 8192
** semopm = max operations per semop call (=semmsl)
*set semsys:seminfo_semopm = 10
** semmns = max num semaphores system wide
** semmns = (sum of the PROCESSES parameter for each Oracle database,
** adding the largest one twice, and then adding an additional
** 10 for each database)
*set semsys:seminfo_semmns = 8192
** semmni = max num semaphore sets system wide (=semmnu)
**set shmsys:shminfo_shmmni = 100
*set shmsys:shminfo_shmmni = 1000
** semmnu = max num semaphore undo structures (=semmni)
**set semsys:seminfo_semmnu = 100
*set semsys:seminfo_semmnu = 30
** semmap = max entries per semaphore map (=semmni*semmsl)
**set semsys:seminfo_semmap=4000
*set semsys:seminfo_semmap=200
** semume = max undo entries per process
**set semsys:seminfo_semume = 100
*set semsys:seminfo_semume = 10
** semvmx = max value of a semaphore (cannot be > 32767)
*set semsys:seminfo_semvmx = 32767
** end of oracle stuff
*
* zeus oracle stuff
set semsys:seminfo_semmap = 200
set semsys:seminfo_semmni = 200
set semsys:seminfo_semmns = 8192
set semsys:seminfo_semmnu = 30
set semsys:seminfo_semmsl = 8192
* to match number of Oracle processes
set semsys:seminfo_semopm = 10
set semsys:seminfo_semume = 10
set semsys:seminfo_semusz = 96
set semsys:seminfo_semvmx = 32767
set semsys:seminfo_semaem = 16384
set shmsys:shminfo_shmmax = 4094967296
set shmsys:shminfo_shmmin = 32
set shmsys:shminfo_shmmni = 1000
set shmsys:shminfo_shmseg = 128
* end of oracle stuff
set semsys:seminfo_semusz = 96
set semsys:seminfo_semaem = 16384

* Force hme NIC cards to 100Mbit/full duplex
* RickV - 040615
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_100fdx_cap=1
*

* other stuff
*set maxpgio = 80

* file system stuff added 04/04 -BR
set priority_paging=1
set fastscan=131072
set maxpgio=65536
* end BR

set rlim_fd_max=2048
set rlim_fd_cur=2048

* Network stuff
*set sq_max_size=30
set sq_max_size=100
set nfs:nfs3_max_threads=24
*set nfs:nfs3_nra=10
set nfs:nfs3_nra=0
*set nfs:nfs_max_threads=24
*set nfs:nfs_nra=10
set nfs:nfs_nra=0
set maxusers=1024

* increase scan rate for e-cache parity on CPU's - 030523 MM
set ecache_scrub_enable=1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
* end e-cache
* Begin MDD root info (do not edit)
forceload: misc/md_trans
forceload: misc/md_raid
forceload: misc/md_hotspares
forceload: misc/md_stripe
forceload: misc/md_mirror
forceload: drv/pci
forceload: drv/isp
forceload: drv/sd
rootdev:/pseudo/md@0:0,0,blk
* End MDD root info (do not edit)
* Begin MDD database info (do not edit)
set md:mddb_bootlist1="sd:7:16 sd:7:1050 sd:127:16 sd:127:1050"
* End MDD database info (do not edit)
* veritas system parameters per SUN/VERITAS
set ufs:ufs_HW=1048576
set ufs:ufs_LW=917504
set maxphys=1048576
set bufhwm=8000
set cachefree=0x40000
* End Veritas/SUN recommended parameters
* vxvm_START (do not remove)
forceload: drv/vxdmp
forceload: drv/vxio
forceload: drv/vxspec
* vxvm_END (do not remove)
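As a runtime cross-check of the hme forcing near the end of the file, ndd can show what the driver actually came up with after boot (a sketch; the instance number 0 is an assumption, select yours first):

```shell
# Sketch: confirm at runtime that hme really is forced to
# 100 Mbit/full duplex as /etc/system intends.
# Assumes hme instance 0.
ndd -set /dev/hme instance 0
ndd -get /dev/hme link_speed       # 1 = 100 Mbit, 0 = 10 Mbit
ndd -get /dev/hme link_mode        # 1 = full duplex, 0 = half
ndd -get /dev/hme adv_autoneg_cap  # expect 0 with the forcing above
```

If the switch port is left to autonegotiate while the host is forced, the 6509 side can fall back to half duplex, which is worth ruling out given the symptoms.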

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:28:53 EDT