perf probs (excessive context switching noted)

From: Mike Broderick (broderic@MIT.EDU)
Date: Mon Apr 07 2003 - 20:30:00 EDT


I'm trying to analyze/fix performance problems on a 5.1a+pk1 Oracle
8.1.7 data warehouse system (es40 2x633MHz +2GBmem), and what I see is
that it seems to be context switching excessively (with little/no
paging/swapping or disk I/O) even when fairly idle:

# vmstat -w 3
Virtual Memory Statistics: (pagesize = 8192)
  procs memory pages intr cpu
  r w u act free wire fault cow pin pout in sy cs us sy id iowait
  4 156 36 92K 146K 16K 466M 70M 83M 0 530 4K 4K 13 4 78 5
  4 157 36 92K 146K 16K 43 27 43 0 947 1K 8K 0 0 100 0
  5 156 36 92K 146K 16K 0 0 0 0 956 1K 8K 0 0 100 0
  5 156 36 92K 146K 16K 0 0 0 0 948 1K 8K 0 0 100 0
  4 157 36 92K 146K 16K 0 0 0 0 960 1K 9K 0 0 100 0
  5 156 36 92K 146K 16K 0 0 0 0 941 1K 8K 0 0 100 0
  4 157 36 92K 146K 16K 0 0 0 0 961 1K 9K 0 0 100 0
  4 160 36 92K 146K 16K 255 83 77 0 947 2K 8K 0 1 99 0
  5 159 36 92K 146K 16K 0 0 0 0 962 1K 9K 0 0 100 0

# iostat dsk7 dsk10 dsk11 dsk12 dsk13 dsk14 3
      tty       dsk7      dsk10      dsk11      dsk12      dsk13      dsk14           cpu
 tin tout   bps  tps   bps  tps   bps  tps   bps  tps   bps  tps   bps  tps  us ni sy  id
   0   16  3167  142   303    5   160    7   160    7   279    3  3129  124   5  9  4  83
   0   62     0    0     0    0     5    0     5    0   171    0     5    0   0  0  0 100
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  0 100
   0   62     0    0     0    0     5    0     5    0   171    0    11    1   0  0  0 100
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  0 100
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  0 100
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  0 100
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  1  99
   0   62     0    0     0    0     5    0     5    0     0    0     5    0   0  0  0 100

We have a smaller "sister" development data warehouse system (4100
2x533MHz +2GBmem) that carries a little less load but context switches
far less:

# vmstat -w 3
Virtual Memory Statistics: (pagesize = 8192)
  procs memory pages intr cpu
  r w u act free wire fault cow pin pout in sy cs us sy id iowait
  5 110 38 97K 138K 19K 0 0 0 0 3 87K 88 38 12 50 0
  5 110 38 97K 138K 19K 0 0 0 0 4 87K 64 38 12 50 0
  5 110 38 97K 138K 19K 0 0 0 0 3 87K 86 39 12 50 0
  5 110 38 97K 138K 19K 1 0 1 0 3 87K 88 38 13 50 0
  5 110 38 97K 138K 19K 0 0 0 0 3 88K 69 38 12 50 0
  5 110 38 97K 138K 19K 0 0 0 0 4 88K 89 38 12 50 0
  5 107 38 97K 138K 19K 3 0 0 0 2 88K 85 37 13 50 0
  5 107 38 97K 138K 19K 0 0 0 0 3 88K 82 38 12 50 0
  5 107 38 97K 138K 19K 0 0 0 0 1 87K 65 38 13 50 0
  5 107 38 97K 138K 19K 0 0 0 0 1 88K 88 38 12 50 0
  5 107 38 97K 138K 19K 0 0 0 0 8 88K 96 38 12 50 0

Does anyone have ideas why the former would be context switching so much?
Any tips on where else to look for clues?

Other observations:
    The former/slower system actually has the [faster] HSG80/SAN disks,
    and the latter the [slower] local/HSZ70 storage.
    There aren't many runnable procs, so it seems there are enough CPUs.
    The disks are mostly RAID-5 and could be optimized (0+1), but I don't
    see excessive I/O waits.
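
To put rough numbers on the comparison, here is a minimal Python sketch
that averages the "in" and "cs" columns from the two vmstat runs above.
It is only an illustration, not something run on either box: the helper
names (parse_count, mean_rates) are made up, the column positions are
taken from the 17-column header shown, K/M are assumed to mean 1e3/1e6,
and the first ES40 line is left out on the assumption that it reflects
since-boot totals.

```python
# Column order from the vmstat header:
# r w u act free wire fault cow pin pout in sy cs us sy id iowait
IN_COL, CS_COL = 10, 12


def parse_count(field):
    """Convert a vmstat field such as '8K' or '947' to an int.

    Assumption: the K and M suffixes mean 1e3 and 1e6.
    """
    scale = {"K": 1_000, "M": 1_000_000}
    if field[-1] in scale:
        return int(field[:-1]) * scale[field[-1]]
    return int(field)


def mean_rates(samples):
    """Return (mean interrupts, mean context switches) over sample lines."""
    rows = [line.split() for line in samples.strip().splitlines()]
    ins = [parse_count(r[IN_COL]) for r in rows]
    css = [parse_count(r[CS_COL]) for r in rows]
    return sum(ins) / len(ins), sum(css) / len(css)


# ES40 samples (first line of the run omitted, see assumption above)
es40 = """
4 157 36 92K 146K 16K 43 27 43 0 947 1K 8K 0 0 100 0
5 156 36 92K 146K 16K 0 0 0 0 956 1K 8K 0 0 100 0
5 156 36 92K 146K 16K 0 0 0 0 948 1K 8K 0 0 100 0
4 157 36 92K 146K 16K 0 0 0 0 960 1K 9K 0 0 100 0
5 156 36 92K 146K 16K 0 0 0 0 941 1K 8K 0 0 100 0
4 157 36 92K 146K 16K 0 0 0 0 961 1K 9K 0 0 100 0
4 160 36 92K 146K 16K 255 83 77 0 947 2K 8K 0 1 99 0
5 159 36 92K 146K 16K 0 0 0 0 962 1K 9K 0 0 100 0
"""

# 4100 samples (all lines as posted)
a4100 = """
5 110 38 97K 138K 19K 0 0 0 0 3 87K 88 38 12 50 0
5 110 38 97K 138K 19K 0 0 0 0 4 87K 64 38 12 50 0
5 110 38 97K 138K 19K 0 0 0 0 3 87K 86 39 12 50 0
5 110 38 97K 138K 19K 1 0 1 0 3 87K 88 38 13 50 0
5 110 38 97K 138K 19K 0 0 0 0 3 88K 69 38 12 50 0
5 110 38 97K 138K 19K 0 0 0 0 4 88K 89 38 12 50 0
5 107 38 97K 138K 19K 3 0 0 0 2 88K 85 37 13 50 0
5 107 38 97K 138K 19K 0 0 0 0 3 88K 82 38 12 50 0
5 107 38 97K 138K 19K 0 0 0 0 1 87K 65 38 13 50 0
5 107 38 97K 138K 19K 0 0 0 0 1 88K 88 38 12 50 0
5 107 38 97K 138K 19K 0 0 0 0 8 88K 96 38 12 50 0
"""

for name, samples in (("es40", es40), ("4100", a4100)):
    intr, cs = mean_rates(samples)
    print(f"{name}: mean in = {intr:.1f}, mean cs = {cs:.1f}")
```

Since vmstat rounds the cs column to whole K on the ES40, its average
is only accurate to within about a thousand. Note that the "in" column
differs between the two systems by roughly the same factor as "cs".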

                                                                 _Mike


