Update to CPU Performance

From: Howard Arnold (arnoldh@celerent.com)
Date: Wed Jul 20 2005 - 22:44:47 EDT


I wanted to thank everyone for their quick responses. Most suggested tools
such as dcpi, kprofile, and uprofile for profiling the application.

I just wanted to follow up on a few things I did and the response I got.

This application was moved over from an 8400 with HSZ70-based storage, so it
had multiple spindles defined to help with I/O performance. It is now
running on a GS1280 connected to an XP128. We laid out the storage with
multiple disk LUNs, thinking that the workload was mostly I/O bound, but
since the storage is so much faster we found that it is now CPU bound,
because pieces of the application run on only a single CPU. We are
benchmarking only a single instance of the application, so as more instances
are brought online the other CPUs will become utilized.

One thing that I don't understand: if I run the application with multiple
mount points on a single disk with 5 partitions, I get a much faster
response time than if I run it using 5 separate disks with a single mount
point on each disk. I get almost a 25% better time using the single disk.

While the application is running I have been running collect -sc, and with
the single disk I get a shorter wait time than with the 5 disks. I have also
run monitor and found that with the 5 disks the Device Interrupts are almost
100 percent higher than with the single disk. The same goes for context
switches, which I am seeing as high as 10000/s with the 5 disks.
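
A minimal sketch of how the two configurations could be compared counter by
counter. The function name and every sample value below are hypothetical
placeholders, not measurements from these runs (only the 10000/s context
switch figure for the 5 disks appears above); substitute the real averages
from collect and monitor.

```python
def percent_change(baseline, observed):
    """Return how much higher (or lower) observed is than baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


# Placeholder per-second averages; replace with real numbers from each run.
single_disk = {"device_interrupts_per_s": 2000, "context_switches_per_s": 5000}
five_disks = {"device_interrupts_per_s": 3900, "context_switches_per_s": 10000}

# Relative change of each counter, using the single-disk run as the baseline.
changes = {
    name: percent_change(single_disk[name], five_disks[name])
    for name in single_disk
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% with 5 disks")
```

Putting both counters on the same relative scale makes it easier to see
whether interrupts and context switches rise together between the two
layouts.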

Does it make sense that, because the CPU is so close to 100% busy, it
doesn't have time to service the Device Interrupts from the multiple disks,
slowing the application down even more?

One of the sys_check warnings indicated that I should raise
round_robin_switch_rate from 0 to 40. Does anyone think that this will help,
and if so, do I need to reboot to set it?

I did find that advfsd was running, and shutting it down did give me better
performance on the single disk benchmark. It didn't have any effect on the
multiple disk benchmark.

This is a little confusing because I always thought that the more you spread
an application out over multiple disks, the better the performance.

Thanks,

Howard Arnold
Consultant Engineer
Email: arnoldh@celerent.com
Phone: (603)685-6060 ext. 206

Original Question:

I have an application that will run on only one CPU at a time, and when it
runs it uses 100 percent of that CPU. There is very little I/O going on.
What I would like to do is see if there is any way to improve the
performance of the CPU. I'm running on a GS1280 now so I can't get new
hardware, and I know it is a poorly written application; I just want to get
the most out of the CPU.

Are there any tools that would tell me what the application is spending most
of its time doing, so that I may be able to make some sysconfig changes to
get a little more performance? To test the performance I run a report, and
the first time I run it, it takes 20 minutes. The second and every time
after that it takes only 12 minutes to run. I assume this is because the
data is loaded into cache. If I umount the filesystem and remount it, I go
back to the initial 20 minute run. Would modifying any of the max user or
UBC buffer sizes yield any performance gain?
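
A rough sketch of the arithmetic behind those two timings. The function name
is hypothetical, and it assumes the entire difference between the first run
and later runs comes from data not yet being in cache, which is a
simplification.

```python
def cold_warm_breakdown(cold_minutes, warm_minutes):
    """Split a cold-cache run time into its warm-run part and the extra cold-run part."""
    extra = cold_minutes - warm_minutes
    return {
        "warm_minutes": warm_minutes,
        "extra_cold_minutes": extra,
        # Share of the cold run that disappears once the data is cached.
        "extra_fraction_of_cold": extra / cold_minutes,
    }


# Timings from the report test: 20 minutes on the first run, 12 minutes after.
result = cold_warm_breakdown(20, 12)
print(result)
```

Under that assumption, caching can only recover the cold-run surplus; the
12 minute warm run is the part that cache tuning alone would not shorten.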

I know this is not much to go on, but I was just wondering if anyone else
had a CPU bound application and found a way to get a little more performance
out of the CPU. I don't think the vendor is willing to spend the time making
the application able to run on multiple CPUs, so I'm stuck doing what I can
with what I have.


