SUMMARY : SMP scheduling issues

From: Zeev Fisher (zeevf@galileo.co.il)
Date: Wed Jul 03 2002 - 04:44:44 EDT


Special thanks to :

Darren Dunham
Michael Miller
Thomas Wardman
Dave Mitchel
Kevin Buterbaugn
Jay Lessert

For their quick response

Well, the consensus was that I should not worry and that everything is
running just as it is supposed to.
Below are several of the responses:

..
Pretty standard. Eventually the LWP uses up the time quantum and is
forced off the CPU. The processes/LWPs do jump around, but they'd have
to do it very quickly to make any difference.

If you have more LWPs/processes than CPUs, then it's going to happen
more often and faster.
..

That's up to the process and the thread model. If you're running
Solaris 8, try it under the new thread libraries. They'll be the
default in Solaris 9. If that doesn't change anything, then the process
itself is simply changing the number of executing threads.
...

Remember that your 'ps' task is itself running at the instant the process
table is captured... hence you will see a maximum of 3 threads (or maybe even
fewer at any given moment, depending on other processes running) of the other
app running on your 4-CPU box, as shown by ps. The other thread is in the
'run' state, meaning it is ready to run... but there is no free CPU at that
instant to run it.
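
The arithmetic behind this can be checked with a short sketch. It is only
an illustration: the state letters are copied by hand from the S column of
the ps -elcL listing in the original question further down, the names
lwp_states and NUM_CPUS are made up for this example, and it assumes the
usual Solaris ps meanings (O = on a processor, R = runnable, S = sleeping).

```python
from collections import Counter

# (LWP id, state) pairs copied by hand from the ps -elcL listing in the
# original question. Assumed meanings of the Solaris ps "S" column:
#   O = running on a processor, R = runnable (waiting in the run queue),
#   S = sleeping.
lwp_states = [
    (1, "R"), (2, "S"), (3, "S"), (4, "O"), (5, "S"), (6, "O"), (22, "S"),
    (12, "O"), (23, "S"), (24, "S"), (25, "S"), (26, "S"), (27, "S"),
]

NUM_CPUS = 4  # the V880 in the question has 4 processors

counts = Counter(state for _, state in lwp_states)
on_cpu = counts["O"]      # LWPs actually executing when ps took its snapshot
runnable = counts["R"]    # LWPs ready to run but without a CPU

# ps itself occupies one CPU while it reads the process table, so the
# application can be on at most NUM_CPUS - 1 processors in a ps snapshot.
cpus_left_for_app = NUM_CPUS - 1

print(f"on CPU: {on_cpu}, runnable: {runnable}, sleeping: {counts['S']}")
print(f"CPUs available to the app during the snapshot: {cpus_left_for_app}")
```

With the listing from the question, the three LWPs on a processor plus the
one runnable LWP account for the four busy threads, which is consistent
with ps holding the fourth CPU at that instant.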
..
As for your question about keeping a process on a CPU: since threads are
lightweight, the cost of switching between CPUs is far less than for a whole
process. Also, don't forget that the scheduler will aim to keep a process on
a given CPU from timeslice to timeslice... but over a 'long' time (i.e. longer
than a few timeslice intervals) processes will migrate around the
CPUs... I wouldn't worry too much about it...
..
This behaviour is perfectly normal for "Light Weight Processes". In
Solaris, even your most basic, non-threaded application, such as a shell
(sh), is an LWP. The whole "keep a process on a CPU" issue is really an x86
problem. UltraSPARC CPUs have what is called a "Snoop Cache". What
this means is that each CPU can see the others' currently executing
code, and see individual threads, i.e. LWPs. The benefit of this is
that the hardware can automatically move an LWP from CPU to CPU with very
little overhead. It also means the hardware can better keep itself
busy.

Your system probably has 4 CPUs. So, when you look at the running
processes, it's somewhat misleading. I would say the tools are fully
using each of the 4 CPUs, since your output shows that. The "run" state just
means the LWP is currently in the run queue. So, in your outputs, you'll see
/12 using CPU3 and CPU2.

There are some really good books that explain this. The Sun Performance
Tuning Guide is an excellent book, though a bit dated, that will explain
this. Also, if you get a chance, the course book for Sun's SA400 Performance
Tuning course goes into some detail as well.
..

My original question :

Hi,

I don't quite understand the order of execution on an SMP system and
some issues with LWPs.
I have a multithreaded application (3rd party) which I inspected, and I see
that at each monitoring interval it runs on a different CPU, although it's
known that it's best to keep a process running on the same processor (to
keep the cache "warm").
Another thing is that the number of LWPs for a given process changes
from time to time during the process's execution - ??

For example, on a V880 with 4 processors, I have the following case:

At one specific minute:

ps -elcL :

root@galileo219 > /bin/ps -elcL
...
...
...
...
 8 R 686 524 521  1 TS 50 ? 248522   pts/2 194:40 calibre6
 8 S 686 524 521  2 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 S 686 524 521  3 TS 58 ? 248522 ? pts/2  12:05 calibre6
 8 O 686 524 521  4 TS 50 ? 248522   pts/2 190:23 calibre6
 8 S 686 524 521  5 TS 58 ? 248522 ? pts/2  34:10 calibre6
 8 O 686 524 521  6 TS  0 ? 248522   pts/2 190:40 calibre6
 8 S 686 524 521 22 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 O 686 524 521 12 TS 50 ? 248522   pts/2 166:37 calibre6
 8 S 686 524 521 23 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 S 686 524 521 24 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 S 686 524 521 25 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 S 686 524 521 26 TS 58 ? 248522 ? pts/2   0:00 calibre6
 8 S 686 524 521 27 TS 58 ? 248522 ? pts/2   0:00 calibre6

13 LWPs of the same process ID (524).

prstat -L at one moment (paste of the relevant section):

   PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
   524 israelb 1994M 1976M run 10 0 2:48.54 22% calibre64/12
   524 israelb 1994M 1976M cpu2 10 0 3:16.56 22% calibre64/1
   524 israelb 1994M 1976M cpu0 10 0 3:12.38 22% calibre64/4
   524 israelb 1994M 1976M cpu1 20 0 3:12.57 22% calibre64/6
   524 israelb 1994M 1976M sleep 48 0 0:34.13 0.6% calibre64/5

After another moment :

   PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
   524 israelb 1994M 1976M cpu0 0 0 3:17.25 17% calibre64/1
   524 israelb 1994M 1976M cpu1 0 0 3:13.06 17% calibre64/4
   524 israelb 1994M 1976M run 0 0 2:49.22 17% calibre64/12
   524 israelb 1994M 1976M cpu3 0 0 3:13.25 17% calibre64/6
   524 israelb 1994M 1976M sleep 30 0 0:34.19 6.2% calibre64/5

We see that, for example, LWP 1 was running on cpu2 at one moment and
on cpu0 at the next.
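
The same comparison can be done mechanically for every LWP. The following
is a minimal sketch, not a general prstat parser: parse_prstat and
migrated_lwps are names invented for this example, and the parsing assumes
the exact column layout of the two prstat -L snapshots pasted above (STATE
as the fifth field, PROCESS/LWPID as the last).

```python
FIRST = """
   PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
   524 israelb 1994M 1976M run 10 0 2:48.54 22% calibre64/12
   524 israelb 1994M 1976M cpu2 10 0 3:16.56 22% calibre64/1
   524 israelb 1994M 1976M cpu0 10 0 3:12.38 22% calibre64/4
   524 israelb 1994M 1976M cpu1 20 0 3:12.57 22% calibre64/6
   524 israelb 1994M 1976M sleep 48 0 0:34.13 0.6% calibre64/5
"""

SECOND = """
   PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
   524 israelb 1994M 1976M cpu0 0 0 3:17.25 17% calibre64/1
   524 israelb 1994M 1976M cpu1 0 0 3:13.06 17% calibre64/4
   524 israelb 1994M 1976M run 0 0 2:49.22 17% calibre64/12
   524 israelb 1994M 1976M cpu3 0 0 3:13.25 17% calibre64/6
   524 israelb 1994M 1976M sleep 30 0 0:34.19 6.2% calibre64/5
"""


def parse_prstat(text):
    """Return {lwp_id: state} for each data row of a prstat -L paste."""
    states = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header row
        lwp_id = int(fields[-1].rsplit("/", 1)[1])  # "calibre64/12" -> 12
        states[lwp_id] = fields[4]                  # STATE column
    return states


def migrated_lwps(before, after):
    """LWPs that were on a CPU in both snapshots, but on different CPUs."""
    moved = {}
    for lwp_id, old in before.items():
        new = after.get(lwp_id)
        if new and old.startswith("cpu") and new.startswith("cpu") and old != new:
            moved[lwp_id] = (old, new)
    return moved


moves = migrated_lwps(parse_prstat(FIRST), parse_prstat(SECOND))
for lwp_id, (old, new) in sorted(moves.items()):
    print(f"LWP {lwp_id}: {old} -> {new}")
```

Only LWPs that are on a processor in both snapshots are compared; LWP 12
(in the run state both times) and LWP 5 (sleeping both times) are ignored.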

Another thing is that at a specific moment I see only 3 LWPs
running and not 4 (3 running and 1 in the run state, waiting to run) - ?

Thanks in advance for your answer.
I will summarize.

-- 
Zeev Fisher - Unix System Administrator
Galileo Technology Ltd - A Marvell Company
Moshav Manof, D.N. Misgav 20184, ISRAEL
Email    -  zeevf@galileo.co.il
Tel      -  + 972 4 8225046 ext. 1402
Cell     -  + 972 54 995402
Fax      -  + 972 4 8326420
WWW Page:     http://www.marvell.com
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

