From: Tobias Oetiker (oetiker@ee.ethz.ch)
Date: Tue Jan 13 2004 - 02:38:00 EST
Summary: How to figure what my Solaris Kernel does
Usual Suspects
--------------
* It is serving NFS ... this can use a lot of CPU. Make sure you
are running version 3.
* A fast (Gigabit) interface can almost fill a CPU if it is busy.
* It is swapping. If the kernel runs out of memory it will spend most of its
time moving pages back and forth between disk and RAM.
- Run "vmstat 5". The sr (scan rate) column should be very low (<100), which
means the system is not scanning for free memory pages.
- It may make sense to have a lot of swap space configured, as Solaris
does conservative memory allocation. When a process forks, the system
immediately reserves all the memory the child could need, even though
most of it is never touched. Solaris does "copy on write", so why not
let this reservation be backed by swap instead of real RAM,
assuming it is never going to be used anyway? (Correct me if I
am wrong here.)
* It is forking ... this does not have to be a real fork bomb; it can be just
some process quitting and being restarted immediately. Pidentd running
non-multi-threaded may be such a program. Some CGI process could
also be the cause. This is detectable by watching how fast the 'last process
id' grows in a tool like top.
* It is running Veritas Volume Manager and a disk has failed.
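The scan-rate check from the swapping item can be scripted. The following is only a sketch: the function names and the sample data are made up for illustration, and the 100 pages/s limit is just the rule of thumb quoted above. It finds the sr column by its header name, so it does not depend on how many disk columns vmstat prints on a given box. It needs an awk that understands -v (on Solaris, nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Flag vmstat samples whose page scan rate (sr) exceeds a limit.
# check_scan_rate and sample_vmstat are hypothetical names for this sketch.

check_scan_rate() {
    # $1: limit in pages/s (default 100, the rule of thumb from the list above)
    awk -v limit="${1:-100}" '
        # Header line ("r b w swap free ..."): remember which field is "sr".
        $1 == "r" && $2 == "b" {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "sr") col = i
            next
        }
        # Data line seen after a header: report it if sr is above the limit.
        col && $1 ~ /^[0-9]+$/ && $col + 0 > limit {
            printf "high scan rate: sr=%d (limit %d)\n", $col, limit
        }
    '
}

# Made-up sample in the Solaris vmstat layout, for illustration only.
sample_vmstat() {
    cat <<'EOF'
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 9812345 812345 12 40  3  1  1  0  0  0  0  0  0  410  900  350 48 50  2
 3 1 0 9712345  12345 90 400 80 120 300 0 850 4  0  0  0  980 2100  900 45 55  0
EOF
}

sample_vmstat | check_scan_rate 100
```

On a live system the input would come from "vmstat 5 | check_scan_rate 100" instead of the sample. Keep in mind that the first line vmstat prints is an average since boot, not a current sample.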
Useful Tools
------------
* lockstat
lockstat -gkIW sleep 60
gives a 60 second profile of the kernel
* iftop
http://www.ex-parrot.com/~pdw/iftop
will show which box is sending how much traffic through your interface
* se toolkit
www.setoolkit.com
Virtual Adrian may be able to give some hints as to where the performance
issues lie
* prstat
prstat -m
will show user vs system time for each process, so if it is a process
causing the problem it should show here
* truss
truss -c -p PID
can help to identify which system calls a problematic process is spending
its time on. A summary is printed on Ctrl-C
* iostat
iostat -xnP 30 30
shows where the system is writing and reading data and how much
* vmstat
vmstat 5
shows paging activity (check the sr column)
* kstat
Displays kernel statistics. I did not get any useful hints on what could be
discovered here ... but it sure gives a lot of numbers
* prex
prex -k
Part of the Solaris tracing architecture. Note that this will just open
a shell where you are expected to enter commands to activate the tracing. I got
the following example ... (reading the output is another issue)
# prex -k 1)
Type "help" for help ...
prex> buffer alloc 10m 2)
Buffer of size 10485760 bytes allocated
prex> enable $all 3)
prex> trace $all 4)
prex> ktrace on 5)
... wait a bit ...
prex> ktrace off
prex> untrace $all
prex> disable $all
prex> quit
# tnfxtract ./tnf.result 6)
# prex -k
Type "help" for help ...
prex> buffer dealloc 7)
prex> quit
# tnfdump ./tnf.result 8)
1) Run prex in kernel trace mode.
2) Allocate an in-core kernel buffer to hold the trace data.
3) Enable the trace set named $all. You can specify your own trace facility
(tnf_name) set (e.g. all I/O operations). Refer to the prex man page.
4) Trace the $all set.
5) Start the kernel trace. The kernel immediately starts to collect tnf_probe
events and store them in the in-core buffer.
6) Extract the contents of the kernel buffer to the file system.
7) Deallocate the in-core kernel buffer. Extract the contents of the buffer
before deallocating it, because they are erased immediately
when you issue "buffer dealloc".
8) Convert the raw TNF data to readable ASCII format.
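When the box is misbehaving it helps to grab the output of several of the tools above over the same time window and read it afterwards. The following is a rough sketch only: SNAPDIR, capture and snapshot are names invented here, the intervals are arbitrary, and the tool options are the ones listed above. It assumes a POSIX shell (on Solaris 8 that means /usr/xpg4/bin/sh or ksh, not /bin/sh), and lockstat needs root.

```shell
#!/bin/sh
# Collect output from several diagnostic tools into one directory.
# SNAPDIR, capture and snapshot are hypothetical names for this sketch.
SNAPDIR=${SNAPDIR:-/tmp/perf-snapshot.$$}

capture() {
    # $1: label for the output file; remaining arguments: the command to run
    label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "skipping $label: $1 not found" >&2
        return 1
    fi
    mkdir -p "$SNAPDIR" || return 2
    # Keep stdout and stderr together so error messages are not lost.
    "$@" >"$SNAPDIR/$label.out" 2>&1
}

snapshot() {
    # Start everything in the background so all tools see the same window,
    # then wait for the slowest one to finish (about 30 seconds here).
    capture vmstat   vmstat 5 6 &
    capture iostat   iostat -xnP 5 6 &
    capture prstat   prstat -m 5 6 &
    capture lockstat lockstat -gkIW sleep 30 &
    wait
    echo "results are in $SNAPDIR"
}
```

Calling "snapshot" as root leaves one .out file per tool in $SNAPDIR. Tools that are not installed are skipped with a message on stderr.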
Reading List
------------
Sun Performance and Tuning: Java and Internet, 2nd Edition (Adrian Cockcroft)
http://www.booksmatter.com/b0130952494.htm
Unlocking the kernel
http://www.sun.com/sun-on-net/itworld/UIR980801perf.html
Performance and Tuning on the Solaris 2.6, 7, and 8
http://developers.sun.com/solaris/articles/tuning_solaris.html
Contributors
------------
Markus Kluge, Ramiro Santos, Allen Wooden, przemol, Casper Dik, Jon Andrews,
Thomas 'Mike' Michlmayr, Amiel Lee Yee, William Hathaway, Jeff Vaneek, Frank Smith,
Darren Dunham, Luc I. Suryo, Joe Fletcher, Mark Pfeiffer,
Joohyun Cha, Karl Vogel, Todd M. Wilkinson.
Yesterday Tobias Oetiker wrote:
> Folks,
>
> We have this 4 Way Sun Enterprise 420R server. With 4GB Ram and
> about 10GB swap. It runs a ton of services (Apache, Postfix,
> Amavis, Spamassassin) and it also acts as a NFS server.
>
> Lately we are experiencing performance issues ... the box goes to
> load 17 and responds rather sluggishly.
> When looking at the load we often see the following picture:
>
> 50% User
> 50% Kernel
> 0% Idle
>
> The 50% User is easy to attribute by looking at the processes. But
> what is the system doing in the 50% kernel time?
>
> Is there something like kernel-top? I played around with lockstat
> a bit, but it did not really answer my questions ...
>
> We are running Solaris 8.
>
> cheers
> tobi
>
-- 
 ______    __   _
/_  __/_  / /  (_) Oetiker @ ISG.EE, ETZ J97, ETH, CH-8092 Zurich
 / // _ \/ _ \/ /  System Manager, Time Lord, Coder, Designer, Coach
/_/ \.__/_.__/_/   http://people.ee.ethz.ch/~oetiker +41(0)1-632-5286