Monday, February 18, 2008

DTraceToolkit in MacOS X

Apple included DTrace in MacOS X 10.5 (Leopard), released in October 2007. It's great to have DTrace available in MacOS X for its powerful application and kernel performance analysis. To think that there is now another kernel we can examine using DTrace is exciting - it's like discovering a new planet in the solar system.

Apart from kernel analysis, DTrace also improves general usability by answering every day questions like: why are my disks rattling? or why does my browser keep hanging? Although, your average user may not write DTrace scripts to answer these questions themselves (it's better if they do), but instead use prewritten scripts.

MacOS X includes a collection of DTrace scripts in /usr/bin, mostly from the DTraceToolkit:
leopard# grep -l DTrace /usr/bin/*
/usr/bin/bitesize.d
/usr/bin/cpuwalk.d
/usr/bin/creatbyproc.d
/usr/bin/dappprof
/usr/bin/dapptrace
/usr/bin/diskhits
/usr/bin/dispqlen.d
/usr/bin/dtruss
/usr/bin/errinfo
/usr/bin/execsnoop
/usr/bin/fddist
/usr/bin/filebyproc.d
/usr/bin/hotspot.d
/usr/bin/httpdstat.d
/usr/bin/iofile.d
/usr/bin/iofileb.d
/usr/bin/iopattern
/usr/bin/iopending
/usr/bin/iosnoop
/usr/bin/iotop
/usr/bin/kill.d
/usr/bin/lastwords
/usr/bin/loads.d
/usr/bin/newproc.d
/usr/bin/opensnoop
/usr/bin/pathopens.d
/usr/bin/pidpersec.d
/usr/bin/plockstat
/usr/bin/priclass.d
/usr/bin/pridist.d
/usr/bin/procsystime
/usr/bin/runocc.d
/usr/bin/rwbypid.d
/usr/bin/rwbytype.d
/usr/bin/rwsnoop
/usr/bin/sampleproc
/usr/bin/seeksize.d
/usr/bin/setuids.d
/usr/bin/sigdist.d
/usr/bin/syscallbypid.d
/usr/bin/syscallbyproc.d
/usr/bin/syscallbysysc.d
/usr/bin/topsyscall
/usr/bin/topsysproc
/usr/bin/weblatency.d

That's 44 DTraceToolkit scripts, plus plockstat from Solaris 10. While the DTraceToolkit now has over 200 scripts, it makes sense to pick out the most useful scripts for inclusion in /usr/bin.

Popular scripts such as iosnoop can now be run by MacOS X users:

leopard# iosnoop
UID PID D BLOCK SIZE COMM PATHNAME
501 130 R 31987472 40960 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 7879952 8192 Terminal ??/SearchManager.nib/keyedobjects.nib
501 130 R 32132304 12288 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32132528 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32047696 12288 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32132592 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32131512 12288 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32033296 12288 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32044488 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32045064 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32131344 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32048680 16384 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32132544 8192 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32049296 12288 Terminal ??/dyld/dyld_shared_cache_i386
-1 0 W 32482848 86016 kernel_task ??/vm/swapfile2
-1 0 W 32483040 135168 kernel_task ??/vm/swapfile2
501 130 R 32044672 4096 Terminal ??/dyld/dyld_shared_cache_i386
501 130 R 32132656 12288 Terminal ??/dyld/dyld_shared_cache_i386
[...]

The man pages are conveniently included in /usr/share/man.

I had been making preperations in the latest DTraceToolkit (0.99) for MacOS X DTrace, such as putting an "OS" field into the man pages and figuring out how to support different versions of the same script (tcpsnoop_snv, etc). Hopefully many scripts will run on both Solaris and MacOS X (especially if they use stable providers), however I expect there will be some that are specific to each. Now that QNX DTrace also exists, there is additional need for identifying OS specifics in the DTraceToolkit.

It's been great news for DTrace, Sun and Apple - who have not only gained the best performance and debugging tool available, but also the existing DTrace community.

Sunday, February 17, 2008

Browsable DTraceToolkit

Stefan Parvu has created browsable HTML versions of the DTraceToolkit on the DTT test page. See DTraceToolkit ver 0.99 to browse that version.

A goal of the DTraceToolkit is to provide documented examples of DTrace scripting, in addition to what is available in the DTrace Guide. However these examples have been reaching a limited audience of those who download, unzip, and browse through the text files.

Now that the DTraceToolkit is browsable online, its contents can be found by internet search engines. This should help people not only find examples of DTrace usage, but also solutions to some common observability problems.

There have been a few other items of DTraceToolkit news which I'll blog about soon. Please excuse my infrequent blog postings - I've been busy since joining Sun on a particular project which consumes most of my spare time. It will be worth it, which should be clear once I can start posting about it on my Sun blog.

Friday, October 05, 2007

DTraceToolkit ver 0.99

I've released DTraceToolkit ver 0.99 - a major release. If you haven't encountered it before, the DTraceToolkit is a collection of opensource scripts for:
  • solving various troubleshooting and performance issues
  • demonstrating what is possible with DTrace (seeing is believing)
  • learing DTrace by example

The DTraceToolkit isn't Sun's standard set of Solaris performance tools, or, everything that DTrace can do. It is a handy collection of documented and tested DTrace tools that are useful in most situations. It's certainly not a substitute for learning DTrace properly and writing your own custom scripts - although it should help you do that by providing working examples of code. It is certainly better than not using DTrace at all, if you didn't have the time to learn DTrace.

DTraceToolkit 0.96

18 months ago I released version 0.96 of the DTraceToolkit. The Bin directory, which contains symlinks to all the scripts, looked like this,


DTraceToolkit version 0.96

Click for a larger image. That collection of scripts provides various views of system behaviour and of the interaction of applications with the system. A general theme, if there was one, was that of easy system observability - illuminating system internals that were either difficult or impossible to see before. There were 104 scripts in that release - although that is just the tip of an iceberg.

DTraceToolkit 0.99

In the last 18 months, DTrace has been expanding its horizons through the creation of numerous new DTrace providers. There are now providers for Java, JavaScript, Perl, Python, Php, Ruby, Shell and Tcl, and more on the way. The iceberg of system observability is now just one in a field of icebergs, with more rising above the surface as time goes by. It's about time the DTraceToolkit was updated to refelect the trajectory of DTrace itself - which isn't just about system observability - it is the observability of life, the universe, and everything.

This new version of the DTraceToolkit does has a theme - that of programing language visibility, which covers several of the new DTrace providers that now exist. The Bin directory of the toolkit now contains 230 scripts, and looks like,


Below I've highlighted most of the new contents, grouped by language that they trace,


DTraceToolkit version 0.99

I've placed language scripts in their own subdirectories, each of which include a "Readme" file to document suggestions for where to find the provider and how to get it installed and working. Some of these providers do require downloading of source, patching, and compiling, which may take you some time. Both C and C++ are supported by numerous scripts, but don't have a prefix to group them like the other languages do; I hope to fix this in the next release, and actually have C and C++ subdirectories.

DTraceToolkit 0.99 change summary:
  • New script categories: Java, JavaScript, Perl, Php, Python, Ruby, Shell and Tcl
  • Many new scripts (script count has doubled)
  • Many new example files (since there is one per script)
  • A new Code directory for sample target code (used by the example files)
  • Several bug fixes, numerous updates
  • Updated versions of tcpsnoop/tcptop for OpenSolaris/Solaris Nevada circa late 2007

For each language, there is about a dozen scripts to provide:
  • general observability (such as function and object counts, function flows)
  • performance observability (on-cpu/elapsed times, function inclusive/exclusive times, memory allocation)
  • some paths for deeper analysis (syscall and library tracing, stacks, etc)

Screenshots

Now for some examples. I'll demonstrate PHP, but the following applies to all the new languages supported in the toolkit (look in the Example directory for the language that interests you). Here is a sample PHP program that appears in the toolkit as Code/Php/func_abc.php,
/opt/DTT> cat -n Code/Php/func_abc.php 
1 <?php
2 function func_c()
3 {
4 echo "Function C\n";
5 sleep(1);
6 }
7
8 function func_b()
9 {
10 echo "Function B\n";
11 sleep(1);
12 func_c();
13 }
14
15 function func_a()
16 {
17 echo "Function A\n";
18 sleep(1);
19 func_b();
20 }
21
22 func_a();
23 ?>

For general observability there are scripts such as php_funccalls.php, for counting function calls,
# php_funccalls.d 
Tracing... Hit Ctrl-C to end.
^C
FILE FUNC CALLS
func_abc.php func_a 1
func_abc.php func_b 1
func_abc.php func_c 1
func_abc.php sleep 3

The above output showed that sleep() was called three times, and the user defined functions were each called once - as we would have assumed from the source. With DTrace you can check your assumptions through tracing, which is often a useful excercise. I've solved many performance issues by examining areas that seem almost too obvious to bother checking.

The following script is both for general and performance observability. It prints function flow along with delta times,
# php_flowtime.d
C TIME(us) FILE DELTA(us) -- FUNC
0 3646108339057 func_abc.php 9 -> func_a
0 3646108339090 func_abc.php 32 -> sleep
0 3646109341043 func_abc.php 1001953 <- sleep
0 3646109341074 func_abc.php 31 -> func_b
0 3646109341098 func_abc.php 23 -> sleep
0 3646110350712 func_abc.php 1009614 <- sleep
0 3646110350745 func_abc.php 32 -> func_c
0 3646110350768 func_abc.php 23 -> sleep
0 3646111362323 func_abc.php 1011554 <- sleep
0 3646111362351 func_abc.php 27 <- func_c
0 3646111362361 func_abc.php 10 <- func_b
0 3646111362370 func_abc.php 9 <- func_a
^C

Delta times, in this case, show the time from the line on which they appear to the previous line. The large delta times visible show that from when the sleep() function completed to when it began, took around 1.0 seconds of elapsed time. As we'd expect from the source. Both CPU (C) and time since boot (TIME(us)) are printed in case the output is shuffled on multi-CPU systems, and requires post sorting.

A more consise way to examine elapsed times may be php_calltime.d, a script that can be useful for performance analysis,
# php_calltime.d
Tracing... Hit Ctrl-C to end.
^C

Count,
FILE TYPE NAME COUNT
func_abc.php func func_a 1
func_abc.php func func_b 1
func_abc.php func func_c 1
func_abc.php func sleep 3
- total - 6

Exclusive function elapsed times (us),
FILE TYPE NAME TOTAL
func_abc.php func func_c 330
func_abc.php func func_b 367
func_abc.php func func_a 418
func_abc.php func sleep 3025644
- total - 3026761

Inclusive function elapsed times (us),
FILE TYPE NAME TOTAL
func_abc.php func func_c 1010119
func_abc.php func func_b 2020118
func_abc.php func sleep 3025644
func_abc.php func func_a 3026761

The exclusive function elapsed times show which functions cused the latency - which identifies sleep() clocking a total of 3.0 seconds (as it was called three times). The inclusive times show which higher level functions the latency occured in, the largest being func_a() since every other function were called within it. There isn't a total line printed for inclusive times - since it wouldn't make much sense to do so (that column sums to 9 seconds, which is more confusing than of actual use).

As a starting point for deeper analysis, the following script adds syscall tracing to the PHP function flows, and adds terminal escape colours (see Notes/ALLcolors_notes.txt for notes on using colors with DTrace),
# php_syscolors.d
C PID/TID DELTA(us) FILE:LINE TYPE -- NAME
0 18426/1 8 func_abc.php:22 func -> func_a
0 18426/1 41 func_abc.php:18 func -> sleep
0 18426/1 15 ":- syscall -> nanosleep
0 18426/1 1008700 ":- syscall <- nanosleep
0 18426/1 30 func_abc.php:18 func <- sleep
0 18426/1 42 func_abc.php:19 func -> func_b
0 18426/1 28 func_abc.php:11 func -> sleep
0 18426/1 14 ":- syscall -> nanosleep
0 18426/1 1010083 ":- syscall <- nanosleep
0 18426/1 29 func_abc.php:11 func <- sleep
0 18426/1 43 func_abc.php:12 func -> func_c
0 18426/1 28 func_abc.php:5 func -> sleep
0 18426/1 14 ":- syscall -> nanosleep
0 18426/1 1009794 ":- syscall <- nanosleep
0 18426/1 28 func_abc.php:5 func <- sleep
0 18426/1 34 func_abc.php:6 func <- func_c
0 18426/1 18 func_abc.php:13 func <- func_b
0 18426/1 17 func_abc.php:20 func <- func_a
0 18426/1 21 ":- syscall -> fchdir
0 18426/1 19 ":- syscall <- fchdir
0 18426/1 9 ":- syscall -> close
0 18426/1 13 ":- syscall <- close
0 18426/1 35 ":- syscall -> semsys
0 18426/1 12 ":- syscall <- semsys
0 18426/1 7 ":- syscall -> semsys
0 18426/1 7 ":- syscall <- semsys
0 18426/1 66 ":- syscall -> setitimer
0 18426/1 8 ":- syscall <- setitimer
0 18426/1 39 ":- syscall -> read
0 18426/1 14 ":- syscall <- read
0 18426/1 11 ":- syscall -> writev
0 18426/1 22 ":- syscall <- writev
0 18426/1 23 ":- syscall -> write
0 18426/1 110 ":- syscall <- write
0 18426/1 61 ":- syscall -> pollsys

From that output we can see that PHP's sleep() is implemented by the syscall nanosleep(), and that it appears that writes are buffered and written later.

Understanding the language scripts

The scripts are potentially confusing for a couple of reasons; understanding these will help explain how best to use these scripts.

Firstly, some of the scripts print out an event "TYPE" field (eg, "syscall" or "func"), and yet the script only traces one type of event (eg, "func"). Why print a field that will always contain the same data? Sounds like it might be a copy-n-paste error. This is in fact deliberate so that the scripts form better templates for expansion, and was something I learnt while using the scripts to analyse a variety of problems. I would frequently have a script that measured functions and wanted to add syscalls, library calls or kernel events, to customise that script for a particular problem. When adding these other event types, it was much easier (often trivial) if the script already had a "TYPE" field. So now most of them do.

Also, there are variants of scripts that provide very similar tracing at different depths of detail. Why not one script that does everything, with the most detail? Here I'll use the Tcl provider to explain why,

The tcl_proccalls.d script counts procedure calls, here it is tracing the Code/Tcl/func_slow.tcl program in the toolkit, which calls loops within functions:
# tcl_proccalls.d
Tracing... Hit Ctrl-C to end.
^C
PID COUNT PROCEDURE
183083 1 func_a
183083 1 func_b
183083 1 func_c
183083 1 tclInit

Ok, that's a very high level view of Tcl activity. There is also tcl_calls.d to count Tcl commands as well as procedures:
# tcl_calls.d
Tracing... Hit Ctrl-C to end.
^C
PID TYPE NAME COUNT
[...]
183086 proc func_a 1
183086 proc func_b 1
183086 proc func_c 1
183086 proc tclInit 1
[...]
183086 cmd if 8
183086 cmd info 11
183086 cmd file 12
183086 cmd proc 12

(Output truncated). Now we have a deeper view of Tcl operation. Finally there is tcl_ins.d to count Tcl instructions:
# tcl_ins.d
Tracing... Hit Ctrl-C to end.
^C
PID TYPE NAME COUNT
[...]
16005 inst jump1 14
16005 inst pop 18
16005 inst invokeStk1 53
16005 inst add 600000
16005 inst concat1 600000
16005 inst exprStk 600000
16005 inst lt 600007
16005 inst storeScalar1 600016
16005 inst done 600021
16005 inst loadScalar1 1200020
16005 inst push1 4200193

There are such large counts since this script is tracing the low level instructions used to execute the loops in this code.

So the question is: why three scripts and not one that counts everything? Why not tcl_megacount.d? There are at least three reasons behind this.

1. You don't pull out a microscope if you are looking for your umbrella. Tracing everything can really bog you down in detail, when higher level views are frequently sufficient for solving issues. It's a matter of the right tool for the job. Try the higher level scripts first, then dig deeper as needed.

2. Measuring at such a low level can have a significant performance impact. I'll demonstrate this by using ptime as a easy (but rough) measurement of run time for the Code/Tcl/func_slow.tcl sample program.

Here is func_slow.tcl running with no DTrace enabled as a baseline:
# ptime ./func_slow.tcl > /dev/null

real 3.306
user 3.276
sys 0.005

Now while tcl_proccalls.d is tracing procedure calls:
# ptime ./func_slow.tcl > /dev/null

real 3.311
user 3.270
sys 0.006

Now while tcl_calls.d is tracing both Tcl procedures and commands:
#  ptime ./func_slow.tcl > /dev/null

real 3.313
user 3.283
sys 0.006

Finally with tcl_ins.d tracing Tcl instructions:
# ptime ./func_slow.tcl > /dev/null

real 20.438
user 20.278
sys 0.010

Ow! Our target application is running 7 times slower? This isn't supposed to happen, is there something wrong with DTrace or the Tcl provider? Well, there isn't, the problem is that of tracing 9 million events during a 3 second program. The overheads of DTrace are miniscule, however if you multiply them (or anything) by ridiculously large numbers they'll eventually become measurable. 9 million events is a lot of events - and you wouldn't normally care about tracing Tcl instructions anyway. It's great to have this capability, if it is ever needed.

3. Example programs. Writing mega powerful DTrace scripts often means mega confusing scripts to read. People use the DTraceToolkit as a collection of example code to learn from, so keeping things simple if possible, should help.

Documentation

It usually takes more time to document a script than it takes to write it. I feel strongly about creating quality IT documentation, so I'm reluctant to skip this step. The strategy used in the toolkit is, by subdirectory name,
  • Scripts - these have a standard header plus inline comments.
  • Examples - this is a directory of example files that demonstrate each script, providing screenshots and interpretations of the output. Particular attention has been made to explain relevant details, especially caveats, no matter how obvious they may seem.
  • Notes - this is a directory of files that comment on a variety of topics that are relevant to multiple scripts. There are many more notes files in this toolkit release.
  • Man pages - there is a directory for these, which are useful for looking up field definitions and units. To be honest, I frequently use the Examples directory for reminders on what each tool does rather than the Man pages.

Since there were over 100 new scripts in this release, over 100 new example files needed to be written. Fortunately I had help from Claire Black (my wife), who as an ex-SysAdmin and Solaris Instructor had an excellent background for what people needed to know and how to express it. Unfortunately for her, she only had about 4 days from when I completed these scripts to when the documentation was due for inclusion on the next OpenSolaris Starter Kit disc. Despite the time constraint she has done a great job, and there is always the next release for her to spend more time fleshing out the examples further. I'm also aiming to collect more real-world example outputs for the next release (eg, Mediawiki analysis for the PHP scripts).

Testing

Stefan Parvu has helped out again by running every script on a variety of platforms, which is especially important now that I don't have a SPARC server at the moment. He has posted the results on the DTraceToolkit Testing Room, and did find a few bugs which I fixed before release.

Personal

It has been about 18 months since the last release. In case anyone was wondering, there are several reasons why, briefly: I moved to the US, began a new job in Sun engineering, left most of my development servers behind in Australia, brought the DTraceToolkit dev environment over on a SPARC UFS harddisk (which I still can't read), and got hung up on what to do with the fragile tcp* scripts. The main reason would be having less spare time as I learnt the ways of Sun engineering.

I also have a Sun blog now, here. I'm not announcing the DTraceToolkit there since it isn't a Sun project; it's a spare time project by myself and others of the OpenSolaris community.

Future

The theme for the next release may be storage: I'd like to spend some time seeing what can be done for ZFS, for iSCSI (using the new provider), and NFS.

Thursday, December 14, 2006

Updates from California

Since I've been quiet for a while on this blog and on OpenSolaris mailing lists, here is a quick update about myself and my projects.

I got married, ditched my old car, left the instructing/consulting scene in Australia/APAC, and moved to California to work as a developer at Sun. I brought along whatever I could fit in airline luggage, which included a laptop, hard disks, some books, clothes, and my fencing gear. Since then, I've been busy getting up to speed with some interesting projects, some of which I've discussed on my Sun blog.

OpenSolaris projects such as the DTraceToolkit will remain OpenSolaris projects; and other open source projects such as Chaosreader will of course remain open source. I'll post updates about them to this blog and will continue working on them as I always have done - as a spare time hobby.

Progress on these other projects has been on hold since I left Australia and powered down my array of test and development servers (and sadly powered down "mars", my main server which had an uptime of over 400 days). I began rebuilding my development environment last night, when I powered on the disk containing the DTraceToolkit build files for the first time in three months. I'm looking forward to adding some more scripts, and more documentation to guide people through the toolkit.

And finally, living in the US has been great - however the following items were quite difficult to find:
  1. cricket coverage
  2. authentic ginger beer
I can now listen to Star 101.3 again, but I've lost access to both the ABC and BBC's online radio coverage of the Ashes Cricket series - which seems to be blocked from the US!

Sunday, August 13, 2006

No more Star 101.3

If you are in Australia, that is.

Sometime last year I went looking for a decent Internet radio station, as I'm often online on my laptop, but not ofter near radios or reliable reception. I found Star 101.3 FM, and it was good. It happens to be based in the US.

My day would often begin with info bytes from John Tesh followed by hours of music (ok, sometimes they play the same music ;-). When I'd hear "Don Bleu in the morning", I'd know I'd stayed up a bit too late!

Recently Star 101.3 pulled the plug for international listeners "due to licensing restrictions". Ok, I can understand that may well be the case, and I'm glad it lasted as long as it did. If you are in the US you can still listen.

Brendan Clones #2

Several months ago several of the Brendan clones were photographed while working on the DTraceToolkit.

More recently they were photographed again, this time working on Solaris Internals and Solaris Performance and Tools. For the first time all of the Brendan clones were involved on the one project, with most of them pictured here. Click for a larger photo:



Sleeping Brendan is busy sleeping, Engineer Brendan is studying Solaris Internals 1st edition, and Master Brendan is looking on. The remaining Brendans are meticulously checking facts, reading source code, writing DTrace scripts and testing them.

Of course, photographer Brendan is behind the camera, taking the photo.

Friday, August 11, 2006

How much CPU does a down interface chew?

While teaching a DTrace class in Sydney this week I ran an ad-lib demo that took a mysterous and suprising turn. I think you'll find this quite interesting.

One of my favourite DTrace demos begins by asking how much CPU a busy network interface will consume. About 2 in 10 people will know the answer - a lot! But how did you measure this on older Solaris? After all, if it really is "a lot", then we really want a value! 20% of a CPU? 50%?

I was demonstrating on an Ultra 10, with a 333 MHz CPU and a 100 Mbit/sec interface (hme0). Even with the numerous TCP/IP performance enhancements in Solaris 10, driving 100 Mbit/sec on an Ultra 10 will still hurt.

From another host, I sent a large file over the network to drive the interface at capacity. I thought I'd start by showing how busy the network interface was by using the sysperfstat tool from the K9Toolkit (I usually use nicstat, I thought this would be a nice change). sysperfstat ran for a few seconds, then I began the bulk network transfer:
$ /brendan/K9Toolkit/sysperfstat 1
Time %CPU %Mem %Disk %Net CPU Mem Disk Net
19:24:27 5.24 33.52 0.52 0.07 0.05 0.00 0.01 0.00
19:24:28 7.35 33.53 1.29 0.00 0.00 0.00 0.00 0.00
19:24:29 7.48 33.53 0.00 0.00 0.00 0.00 0.00 0.00
19:24:30 39.65 33.79 1.27 51.27 3.69 0.00 0.00 0.00
19:24:32 65.67 33.79 0.00 173.14 1.28 0.00 0.00 0.00
19:24:34 68.35 33.79 0.00 173.03 0.00 0.00 0.00 0.00
19:24:37 80.12 33.79 0.00 176.33 0.82 0.00 0.00 0.00
19:24:39 76.97 33.79 0.00 178.12 0.43 0.00 0.00 0.00
19:24:41 76.08 33.79 1.61 164.82 1.37 0.00 0.00 0.00
19:24:44 80.18 33.79 0.00 177.68 0.38 0.00 0.00 0.00

Oh dear! I was expecting %Net to reach around 90%, not over 170%!

Both the Net and Disk percentages are in terms of a single interface or disk, as from experience this has shown to be more useful than a global percentage (which CPU and Mem use). 170% could indicate that I have two network interfaces running at around 85%. But why? I had only initiated one network transfer to the hme0 address.

... The system had become so heavily loaded that sysperfstat's interval of 1 second became 2 and sometimes 3 seconds. Good thing I never assumed sleep(1) would actually sleep for 1 second only - sysperfstat uses delta times from kstat measurements.

Demonstrations are much more exciting when even you don't know of the outcome. So why did we hit 170% utilisation? Maybe another network interface was doing traffic that I didn't know about:
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.1.13 netmask ffffff00 broadcast 192.168.1.255
ether 8:0:20:b0:7a:18
qfe0: flags=1000862<BROADCAST,NOTRAILERS,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 10.1.2.3 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:b0:7a:18
#

No answers there. hme0 is "UP", qfe0 is both down and has an address class that we aren't using anywhere at this site. So qfe0 can't be doing anything, right?
$ /brendan/K9Toolkit/nicstat 1
Time Int rKb/s wKb/s rPk/s wPk/s rAvs wAvs %Util Sat
19:25:34 qfe0 3.77 0.00 2.98 0.00 1292.8 42.00 0.03 0.00
19:25:34 hme0 4.69 0.42 3.77 5.92 1272.9 72.52 0.04 0.00
Time Int rKb/s wKb/s rPk/s wPk/s rAvs wAvs %Util Sat
19:25:36 qfe0 10435.6 0.00 8642.5 0.00 1236.4 0.00 85.49 0.00
19:25:36 hme0 10394.1 818.6 8601.4 15524.7 1237.4 54.00 91.86 0.00
Time Int rKb/s wKb/s rPk/s wPk/s rAvs wAvs %Util Sat
19:25:39 qfe0 10469.2 0.00 8744.1 0.00 1226.0 0.00 85.76 0.00
19:25:39 hme0 10451.4 828.3 8732.5 15708.5 1225.5 54.00 92.40 0.00

Woah - this "down" qfe0 with a dummy address is 85% busy - somthing really strange is going on!

I had began this demo to show %CPU for busy interfaces, which leads us to run the the DTrace based "intrstat" tool - new to Solaris 10. This shows us how much CPU our network interface interrupts are consuming, and now I had a renewed interest in what it had to say:
# intrstat 1

device | cpu0 %tim
-------------+---------------
hme#0 | 1232 32.8
m64#0 | 0 0.0
qfe#0 | 1140 37.8
su#0 | 0 0.0
su#1 | 0 0.0
uata#0 | 0 0.0

device | cpu0 %tim
-------------+---------------
hme#0 | 1233 32.9
m64#0 | 0 0.0
qfe#0 | 1138 37.8
su#0 | 0 0.0
su#1 | 0 0.0
uata#0 | 1 0.0
...

hme0 is consuming 33% of CPU0, quite a substantial amount - considering this is just a 100 Mbit/sec interface. That's where the planned demo would have ended.

However qfe0 is consuming 37.8% of CPU0 - this is really frightening! Remember, I am sending traffic to hme0's address, and qfe0 is not marked "UP". This is a dead interface with a dummy address - and it is consuming a LOT of CPU! (perhaps it is talking to other dead interfaces? ;-)

I began thinking outside the box, and Scott was around to help. We found that the room was connected to a hub, both hme0 and qfe0 were connected to this same hub, and they both had the same ethernet address. So while qfe0 eventually rejected the packets that it could see, it had to walk through much kernel code to get to that point - consuming CPU cycles and incrementing kstats. (a clue from the nicstat output was that qfe0 was receiving but not sending - so it wasn't taking part in a connection).

I unplugged qfe0 and the problem vanished. Easy. However associating CPU load with a down interface is far from easy!

...

As an aside, I also ask how we measure how utilised the network interfaces are. Very few people have been able to answer this - and it is a topic I hope chapter 7 of this book will increase awareness for. (eg, nicstat or sysperfstat from the K9Toolkit).