10 December, 2005

Solaris Performance Metrics

I've uploaded the first paper in a series I'm writing on Solaris performance monitoring metrics; it covers "disk utilisation by process" (AKA "utilization"). I've also created a website where the paper can be downloaded, and a cover photo,



... Which readers of Sun Performance and Tuning may find familiar. :-)

While writing freeware such as the DTraceToolkit, I'm often asked to explain the performance metrics I've chosen in greater detail. These papers will serve as a reference that I can point people to.

I began writing these over a year ago, before I had access to the kernel code. I liked to believe that it wasn't such a problem, yet with hindsight it was. The OpenSolaris project, and being able to read through kernel code, has really made writing these documents possible!

04 December, 2005

Advanced DTrace Video

To see Bryan Cantrill speak about various advanced DTrace topics, see the recent SOSUG (Sydney OpenSolaris User Group) videos that Alan organised to have put online. You'll also find a ZFS presentation by James.

In other news, I just uploaded the DTraceToolkit version 0.89, which has had a few changes to the way TCP is measured.

30 November, 2005

DTraceToolkit 0.88

I've just uploaded the latest version of the DTraceToolkit, version 0.88 (88 scripts). I've updated the OpenSolaris DTraceToolkit site to point to the new version. This version has many updated scripts and a few new ones.

Between 0.80 and 1.00 I'll be doing more work revisiting code and retesting code rather than adding scripts.

Homepage

Let me explain what is going on with my homepage URL for future reference.

My homepage is at http://www.brendangregg.com. It's a DNS pointer that points to wherever my homepage actually lives, which may well change - however my name will not!

And yes, it is likely that my actual homepage may move at some point. Some people have experienced DNS problems with the current location (users.tpg.com.au), and I've just uploaded a new DTraceToolkit and have almost run out of space!
1070 files used (10%) - authorized: 10000 files
30688 Kbytes used (99%) - authorized: 30720 Kb

If you've linked to www.brendangregg.com, then no problem - it will always point to the right place (which is why I have the thing - I have been through a painful website move in the past).

29 November, 2005

Sys Admin Magazine

The December 2005 issue contains an article on the DTraceToolkit written by Ryan Matteson. Grab a copy! The article is "Observing I/O Behavior with the DTraceToolkit", and is quite good. It was also selected as the feature article - which means it will be available online for some time,

http://www.samag.com/documents/sam0512a/

Thanks Matty, and Sys Admin Magazine!

24 November, 2005

DTrace Translators

While teaching a DTrace class in Sydney, I've been asked about translators. They are quite useful, so I've prepared the following as a quick demo.

This is a DTrace program to trace the time() syscall and print the process, its parent, its grand-parent, and so on.
#!/usr/sbin/dtrace -s

/* Declare Translator */

typedef struct ancestory {
        string me;      /* my cmd */
        string p;       /* parent cmd */
        string gp;      /* grand-parent cmd */
        string ggp;     /* great-grand-parent cmd */
        string gggp;    /* great-great-grand-parent cmd */
} ancestory_t;

translator ancestory_t < struct _kthread *T > {

        /* fetch my details */
        me = T->t_procp->p_user.u_comm;

        /* fetch ancestor details if they exist */
        p = T->t_procp->p_parent != NULL ?
            T->t_procp->p_parent->p_user.u_comm :
            "<none>";
        gp = T->t_procp->p_parent != NULL ?
            T->t_procp->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_user.u_comm :
            "<none>" : "<none>";
        ggp = T->t_procp->p_parent != NULL ?
            T->t_procp->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_parent->p_user.u_comm :
            "<none>" : "<none>" : "<none>";
        gggp = T->t_procp->p_parent != NULL ?
            T->t_procp->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_parent->p_parent != NULL ?
            T->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_comm :
            "<none>" : "<none>" : "<none>" : "<none>";
};

inline ancestory_t *ancestors = xlate <ancestory_t *> (curthread);

/* Main Program */

syscall::gtime:entry
{
        printf("%s, %s, %s, %s, %s", ancestors->me,
            ancestors->p, ancestors->gp, ancestors->ggp, ancestors->gggp);
}

The main program at the end is quite concise: it prints the details from "ancestors". The translator has walked the p_parent pointers carefully, returning "<none>" if the pointer is NULL. ("ancestors->me" is unnecessary since we have "execname"; I've included it as a simple demonstration.)

The output is,

# ./transdemo.d
dtrace: script './transdemo.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0   6615                      gtime:entry bash, sh, bash, sshd, sshd
  0   6615                      gtime:entry date, bash, sh, bash, sshd
  0   6615                      gtime:entry bash, sh, bash, sshd, sshd
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
[...]

Without our careful pointer tests, the NULL parent pointers would have caused DTrace to print errors rather than our "<none>" keywords.

The "Declare Translator" section can be cut-n-pasted into a new .d file in /usr/lib/dtrace (eg, /usr/lib/dtrace/anscestors.d) where it will be automatically imported by every future DTrace script.

Take a look under /usr/lib/dtrace at the existing translator scripts; they are quite fascinating.

19 November, 2005

ZFS

The doors have been flung open for ZFS, Sun's "last word in filesystems". It's now in OpenSolaris and there is a ZFS Community page where you can find introductions, demonstrations, and advanced discussions. We don't know when ZFS will appear in Solaris 10, but for it to appear in OpenSolaris shows that the process has begun.

ZFS raises the bar for filesystems to a new height, and even changes the way you think about filesystems. Let me provide a quick demo, although I don't have an array of spare disks handy - you'll need to pretend that each of these 1 Gb slices is actually a separate disk,

# zpool create apps mirror c0t1d0s0 c0t1d0s1 mirror c0t1d0s3 c0t1d0s4


That's it - one command for a ZFS pool that is both mirrored and dynamically striped (think RAID 1+0), 256-bit checksummed, remounted on boot, and able to grow to a virtually unlimited size.

Let's run a few status commands to check it worked.

# zpool list
NAME    SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
apps   1.98G   33.0K   1.98G     0%  ONLINE  -
#
# zpool status
  pool: apps
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        apps          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
            c0t1d0s1  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s3  ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0
#
# df -h -F zfs
Filesystem             size   used  avail capacity  Mounted on
apps                   2.0G     8K   2.0G     1%    /apps


The size of 2 Gb is correct, and the "zpool status" command neatly prints the layout.

Now perhaps a slightly more realistic demo (although still no separate disks, sorry). Rather than having all the disks combine into one filesystem, ZFS is really intended to combine disks into pools, and then have multiple filesystems share a pool. The following quick demo shows this,

# zpool create fast mirror c0t1d0s0 c0t1d0s1 mirror c0t1d0s3 c0t1d0s4
# zfs create fast/apps
# zfs create fast/oracle
# zfs create fast/home
# zfs set mountpoint=/export/home fast/home
# zfs set compression=on fast/home
# zfs set quota=500m fast/home
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
fast         91.0K  1.97G   9.5K  /fast
fast/apps       8K  1.97G     8K  /fast/apps
fast/home       8K   500M     8K  /export/home
fast/oracle     8K  1.97G     8K  /fast/oracle


Each filesystem may have different options set, such as quotas, reservations and compression. If a filesystem is running out of space, its quota can be changed live with a single command. If the pool is running out of space, disks can be added live with a single command.
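
As a rough sketch of those two single commands, using the "fast" pool from the demo above: the new quota of 1g and the extra slices c0t1d0s5 and c0t1d0s6 are made up for illustration, and the small "run" wrapper is only there so the script prints the commands rather than executing them unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 to really run them (needs root and an existing ZFS pool).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "dry-run: $*"
    fi
}

# Change the quota on an existing filesystem, live.
# usage: grow_quota <filesystem> <size>
grow_quota() {
    run zfs set quota="$2" "$1"
}

# Add another mirrored pair of devices to an existing pool, live.
# usage: add_mirror <pool> <device1> <device2>
add_mirror() {
    run zpool add "$1" mirror "$2" "$3"
}

# Hypothetical values: 1g quota, and slices s5/s6 standing in for new disks.
grow_quota fast/home 1g
add_mirror fast c0t1d0s5 c0t1d0s6
```

In dry-run mode this just prints the "zfs set" and "zpool add" lines, which are the commands you would type at the prompt yourself.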

As a programmer there are times when you encounter something that is so elegant and obvious that you are struck with a feeling that it is right. For a moment you can clearly see what the developer was thinking and that they achieved it perfectly. ZFS is one of those moments.

08 October, 2005

Network Monitoring

The network often gets blamed for performance issues in Solaris. How busy are your network interfaces? How do you find out?

Many sysadmins are using "netstat -i" and looking at packet counts. Packet counts turn out to be not as useful as they seem - you don't know if they are big packets or small packets, so you really don't know how utilised the network interface is.
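
To make the point with some arithmetic, here is a minimal sketch that turns a byte count into a utilisation percentage. The function name util_pct, the 100 Mbit/s link speed and the packet sizes are assumptions for illustration, and the calculation ignores framing overhead; in practice the byte counts would come from interface byte counters rather than being typed in.

```shell
#!/bin/sh
# util_pct <bytes> <seconds> <link_bits_per_sec>
# Prints the utilisation of one direction of a link as a percentage:
#   (bytes * 8) / (seconds * link speed in bits/sec) * 100
util_pct() {
    awk -v b="$1" -v s="$2" -v l="$3" \
        'BEGIN { printf "%.1f\n", (b * 8) / (s * l) * 100 }'
}

# The same 1000 packets per second on an assumed 100 Mbit/s link:
util_pct 64000 1 100000000      # 1000 x 64 byte packets
util_pct 1500000 1 100000000    # 1000 x 1500 byte packets
```

Both cases show the same packet count, yet the first works out to about 0.5% utilisation and the second to 12%, which is why byte counts matter.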

I've written a website on network monitoring, to cover how to discover what is really happening on your network. To do this I use a variety of tools, including some based on Kstat and DTrace.

26 September, 2005

mdb ::wumpus

I've added another program to the "specials" page - a site of, erm, quite special software indeed.

It's a loadable module for mdb, the Solaris Modular Debugger. It provides a new dcmd, "::wumpus". A screenshot should say it all,

# mdb
> ::load wumpus
> ::wumpus
INSTRUCTIONS (Y-N)
?N
HUNT THE WUMPUS

I SMELL A WUMPUS!
BATS NEARBY!
YOU ARE IN ROOM 17
TUNNELS LEAD TO 7 16 18

SHOOT OR MOVE (S-M)
?

It uses the code from ESR's classic wumpus clone. (I checked with him to make sure he didn't mind wumpus being ported to such an unusual place).

If you haven't seen "Hunt the Wumpus" before, it's a classic text game originally written in BASIC.

DTraceToolkit 0.84

The latest version of the DTraceToolkit has been uploaded. Visit the OpenSolaris DTraceToolkit site to download it.

Since version 0.82 I've updated several scripts and added a couple. Between now and version 1.00 there will be an emphasis on revisiting code, enhancing existing programs and writing more documentation.

27 August, 2005

APC

Australian Personal Computer magazine is Australia's leading computer magazine, with the first issue printed in 1980. It's especially popular with PC or Windows enthusiasts, or those who work in that industry. People in the Unix, Linux or Solaris community don't seem to read it as much.

Recently I wrote a workshop for the September issue that gave a detailed summary of configuring networking in Solaris 10. Networking has changed in Solaris 10, and it was great to cover what's new and how to use it (including SMF, ipnodes and IP Filter). I've spoken to a number of Solaris people about the article and most of them are surprised - APC is running articles on Solaris?

Yes, APC is running articles on Solaris! And Linux, and Unix in general.

APC July 2005 included Solaris 10 on the cover DVD, and a workshop covering how to install Solaris 10 and Windows as a dual boot.

APC August 2005 had an article on the future of chip technologies, covering Sun's Niagara CPU.

APC September 2005 included a workshop on configuring networking on Solaris 10.

If you aren't reading it and you live in Australia (or can arrange a mail subscription?), it may well be worth a look. http://www.apcmag.com

27 June, 2005

DExplorer

I've just uploaded dexplorer ver 0.75, and it's shaping up to be quite a useful tool.
It runs a series of DTrace scripts that monitor generic system activity, and saves the output in a .tar.gz file with a meaningful structure,

http://www.opensolaris.org/os/community/dtrace/dexplorer
http://www.brendangregg.com/dtrace.html#DExplorer

The idea came from David Visser, Christopher Wells, and several other guys I met while in Melbourne, Australia last week. It's a great idea - it makes a lot of sense.

Here I've expanded a .tar.gz file created by dexplorer to demo the contents,
# find de_jupiter_200506272230 -type f
de_jupiter_200506272230/Cpu/interrupt_by_cpu
de_jupiter_200506272230/Cpu/interrupt_time
de_jupiter_200506272230/Cpu/dispqlen_by_cpu
de_jupiter_200506272230/Cpu/sdt_count
de_jupiter_200506272230/Disk/pgpgin_by_process
de_jupiter_200506272230/Disk/fileopen_count
de_jupiter_200506272230/Disk/sizedist_by_process
de_jupiter_200506272230/Mem/minf_by_process
de_jupiter_200506272230/Mem/vminfo_by_process
de_jupiter_200506272230/Net/mib_data
de_jupiter_200506272230/Net/tcpw_by_process
de_jupiter_200506272230/Proc/sample_process
de_jupiter_200506272230/Proc/syscall_by_process
de_jupiter_200506272230/Proc/syscall_count
de_jupiter_200506272230/Proc/readb_by_process
de_jupiter_200506272230/Proc/writeb_by_process
de_jupiter_200506272230/Proc/sysinfo_by_process
de_jupiter_200506272230/Proc/newprocess_count
de_jupiter_200506272230/Proc/signal_count
de_jupiter_200506272230/Proc/syscall_errors
de_jupiter_200506272230/Info/uname-a
de_jupiter_200506272230/Info/psrinfo-v
de_jupiter_200506272230/Info/prtconf
de_jupiter_200506272230/Info/df-k
de_jupiter_200506272230/Info/ifconfig-a
de_jupiter_200506272230/Info/ps-o
de_jupiter_200506272230/Info/uptime
de_jupiter_200506272230/log

Lots of goodies to pick over.

The names of the files should indicate their contents. Many of them contain quite generic data; the idea is that one dexplorer file should contain as many statistics as possible.

Check for updates. I'll also throw it in the DTraceToolkit.

26 June, 2005

Created This

G'Day,

Back in 1996 I created a personal website to host various stuff, including artwork I had created using Povray. It also contained a message board called "The Wall", inspired by message boards I had been using on BBSes. People thought it was a dumb idea; some were even horrified by the audacity of a personal website. I deleted it. Times have changed - personal websites are now more acceptable, and message boards or "blogs" are commonplace. "The Wall" is back.

Recently I have been doing various tricks with code, especially DTrace and OpenSolaris, and a blog seems like a useful way to share them. You'll probably find postings on computer programming languages including C, Perl, shell scripting and DTrace, plus various other topics of interest - AI programming, photography, rocketry, engineering, particle physics, beer, gaming, etc.

I'll try to get to the point. I'll also try to spell correctly. :-)

Enjoy.

Brendan

[Sydney, Australia]