30 November, 2005

DTraceToolkit 0.88

I've just uploaded the latest version of the DTraceToolkit, version 0.88 (88 scripts). I've updated the OpenSolaris DTraceToolkit site to point to the new version. This version has many updated scripts and a few new ones.

Between 0.80 and 1.00 I'll be doing more work revisiting and retesting existing code rather than adding scripts.

Homepage

Let me explain what is going on with my homepage URL for future reference.

My homepage is at http://www.brendangregg.com. It's a DNS pointer that points to wherever my homepage actually lives, which may well change - however my name will not!

And yes, it is likely that my actual homepage will move at some point. Some people have experienced DNS problems with the current location (users.tpg.com.au), and I've just uploaded a new DTraceToolkit and have almost run out of space,
1070 files used (10%) - authorized: 10000 files
30688 Kbytes used (99%) - authorized: 30720 Kb

If you've linked to www.brendangregg.com, then no problem - it will always point to the right place (which is why I have the thing - I have been through a painful website move in the past).

29 November, 2005

Sys Admin Magazine

The December 2005 issue contains an article on the DTraceToolkit written by Ryan Matteson. Grab a copy! The article is "Observing I/O Behavior with the DTraceToolkit", and is quite good. It was also selected as the feature article - which means it will be available online for some time,

http://www.samag.com/documents/sam0512a/

Thanks Matty, and Sys Admin Magazine!

24 November, 2005

DTrace Translators

While teaching a DTrace class in Sydney, I've been asked about translators. They are quite useful, so I've prepared the following as a quick demo.

This is a DTrace program to trace the time() syscall and print the process, its parent, its grand-parent, and so on.

#!/usr/sbin/dtrace -s

/* Declare Translator */

typedef struct ancestory {
    string me;      /* my cmd */
    string p;       /* parent cmd */
    string gp;      /* grand-parent cmd */
    string ggp;     /* great-grand-parent cmd */
    string gggp;    /* great-great-grand-parent cmd */
} ancestory_t;

translator ancestory_t < struct _kthread *T > {

    /* fetch my details */
    me = T->t_procp->p_user.u_comm;

    /* fetch ancestor details if they exist */
    p = T->t_procp->p_parent != NULL ?
        T->t_procp->p_parent->p_user.u_comm :
        "<none>";
    gp = T->t_procp->p_parent != NULL ?
        T->t_procp->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_user.u_comm :
        "<none>" : "<none>";
    ggp = T->t_procp->p_parent != NULL ?
        T->t_procp->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_parent->p_user.u_comm :
        "<none>" : "<none>" : "<none>";
    gggp = T->t_procp->p_parent != NULL ?
        T->t_procp->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_parent->p_parent != NULL ?
        T->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_comm :
        "<none>" : "<none>" : "<none>" : "<none>";
};

inline ancestory_t *ancestors = xlate <ancestory_t *> (curthread);

/* Main Program */

syscall::gtime:entry
{
    printf("%s, %s, %s, %s, %s", ancestors->me,
        ancestors->p, ancestors->gp, ancestors->ggp, ancestors->gggp);
}

The main program at the end is quite concise: it prints the details from "ancestors". The translator has walked the p_parent pointers carefully, returning "<none>" if a pointer is NULL. ("ancestors->me" is unnecessary since we have "execname"; I've included it as a simple demonstration.)

The output is,

# ./transdemo.d
dtrace: script './transdemo.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0   6615                      gtime:entry bash, sh, bash, sshd, sshd
  0   6615                      gtime:entry date, bash, sh, bash, sshd
  0   6615                      gtime:entry bash, sh, bash, sshd, sshd
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
  0   6615                      gtime:entry nscd, init, sched, <none>, <none>
[...]

Without our careful pointer tests, the NULL parent pointers would have caused DTrace to print errors rather than our "<none>" keywords.
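
For comparison, the same "walk upwards until there is no parent" idea can be sketched from userland with ps(1) instead of DTrace. This is a different technique from the translator above, not part of it: the function name "ancestry" is made up for this sketch, and it assumes a ps that accepts the POSIX options -p and -o with the comm= and ppid= fields.

```shell
#!/bin/sh
# ancestry PID [DEPTH]: print the command names of PID and its ancestors,
# comma-separated, padding with "<none>" once the chain ends. This mirrors
# the NULL tests in the translator, but uses ps(1) rather than p_parent.
ancestry() {
    pid=$1
    depth=${2:-5}
    out=""
    i=0
    while [ "$i" -lt "$depth" ]; do
        name="<none>"
        # Only look up PIDs that are positive integers.
        if [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null; then
            cmd=$(ps -o comm= -p "$pid" 2>/dev/null)
            if [ -n "$cmd" ]; then
                name=$cmd
                # Step up to the parent for the next iteration.
                pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
            else
                pid=""
            fi
        fi
        out="${out:+$out, }$name"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Show this shell and up to four of its ancestors.
ancestry $$
```

Unlike the DTrace version, this only inspects one process at the moment it is run; it does not trace anything.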

The "Declare Translator" section can be cut-n-pasted into a new .d file in /usr/lib/dtrace (eg, /usr/lib/dtrace/ancestors.d) where it will be automatically imported by every future DTrace script.
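
If you'd rather not cut-n-paste by hand, the section can be pulled out with sed. This is a minimal sketch: "extract_translator" is a made-up helper name, and it assumes the script still contains the "Declare Translator" and "Main Program" comments exactly as in the listing above.

```shell
#!/bin/sh
# extract_translator FILE: print everything from the "Declare Translator"
# comment up to, but not including, the "Main Program" comment.
extract_translator() {
    sed -n '/Declare Translator/,/Main Program/p' "$1" | sed '$d'
}

# Hypothetical usage (as root); the target file name is only an example:
#   extract_translator transdemo.d > /usr/lib/dtrace/ancestors.d
# A later script could then shrink to just its probe clause, for example
# (untested sketch):
#   dtrace -n 'syscall::gtime:entry { trace(ancestors->p); }'
```

The first sed prints the range including both marker comments; the second drops the final line, which is the "Main Program" comment.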

Take a look under /usr/lib/dtrace at the existing translator scripts; they are quite fascinating.

19 November, 2005

ZFS

The doors have been flung open for ZFS, Sun's "last word in filesystems". It's now in OpenSolaris and there is a ZFS Community page where you can find introductions, demonstrations, and advanced discussions. We don't know when ZFS will appear in Solaris 10, but for it to appear in OpenSolaris shows that the process has begun.

ZFS raises the bar for filesystems to a new height, and even changes the way you think about filesystems. Let me provide a quick demo, although I don't have an array of spare disks handy - you'll need to pretend that each of these 1 GB slices is actually a separate disk,

# zpool create apps mirror c0t1d0s0 c0t1d0s1 mirror c0t1d0s3 c0t1d0s4


That's it - one command creates a ZFS pool that is both mirrored and dynamically striped (think RAID 1+0), protected by 256-bit checksums, remounted on boot, and able to grow to a virtually unlimited size.

Let's run a few status commands to check it worked.

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
apps                   1.98G   33.0K   1.98G     0%  ONLINE     -
#
# zpool status
  pool: apps
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        apps          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
            c0t1d0s1  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s3  ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0
#
# df -h -F zfs
Filesystem             size   used  avail capacity  Mounted on
apps                   2.0G     8K   2.0G     1%    /apps


The size of 2 GB is correct (two mirrored pairs of 1 GB slices), and the "zpool status" command neatly prints the layout.

Now perhaps a slightly more realistic demo (although still no separate disks, sorry). Rather than having all the disks combine to one filesystem, ZFS is really intended to combine disks into pools, and then have multiple filesystems share a pool. The following quick demo shows this,

# zpool create fast mirror c0t1d0s0 c0t1d0s1 mirror c0t1d0s3 c0t1d0s4
# zfs create fast/apps
# zfs create fast/oracle
# zfs create fast/home
# zfs set mountpoint=/export/home fast/home
# zfs set compression=on fast/home
# zfs set quota=500m fast/home
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
fast                  91.0K  1.97G   9.5K  /fast
fast/apps                8K  1.97G     8K  /fast/apps
fast/home                8K   500M     8K  /export/home
fast/oracle              8K  1.97G     8K  /fast/oracle


Each filesystem may have different options set, such as quotas, reservations and compression. If a filesystem is running out of space, its quota can be changed live with a single command. If the pool is running out of space, disks can be added live with a single command.
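
Those two single commands might look like the following, continuing with the "fast" pool from the demo. This is a sketch under assumptions: the new quota of 1g is arbitrary, c0t2d0s0 and c0t2d0s1 are hypothetical device names, and the "run" wrapper is only there so the script prints the commands unless you explicitly ask it to execute them.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set RUN=1 (as root, on a system with ZFS) to execute them for real.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Raise the quota on fast/home from 500m to 1g, live.
run zfs set quota=1g fast/home

# Grow the pool by adding another mirrored pair.
# c0t2d0s0 and c0t2d0s1 are hypothetical device names.
run zpool add fast mirror c0t2d0s0 c0t2d0s1

# Check the results.
run zfs get quota fast/home
run zpool list fast
```

Adding a second "mirror" pair keeps the new space mirrored like the rest of the pool.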

As a programmer there are times when you encounter something that is so elegant and obvious that you are struck with a feeling that it is right. For a moment you can clearly see what the developer was thinking and that they achieved it perfectly. ZFS is one of those moments.