DTrace is a dynamic tracing framework designed by Sun Microsystems for troubleshooting kernel and application problems in real time. It first became available in November 2003 and was integrated into Solaris 10 in January 2005. DTrace was also the first component of the OpenSolaris project to have its code released under the Common Development and Distribution License (CDDL).
DTrace is designed to provide information that allows users to tune applications and the operating system itself. It’s built to be used in production environments. The impact of the probes is minimal when tracing is active, and there’s no performance impact for inactive probes. This is important because a system includes tens of thousands of probes, many of which can be active.
Tracing programs (often called scripts) are written in the D programming language (not to be confused with the general-purpose D language). D is a subset of C with additional functions and predefined variables specific to tracing, and a D program is structured much like an AWK program: a set of probe clauses, each with an optional predicate and an action block.
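To give you a feel for the structure, here's a minimal sketch of my own (not one of the scripts below) that counts read(2) system calls by program name. Just like an AWK pattern/action pair, a D clause names a probe, an optional /predicate/, and an action block:

#!/usr/sbin/dtrace -s

/* Count read(2) system calls by program name, skipping dtrace itself. */
syscall::read:entry
/execname != "dtrace"/
{
        @reads[execname] = count();
}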
Since writing your own scripts from scratch can be time-consuming, I've collected here all of the ones I use.
On Solaris systems, the pageout daemon is responsible for scanning memory and clearing the MMU reference bit of each page it examines. When the fsflush daemon runs, it scans the page cache looking for dirty pages (pages with the MMU modify bit set) and schedules those pages to be written to disk. The fsflush.d D script provides a detailed breakdown of each fsflush run, including the number of pages "SCANNED" and "EXAMINED" and the number of nanoseconds the scan required.
Now you might be wondering why "SCANNED" is less than "EXAMINED" in the script's output. This is due to a bug in fsflush, and a bug report was filed to address the anomaly. Tight!
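If you want to poke at fsflush yourself, one rough approach is to time each pass with the fbt provider. Fair warning: fsflush_do_pages() is my assumption about the internal routine's name, and fbt probe names vary between Solaris releases, so check dtrace -l first:

#!/usr/sbin/dtrace -qs

/* Time each fsflush pass; fsflush_do_pages is an assumed function name. */
fbt::fsflush_do_pages:entry
{
        self->start = timestamp;
}

fbt::fsflush_do_pages:return
/self->start/
{
        @["ns per fsflush pass"] = quantize(timestamp - self->start);
        self->start = 0;
}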
Prior to Solaris 10, determining whether an application accessed data sequentially or randomly required reviewing mounds of truss(1m) and vxtrace(1m) data. With the introduction of DTrace and Brendan Gregg's seeksize.d D script, this question becomes trivial to answer.
seeksize.d measures the seek distance between consecutive reads and writes and prints a histogram of those distances. For applications using sequential access patterns (e.g., dd in this case), the distribution will be narrow; for applications accessing data in a random fashion (e.g., sched in this example), you will see a wide distribution. Shibby!
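The core idea behind seeksize.d can be sketched with the io provider. My simplified version below tracks distances per device rather than per process, and the first I/O on each device yields one bogus data point:

#!/usr/sbin/dtrace -qs

/* Distance (in 512-byte blocks) between the start of this I/O and
   the end of the previous I/O on the same device. */
io:::start
{
        this->delta = (int)(args[0]->b_blkno - last[args[1]->dev_statname]);
        @seek[args[1]->dev_statname] =
            quantize(this->delta < 0 ? -this->delta : this->delta);
        last[args[1]->dev_statname] = args[0]->b_blkno +
            args[0]->b_bcount / 512;
}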
Prior to the introduction of DTrace, it was difficult to see which files and disk devices were active at a given point in time. With the introduction of fspaging.d, you can get a detailed view of which files are being accessed.
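fspaging.d digs into the VM layer, but even a simple io provider one-liner gets you part of the way there. args[2] is the fileinfo_t for the I/O, and fi_pathname falls back to a placeholder when no file is associated with the buffer:

# dtrace -n 'io:::start { @[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount); }'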
When a system call fails, it returns a value indicating the failure and sets the "errno" variable to a value describing what went wrong. To get a system-wide view of which system calls are failing and why, we can use Brendan Gregg's errinfo D script:
$ errinfo -c
Sampling... Hit Ctrl-C to end.
^C
        EXEC     SYSCALL  ERR  COUNT  DESC
      ttymon        read   11      1  Resource temporarily unavailable
       utmpd       ioctl   25      2  Inappropriate ioctl for device
        init       ioctl   25      4  Inappropriate ioctl for device
        nscd    lwp_kill    3     13  No such process
         fmd    lwp_park   62     48  timer expired
        nscd    lwp_park   62     48  timer expired
  svc.startd    lwp_park   62     48  timer expired
       vxesd      accept    4     49  interrupted system call
 svc.configd    lwp_park   62     49  timer expired
       inetd    lwp_park   62     49  timer expired
  svc.startd      portfs   62    490  timer expired
This displays the process, the system call, and the errno number along with its description from /usr/include/sys/errno.h! Jeah!
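The heart of errinfo fits in a few lines of D. Here's my rough approximation, which assumes the common convention that failed system calls return -1 (not every syscall follows it):

#!/usr/sbin/dtrace -qs

/* Count failed system calls by program, syscall, and errno. */
syscall:::return
/(int)arg0 == -1 && errno != 0/
{
        @err[execname, probefunc, errno] = count();
}

dtrace:::END
{
        printa("%-16s %-12s %3d %@d\n", @err);
}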
Several Solaris utilities summarize the time spent waiting for I/O (a meaningless metric), but fail to provide an easy way to correlate I/O activity with a specific process. With the introduction of psio.pl, you can see exactly which processes are responsible for generating I/O.
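If you don't have psio.pl handy, a rough io provider one-liner gives a similar per-process view. It only sees physical I/O, so reads satisfied from the page cache won't show up:

# dtrace -n 'io:::start { @[execname, pid] = sum(args[0]->b_bcount); }'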
Once you find I/O intensive processes, you can use fspaging, iosnoop, and rwsnoop to get additional information:
$ iosnoop -n
MAJ MIN  UID  PID D     BLOCK  SIZE  COMM    PATHNAME
136   8    0  990 R    341632  8192  dtrace  /lib/sparcv9/ld.so.1
136   8    0  990 R    341568  8192  dtrace  /lib/sparcv9/ld.so.1
136   8    0  990 R  14218976  8192  dtrace  /lib/sparcv9/libc.so.1
[ ... ]
$ iosnoop -e
DEVICE  UID PID D     BLOCK  SIZE  COMM     PATHNAME
dad1      0 404 R    481712  8192  vxsvc    /lib/librt.so.1
dad1      0   3 W    516320  3072  fsflush  /var/adm/utmpx
dad1      0   3 W  18035712  8192  fsflush  /var/adm/wtmpx
[ ... ]
$ rwsnoop
UID  PID CMD    D  BYTES  FILE
100  902 sshd   R     42  /devices/pseudo/clone@0:ptm
100  902 sshd   W     80
100  902 sshd   R     65  /devices/pseudo/clone@0:ptm
100  902 sshd   W    112
100  902 sshd   R     47  /devices/pseudo/clone@0:ptm
100  902 sshd   W     96
  0  404 vxsvc  R   1024  /etc/inet/protocols
[ ... ]
As Solaris administrators, we are often asked to identify application I/O sizes. This information can be acquired for a single process with truss(1m), or system-wide with Brendan Gregg's bitesize.d D script.
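The essence of bitesize.d is a one-line aggregation. My simplified version keys on execname instead of the PID and argument list the real script uses:

# dtrace -n 'io:::start { @[execname] = quantize(args[0]->b_bcount); }'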
snoop(1m) and Ethereal are amazing utilities and provide a slew of options for filtering data. When you don't have time to wade through snoop data, or to download and install Ethereal, you can use tcptop to get an overview of TCP activity on a system.
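For a quick-and-dirty system-wide TCP byte count, you can also lean on the mib provider. The probe names below are my assumption based on the Solaris TCP MIB counter names, so run dtrace -l -P mib to see what your release actually provides:

# dtrace -n 'mib:::tcpOutDataBytes { @["TCP bytes out"] = sum(arg0); }
    mib:::tcpInDataInorderBytes { @["TCP bytes in"] = sum(arg0); }'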
With Solaris 9, the “-p” option was added to vmstat to break paging activity up into “executable,” “anonymous” and “filesystem” page types:
$ vmstat -p 5
     memory           page          executable      anonymous      filesystem
   swap   free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 1738152 832320  5   9   0   0   0    0    0    0    0    0    0    1    0    0
 1683280 818800  0   2   0   0   0    0    0    0    0    0    0    0    0    0
 1683280 818800  0   0   0   0   0    0    0    0    0    0    0    0    0    0
This is super useful information, but unfortunately it doesn't tell you which executable is responsible for the paging activity. With the introduction of whospaging.d, you can get paging activity broken down per process:
$ whospaging.d
Who's waiting for pagein (milliseconds):
Who's on cpu (milliseconds):
  svc.configd          0
  sendmail             0
  svc.startd           0
  sshd                 0
  nscd                 1
  dtrace               3
  fsflush             14
  dd                1581
  sched             3284
Once we know which process is responsible for the paging activity, we can use dvmstat to break down the types of pages the application is paging (similar to vmstat -p, but per process!):
$ dvmstat -p 0
  re  maj   mf     fr  epi  epo  api  apo  fpi    fpo   sy
   0    0    0  13280    0    0    0    0    0  13280    0
   0    0    0  13504    0    0    0    0    0  13504    0
   0    0    0  13472    0    0    0    0    0  13472    0
   0    0    0  13472    0    0    0    0    0  13472    0
   0    0    0  13248    0    0    0    0    0  13248    0
   0    0    0  13376    0    0    0    0    0  13376    0
   0    0    0  13664    0    0    0    0    0  13664    0
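dvmstat gets these numbers from the vminfo provider, whose probes fire every time a vm kstat counter is bumped. Here's a sketch of the underlying idea; probe names like fspgin and anonpgout follow the vm kstat names, and the exact set can vary by release:

# dtrace -n 'vminfo:::fspgin, vminfo:::fspgout, vminfo:::anonpgin, vminfo:::anonpgout { @[execname, probename] = sum(arg0); }'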
Once we have an idea of which pages are being paged in or out, we can use iosnoop, rwsnoop and fspaging.d to find out which files or devices the application is writing to! Since these rockin’ scripts go hand in hand, I am placing them together. Shizam!
And without further ado, number 1 goes to … (drum roll)
After careful thought, I decided to make iotop and rwtop #1 on my top ten list. I have long dreamed of a utility that could tell me which applications were actively generating I/O to a given file, device or file system. With the introduction of iotop and rwtop, my wish came true:
$ iotop 5
2005 Jul 19 13:33:15,  load: 0.24,  disk_r: 95389 Kb,  disk_w: 0 Kb

UID    PID   PPID  CMD   DEVICE     MAJ  MIN    D     BYTES
0      99    1     nscd  dad1       136  8      R     16384
0      7037  7033  find  dad1       136  8      R   2266112
0      7036  7033  dd    sd7        32   58     R  15794176
0      7036  7033  dd    sd6        32   50     R  15826944
0      7036  7033  dd    sd5        32   42     R  15826944
0      7036  7033  dd    vxio21000  100  21000  R  47448064
$ rwtop 5
2005 Jul 24 10:47:26,  load: 0.18,  app_r: 9 Kb,  app_w: 8 Kb

UID    PID   PPID  CMD    D  BYTES
100    922   920   bash   R      3
100    922   920   bash   W     15
100    902   899   sshd   R   1223
100    926   922   ls     R   1267
100    902   899   sshd   W   1344
100    926   922   ls     W   2742
100    920   917   sshd   R   2946
100    920   917   sshd   W   4819
0      404   1     vxsvc  R   5120
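To show how little magic is involved, here's a stripped-down sketch of the iotop idea (mine, not Brendan's code): aggregate disk bytes per process with the io provider, then print and reset the totals every five seconds:

#!/usr/sbin/dtrace -qs

/* Sum physical I/O bytes per process. */
io:::start
{
        @io[execname, pid] = sum(args[0]->b_bcount);
}

/* Print and reset the totals every 5 seconds. */
tick-5sec
{
        printf("\n%Y\n", walltimestamp);
        printa("%-16s %-8d %@d bytes\n", @io);
        trunc(@io);
}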