DTrace : détection de problèmes en temps réel

From Deimos.fr / Bloc Notes Informatique
Jump to: navigation, search

1 Introduction

Dtrace est un système de trace conçu par Sun Microsystems pour la détection de problèmes en temps réel au niveau noyau ou au niveau applicatif. Il est disponible depuis Novembre 2003, et a été intégré en tant que partie de Solaris 10 en Janvier 2005. Dtrace est le premier composant du projet OpenSolaris dont le code a été délivré sous la license Common Development and Distribution License (CDDL).

Dtrace est un système de trace conçu pour donner des informations qui permettent aux utilisateurs d'ajuster des applications et le système d'exploitation lui-même. Il est conçu pour être utilisé dans des environements de production. Ainsi, l'effet des sondes est minimal quand l'action de trace est en cours, et il n'y a pas d'impact de performance pour les sondes non actives. C'est important car un système comprend des dizaines de milliers de sondes dont beaucoup peuvent être actives.

On écrit les programmes de trace (souvent appelés script) en utilisant le langage de programmation D (à ne pas confondre avec D). D est un sous-ensemble du langage C avec l'addition de fonctions et variables prédéfinies spécifiques à l'opération de trace. Un programme écrit en D ressemble par sa structure à un programme écrit en AWK.

Sachant qu'il est toujours un peu long de se faire ses propres scripts à chaque fois, j'ai préférer mettre ici tous ceux que j'ai utilié.

2 Print Utilization statistics per process

Brendan Gregg developed prustat to display the top processes sorted by CPU, Memory, Disk or Network utilization:

Command prustat
$ prustat -c -t 10 5

 PID   %CPU   %Mem  %Disk   %Net  COMM
7176   0.88   0.70   0.00   0.00  dtrace
7141   0.00   0.43   0.00   0.00  sshd
7144   0.11   0.24   0.00   0.00  sshd
   3   0.34   0.00   0.00   0.00  fsflush
7153   0.03   0.19   0.00   0.00  bash
  99   0.00   0.22   0.00   0.00  nscd
7146   0.00   0.19   0.00   0.00  bash
  52   0.00   0.17   0.00   0.00  vxconfigd
7175   0.07   0.09   0.00   0.00  sh
  98   0.00   0.16   0.00   0.00  kcfd

This script is super useful for getting a high level understanding of what is happening to a Solaris server. Golden!

3 File System Flush Activity

On Solaris systems, the pagedaemon is reponsible for scanning the page cache and adjusting the MMU reference bit of each dirty page it finds. When the fsflush daemon runs, it scans the page cache looking for pages with the MMU reference bit set, and schedules these pages to be written to disk. The fsflush.d D script provides a detailed breakdown of pages scanned, and the number of nanoseconds that were required to scan "SCANNED" pages:

Command fsflush.d
$ fsflush.d
  SCANNED   EXAMINED     LOCKED   MODIFIED   COALESCE   RELEASES   TIME(ns)
     4254       4255          1          1          0          0    2695024
     4254       4255          1          0          0          0    1921518
     4254       4255          6          0          0          0    1989044
     4254       4255          1          0          0          0    2401266
     4254       4255          4          1          0          0    2562138
     4254       4255         89          4          0          0    2425988
     4254       3744         80         25          0          0    2394895
     4254       4255         28          8          0          0    1776222
     4254       4255        216          8          0          0    2350826
     4254       4255        108          7          0          0    2356146

Now you might be wondering why "SCANNED" is less than "EXAMINED?" This is due to a bug in fsflush, and a bug report was filed to address this anomaly. Tight!

4 Seek Sizes

Prior to Solaris 10, determining if an application accessed data in a sequential or random pattern required reviewing mounds of truss(1m) and vxtrace(1m) data. With the introduction of DTrace and Brendan Gregg's seeksize.d D script, this question is trivial to answer:

Command seeksize.d
$ seeksize.d
Sampling... Hit Ctrl-C to end.
^C
 
     PID  CMD
    7312  dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576
 
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1762
               1 |                                         0
 
       0  sched
 
           value  ------------- Distribution ------------- count
        -1048576 |                                         0
         -524288 |@@@@                                     1
         -262144 |                                         0
         -131072 |                                         0
          -65536 |                                         0
          -32768 |                                         0
          -16384 |                                         0
           -8192 |                                         0
           -4096 |                                         0
           -2048 |                                         0
           -1024 |                                         0
            -512 |                                         0
            -256 |                                         0
            -128 |                                         0
             -64 |                                         0
             -32 |                                         0
             -16 |                                         0
              -8 |                                         0
              -4 |                                         0
              -2 |                                         0
              -1 |                                         0
               0 |                                         0
               1 |                                         0
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |@@@@                                     1
             256 |@@@@                                     1
             512 |@@@@                                     1
            1024 |@@@@                                     1
            2048 |                                         0
            4096 |                                         0
            8192 |                                         0
           16384 |@@@@                                     1
           32768 |@@@@                                     1
           65536 |@@@@@@@@                                 2
          131072 |                                         0
          262144 |                                         0
          524288 |@@@@                                     1
         1048576 |                                         0

This script measures the seek distance between consecutive reads and writes, and provides a histogram with the seek distances. For applications that are using sequential access patterns (e.g., dd in this case), the distribution will be small. For applications accessing data in a random nature (e.g, sched in this example), you will see a wide distribution. Shibby!

5 Print Overall Paging Activity

Prior to the introduction of DTrace, it was difficult to extract data on which files and disk devices were active at a specific point in time. With the introduction of fspaging.d, you can get a detailed view of which files are being accessed:

Command fspaging.d
$ fspaging.d
Event      Device                                                    Path RW     Size   Offset
get-page                                           /lib/sparcv9/libc.so.1        8192
get-page                                  /usr/lib/sparcv9/libdtrace.so.1        8192
get-page                                           /lib/sparcv9/libc.so.1        8192
get-page                                                   /lib/libc.so.1        8192
put-page                  /etc/svc/volatile/system-system-log:default.log        8192
put-page                              /etc/svc/volatile/svc_nonpersist.db        8192
put-page                              /etc/svc/volatile/svc_nonpersist.db        8192
put-page                                /etc/svc/volatile/init-next.state        8192
put-page                        /etc/svc/volatile/network-ssh:default.log        8192
put-page                       /etc/svc/volatile/network-pfil:default.log        8192

This is a super useful script! Niiiiiiiiiiice!

6 Getting System Wide errno Information

When system calls have problems executing, they usually return a value to indicate success or failure, and set the global "ERRNO" variable to a value indicating what went wrong. To get a system wide view of which system calls are erroring out, we can use Brendan Gregg's errinfo D script:

Command errinfo
$ errinfo -c
Sampling... Hit Ctrl-C to end.
^C
            EXEC          SYSCALL  ERR  COUNT  DESC
          ttymon             read   11      1  Resource temporarily unavailable
           utmpd            ioctl   25      2  Inappropriate ioctl for device
            init            ioctl   25      4  Inappropriate ioctl for device
            nscd         lwp_kill    3     13  No such process
             fmd         lwp_park   62     48  timer expired
            nscd         lwp_park   62     48  timer expired
      svc.startd         lwp_park   62     48  timer expired
           vxesd           accept    4     49  interrupted system call
     svc.configd         lwp_park   62     49  timer expired
           inetd         lwp_park   62     49  timer expired
      svc.startd           portfs   62    490  timer expired

This will display the process, system call, and errno number and description from /usr/src/sys/errno.h! Jeah!

7 I/O per process

Several Solaris utilities provide a summary of the time spent waiting for I/O (which is a meaningless metric), but fail to provide facilities to easily correlate I/O activity with a process. With the introduction of psio.pl, you can see exactly which processes are responsible for generating I/O:

Command psio.pl
$ psio.pl
     UID   PID  PPID %I/O    STIME TTY      TIME CMD
    root  7312  7309 70.6 16:00:59 pts/2   02:36 dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576
    root     0     0  0.0 10:24:18 ?       00:02 sched
    root     1     0  0.0 10:24:18 ?       00:03 /sbin/init
    root     2     0  0.0 10:24:18 ?       00:00 pageout
    root     3     0  0.0 10:24:18 ?       00:51 fsflush
    root     7     1  0.0 10:24:20 ?       00:06 /lib/svc/bin/svc.startd
    root     9     1  0.0 10:24:21 ?       00:14 /lib/svc/bin/svc.configd

Once you find I/O intensive processes, you can use fspaging, iosnoop, and rwsnoop to get additional information:

Command iosnoop
$ iosnoop -n
MAJ MIN   UID   PID D    BLOCK   SIZE       COMM PATHNAME
136   8     0   990 R   341632   8192     dtrace /lib/sparcv9/ld.so.1
136   8     0   990 R   341568   8192     dtrace /lib/sparcv9/ld.so.1
136   8     0   990 R 14218976   8192     dtrace /lib/sparcv9/libc.so.1
 [ ... ]

Command iosnoop
$ iosnoop -e
DEVICE    UID   PID D    BLOCK   SIZE       COMM PATHNAME
dad1        0   404 R   481712   8192      vxsvc /lib/librt.so.1
dad1        0     3 W   516320   3072    fsflush /var/adm/utmpx
dad1        0     3 W 18035712   8192    fsflush /var/adm/wtmpx
 [ ... ]

Command rwsnoop
$ rwsnoop
 UID    PID CMD          D   BYTES FILE
 100    902 sshd         R      42 /devices/pseudo/clone@0:ptm
 100    902 sshd         W      80 
 100    902 sshd         R      65 /devices/pseudo/clone@0:ptm
 100    902 sshd         W     112 
 100    902 sshd         R      47 /devices/pseudo/clone@0:ptm
 100    902 sshd         W      96 
   0    404 vxsvc        R    1024 /etc/inet/protocols
  [ ... ]

Smooooooooooth!

8 I/O Sizes Per Process

As a Solaris administrator, we are often asked to identify application I/O sizes. This information can be acquired for a single process with truss(1m), or system wide with Brendan Gregg's bitesize.d D script:

Command bitesize.d
$ bitesize.d
Sampling... Hit Ctrl-C to end.
 
    3  fsflush
 
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@                                        1
            2048 |                                         0
            4096 |@@                                       2
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    39
           16384 |                                         0
 
    7312  dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576
 
           value  ------------- Distribution ------------- count
              16 |                                         0
              32 |                                         2
              64 |                                         0
             128 |                                         0
             256 |                                         0
             512 |                                         2
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |                                         0
           16384 |                                         0
           32768 |                                         0
           65536 |                                         0
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 76947
          262144 |                                         0

If only Dorothy could see this!

9 TCP Top

Snoop(1m) and ethereal are amazing utilities, and provide a slew of options to filter data. When you don't have time to wade through snoop data or download and install ethereal, you can use tcptop to get an overview of TCP activity on a system:

Command tcptop

$ tcptop 5

2005 Jul 19 14:09:06, load: 0.01, TCPin: 2679 Kb, TCPout: 12 Kb

UID    PID LADDR           LPORT RADDR           RPORT      SIZE NAME
  0   7138 192.168.1.3     44084 192.18.108.40      21       544 ftp
  0    352 192.168.1.3        22 192.168.1.8     49805      1308 sshd
100   7134 192.168.1.3     44077 192.168.1.1        22      1618 ssh
  0   7138 192.168.1.3     44089 24.98.83.96     51731   2877524 ftp

Now this is some serious bling!

10 Who's paging and DTrace enhanced vmstat

With Solaris 9, the "-p" option was added to vmstat to break paging activity up into "executable," "anonymous" and "filesystem" page types:

Command vmstat
$ vmstat -p 5
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 1738152 832320 5   9   0   0   0    0    0    0    0    0    0    1    0    0
 1683280 818800 0   2   0   0   0    0    0    0    0    0    0    0    0    0
 1683280 818800 0   0   0   0   0    0    0    0    0    0    0    0    0    0

This was super useful information, but unfortunatley doesn't provide the executable responsible for the paging activity. With the introduction of whospaging.d, you can get paging activity per process:

Command whospaging.d
$ whospaging.d
 
Who's waiting for pagein (milliseconds):
Who's on cpu (milliseconds):
  svc.configd                                                      0
  sendmail                                                         0
  svc.startd                                                       0
  sshd                                                             0
  nscd                                                             1
  dtrace                                                           3
  fsflush                                                         14
  dd                                                            1581
  sched                                                         3284

Once we get the process name that is reponsible for the paging activity, we can use dvmstat to break down the types of pages the application is paging (similar to vmstat -p, but per process!):

Command dvmstat
$ dvmstat -p 0
    re   maj    mf   fr  epi  epo  api  apo  fpi  fpo     sy
     0     0     0 13280    0    0    0    0    0 13280      0
     0     0     0 13504    0    0    0    0    0 13504      0
     0     0     0 13472    0    0    0    0    0 13472      0
     0     0     0 13472    0    0    0    0    0 13472      0
     0     0     0 13248    0    0    0    0    0 13248      0
     0     0     0 13376    0    0    0    0    0 13376      0
     0     0     0 13664    0    0    0    0    0 13664      0

Once we have an idea of which pages are being paged in or out, we can use iosnoop, rwsnoop and fspaging.d to find out which files or devices the application is writing to! Since these rockin' scripts go hand in hand, I am placing them together. Shizam!

And without further ado, number 1 goes to ... (*drum roll*)

11 I/O Top

After careful thought, I decided to make iotop and rwtop #1 on my top ten list. I have long dreamed of a utility that could tell me which applications were actively generating I/O to a given file, device or file system. With the introduction of iotop and rwtop, my wish came true:

Command iotop
$ iotop 5
 
2005 Jul 19 13:33:15,  load: 0.24,  disk_r:  95389 Kb,  disk_w:      0 Kb
 
  UID    PID   PPID CMD              DEVICE  MAJ MIN D            BYTES
    0     99      1 nscd             dad1    136   8 R            16384
    0   7037   7033 find             dad1    136   8 R          2266112
    0   7036   7033 dd               sd7      32  58 R         15794176
    0   7036   7033 dd               sd6      32  50 R         15826944
    0   7036   7033 dd               sd5      32  42 R         15826944
    0   7036   7033 dd               vxio21000 100 21000 R         47448064
<syntaxhighlight lang=text>}}
{{command|rwtop|<syntaxhighlight lang=text>
$ rwtop 5
2005 Jul 24 10:47:26,  load: 0.18,  app_r:      9 Kb,  app_w:      8 Kb
 
  UID    PID   PPID CMD              D            BYTES
  100    922    920 bash             R                3
  100    922    920 bash             W               15
  100    902    899 sshd             R             1223
  100    926    922 ls               R             1267
  100    902    899 sshd             W             1344
  100    926    922 ls               W             2742
  100    920    917 sshd             R             2946
  100    920    917 sshd             W             4819
    0    404      1 vxsvc            R             5120

12 References

http://brendangregg.com/
DTrace User Guide
Observing I/O Behavior with the DTraceToolkit
DTrace Toolkit
DTrace Topics

12.1 Dtrace GUI

http://www.netbeans.org/kb/docs/ide/NetBeans_DTrace_GUI_Plugin_0_4.html