DTrace
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Table of Contents
|
|
|
DTrace is a comprehensive dynamic tracing facility that is built into Solaris and can be used by administrators and developers to examine the behavior of both user programs and of the operating system itself. With DTrace you can explore your system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. It is safe to use on production systems and does not require restarting either the system or applications.
DTrace dynamically modifies the operating system kernel and user processes to record data at locations of interest, called probes. A probe is a location or activity to which DTrace can bind a request to perform a set of actions, like recording a stack trace, a timestamp, or the argument to a function. Probes are like programmable sensors scattered all over your Solaris system in interesting places. DTrace probes come from a set of kernel modules called providers, each of which performs a particular kind of instrumentation to create probes.
DTrace includes a new scripting language called D which is designed specifically for dynamic tracing. With D it is easy to write scripts that dynamically turn on probes, collect the information, and process it. D scripts make it convenient for users to share their knowledge and troubleshooting methods with others. A number of useful D scripts are included in Solaris 10, and more can be found on Sun's BigAdmin site: sun.com/bigadmin/content/dtrace/ and on the OpenSolaris project site: opensolaris.org/os/community/dtrace/.
First, a brief discussion on the construct of a D script. A D script consists of a probe description, a predicate, and actions as shown below:
probe description
/predicate/
{
actions
}
When the D script is executed the probes described in the probe description are enabled. The action statements are executed when the probes fire and if the predicate evaluates to true. Think of probes as events: a probe fires when the event happens. Let's take a simple D script example:
syscall::write:entry
/execname == "bash"/
{
printf("bash with pid %d called write system call\n",pid);
}
Here the probe description is the syscall::write:entry which describes the write system call. The predicate is /execname == "bash"/. This checks to see if the executable that is calling the write system call is the bash shell. The printf statement is the action that executes every time bash calls the write system call.
Now let's look a little deeper. The probe is described using four fields, the provider, module, function, and name.
Note that wild cards like * and ? can be used. Blank fields are interpreted as wildcards. Table 1 shows a few examples.
|
Table 1—Examples of Probe Descriptions.
A predicate can be any D expression. Table 2 illustrates a few examples. The action is executed only when the predicate evaluates to true.
|
Table 2—Examples of Predicates.
The action section can contain a series of action commands separated by semi-colons (;). Examples are given in Table 3.
|
Table 3—Examples of Actions.
Note that predicates and action statements are optional. If the predicate is missing, then the action is always executed. If the action is missing, then the name of the probe which fired is printed.
With this minimum information you can develop useful D scripts. The next section of this guide focuses on using DTrace from both application development and system administration perspectives.
The types of information application developers are particularly interested in can be obtained by using the following providers: syscall, proc, pid, sdt, vminfo. Developers can use these to look at running processes as well as process creation and termination, LWP creation and termination, and signal handling. This section focuses on the pid provider only.
The pid provider instruments the entry and return from any user level function in a running process. With the pid provider you can trace any instruction in any process on the system. The name of the pid provider includes the pid of the running process in which you are interested. A few examples of probe descriptions using the pid provider are given in Table 4.
|
Table 4—Examples of Pid Provider and Associated Probes.
Here is the command you can run to print all the functions that process id 1234 calls:
# dtrace -n pid1234:::entry
The same command can be used in a script as follows:
#!/usr/sbin/dtrace -s
/* The above line means that the script that follows needs to be
interpreted using dtrace. D uses C-style comments. */
pid1234:::entry
{}
You can use this command or script by replacing the 1234 with the pid of the process in which you are interested.
The following steps can be used to build a D script using the pid provider:
#!/usr/sbin/dtrace -s
pid$1:::entry
{}
@name[table index(es)] =aggregate_function()For example:
@count_table[probefunc] = count() ; This aggregation will collect information into a table with name of the function (probefunc is a built-in variable that has the name of the function). The aggregation function count() keeps track of the number of times the function was called. The other popular aggregation functions are average, min, max, and sum.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
@count_table[probefunc]=count();
}
This script will collect information into a table and will continue to run until you press ^c (Control c). Once you stop the script, DTrace will print out the table of information. Notice that you do not need to write any code to print the table. DTrace automatically does this for you.
You may have noticed that the probes were created dynamically when running this script. The probes are disabled as soon as the script is stopped. You do not need to do anything special for disabling these probes. In addition, DTrace will automatically clean up any memory it allocated. You do need to write any code for cleanup.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
@count_table[probemod,probefunc]=count();
}
#!/usr/sbin/dtrace -s
pid$1:::entry
{
ts[probefunc] = timestamp;
}
pid$1:::return
{
@func_time[probefunc] = sum(timestamp - ts[probefunc]);
ts[probefunc] = 0;
}
Note that the ts[] is an array, and D has automatically declared and initialized it for you. It is good practice to save space by setting the variable to 0.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
self->ts[probefunc] = timestamp;
}
pid$1:::return
/self->ts[probefunc]/
{
@func_time[probefunc] = sum(timestamp ts[probefunc]);
ts[probefunc] = 0;
}
Note: /self->ts[probefunc]/ is the same as /self->ts[probefunc] != 0/
For very large applications the above script enables a large number of probes (possibly hundreds of thousands). Even though each probe is fairly light weight, adding that many probes to a live running system will impact the performance of your application.
|
Table 5—Modifying Probe Descriptions.
#!/usr/sbin/dtrace -s
pid$target:libc::entry
{
@[probefunc]=count();
}
# libc_func.d -c "cat /etc/hosts" Here are a few more sample scripts which you can use to get additional information about your system. Knowing the number of system calls, how deep the stack is, and knowing who is doing what will help you to understand the workload and to identify possible bottlenecks.
#!/usr/sbin/dtrace -s
syscall::write:entry
{
@[ustack()]=count();
}
#!/usr/sbin/dtrace -s
sysinfo:::pswitch
{
@[execname] = count();
}
#!/usr/sbin/dtrace -qs
proc:::exec-success
{
printf("%s(pid=%d) started by uid - %d\n",execname, pid, uid);
}
Among the tasks that every System Administrator faces are those related to the behavior or misbehavior of the applications running on a pre-determined environment. This type of information can be obtained through the following providers: syscall, proc, io, sched, sysinfo, vminfo, lockstat and profile. Of these providers, syscall, proc, io and sched are the easiest starting points. By using these providers, system administrators get information related to processes, threads, stack status, and many other kernel metrics. It is easier to start with the syscall, proc, io, and sched providers.
This is probably the most important provider to learn and use because system calls are the main communication channel between user level applications and the kernel. Knowing which system calls are being used among other information establishes metrics of system usage and identifies possible misbehavior. With the syscall provider you can easily identify who executes what and how much time a certain operation takes, helping you to identify the root cause of system misbehavior.
# dtrace -n syscall::close:entry
#!/usr/sbin/dtrace -s
syscall::kill:entry
{
trace(pid);
trace(execname);
}
#!/usr/sbin/dtrace -qs
BEGIN
{
printf("size\ttime\n");
}
syscall::read:entry
/execname == "httpd"/
{
self->start = timestamp;
}
syscall::read:return
/self->start/
{
printf("%d\t%d\n", arg0, timestamp - self->start);
self->start = 0;
}
This provider fires at processes and thread creation and termination as well as signals. It is a more sophisticated approach to the simple kill probe and will tell you which user sent a given signal (3head) to which process.
#!/usr/sbin/dtrace -wqs
proc:::signal-send
{
printf("%d was sent to %s by ", args[2], args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
#!/usr/sbin/dtrace -wqs
proc:::signal-send
/args[2] == SIGKILL/
{
printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
Here you can see the introduction of pr_fname, which is part of the structure of psinfo_t of the receiving process. For more information please refer to the Solaris Dynamic Tracing User Guide.
This provider dynamically traces scheduling events. Use it to understand when and why threads sleep, run, change priority, or wake other threads.
#!/usr/bin/dtrace -sq
sched:::on-cpu
/execname == "soffice.bin"/
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@time[""] = sum(vtimestamp self->on);
self->on = 0;
}
This provider looks into the disk input and output (I/O) subsystem. With the io provider you can get an in-depth understanding of iostat(1M) output.
The io probes fire for all the disk requests, including NFS services, except on metadata requests.
#!/usr/bin/dtrace -qs
BEGIN
{
printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW");
}
io:::start
{
printf("%10s %58s %2s\n", args[1]->dev_statname, args[2]->fi_pathname,
  args[0]->b_flags & B_READ ? "R" : "W");
}
The information in this guide is just a brief introduction to DTrace.
For more information on DTrace visit the following websites:Join the DTrace discussion group and participate in the ongoing discussion and development of this innovative new facility in Solaris.
For those interested in classroom training, SunU is offering the following courses:
|
||||||
under "Experienced System Administrators."