Fast Track to Solaris 10 Adoption: Dynamic Tracing
Functionality & Usability Issues
- How deeply can DTrace peer into applications? Does it see only system calls, or can it see app functions as well?
- Can DTrace be used to back-trace a core?
- Does DTrace have all the capabilities that truss has?
- How much overlap is there, if any, between DTrace and MDB?
- Can you write your own simulation? If yes, what language would you use to write it?
- Does Predictive Self-Healing use DTrace or facilities that DTrace also uses?
- Can DTrace emulate the functionality of iostat/vmstat/mpstat?
- Can DTrace look inside the Java process called from an application server to monitor specific Java functions?
- Do I have to understand the kernel to make sense of kernel instrumentation?
- Are there any plans to provide a GUI wrapper for output, a la SE Toolkit? Or will TCL and kin do the job? If so, are there any special procedures required?
- What is "anonymous tracing"?
- Do I need to be root to use DTrace?
- What level of expertise would I need to use DTrace to solve an application bottleneck problem that I am experiencing?
- Do you have to know D programming to use DTrace?
- Can I run multiple copies of DTrace simultaneously? Also, can I run DTrace to follow a parent process like truss does?
- We run an rdist application that consistently crashes our server. No useful messages are generated. Would DTrace be an appropriate tool for tracing this, or would it be better to use MDB?
- To diagnose problems, would you recommend running DTrace in a production system?
- Can DTrace be used in a grid environment to look at the grid in whole?
- Can I use DTrace to identify network problems like snoop and tcpdump do?
Q: How deeply can DTrace peer into applications? Does it see only system calls, or can it see app functions as well?
A: You can see system calls and application function calls, as well as every single instruction in your application. For example, you can use DTrace to time how long you spend in function calls, or to trace the exact sequence of instructions that a thread executed as it hit an error condition in a function. Those are just two examples of the flexibility that DTrace offers.
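As a minimal sketch of the timing example, assuming Solaris 10, a POSIX shell, and root (or DTrace) privileges, the shell functions below build a two-clause D program on the pid provider and hand each clause to dtrace(1M) with its own -n option. The helper names (func_entry_clause, func_return_clause, func_time) are hypothetical, not part of DTrace. Keeping the clause text in separate functions lets you print and inspect the D before running it. Recursive calls to the traced function overwrite the saved timestamp, so treat results for recursive functions with care.

```shell
# Hypothetical helpers (not part of DTrace); source them into sh, ksh, or bash.
# func_entry_clause FUNC: print a D clause that records when the current
# thread enters FUNC. The single quotes keep $target literal for dtrace(1M),
# which replaces it with the process ID given to -p (or started with -c).
func_entry_clause() {
    printf 'pid$target::%s:entry { self->ts = timestamp; }' "$1"
}

# func_return_clause FUNC: on return, add the elapsed nanoseconds to a
# power-of-two histogram and clear the thread-local timestamp.
func_return_clause() {
    printf 'pid$target::%s:return /self->ts/ { @["ns"] = quantize(timestamp - self->ts); self->ts = 0; }' "$1"
}

# func_time PID FUNC: trace FUNC in an existing process; the histogram
# is printed when you press ^C.
func_time() {
    dtrace -p "$1" -n "$(func_entry_clause "$2")" -n "$(func_return_clause "$2")"
}
```

For example, func_time 1234 malloc, where 1234 stands in for the ID of a real process.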
Q: Can DTrace be used to back-trace a core?
A: DTrace is for observing the dynamic behavior of a running system. For debugging core files, try our postmortem debugger, mdb(1). You can also use pstack(1) to print the stack backtrace of a live process or a core file.
Q: Does DTrace have all the capabilities that truss has?
A: Almost. DTrace lets you trace all system calls and application function calls (with much less overhead, by the way). It also lets you do much more, including tracing kernel function calls and any instruction in an application (not just function entry and return). Perhaps most usefully, DTrace is much more customizable than truss(1), so you can trace exactly the data you want rather than whatever truss(1) decides to give you. The one thing truss(1) has that DTrace doesn't is the ability to pretty-print system call arguments.
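To make the comparison with truss(1) concrete, here is a minimal sketch, assuming Solaris 10 and root (or DTrace) privileges, of a per-system-call count roughly comparable to truss -c. The helpers syscount_clause and syscount are hypothetical names, not part of DTrace.

```shell
# Hypothetical helpers (not part of DTrace).
# syscount_clause: print a D clause that counts system calls by name, but only
# for the target process ($target is expanded by dtrace(1M), not the shell).
syscount_clause() {
    printf '%s' 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }'
}

# syscount 'COMMAND ARGS': run the command under dtrace(1M), which prints
# the per-system-call counts when the command exits.
syscount() {
    dtrace -n "$(syscount_clause)" -c "$1"
}
```

For example, syscount 'ls -l /tmp'. This sketch reports only call counts; because it is plain D, you can add clauses for whatever else you want to measure.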
Q: How much overlap is there, if any, between DTrace and MDB?
A: They're complementary. MDB is for postmortem analysis, while DTrace is for in-situ analysis of a running system. Both build on some of the same foundation technologies in the Solaris OS, however; for example, both MDB and DTrace use CTF (Compact C Type Format), our format for type information.
Q: Can you write your own simulation? If yes, what language would you use to write it?
A: No. DTrace doesn't offer any simulation capability, and it never allows you to alter the system in a way that could be fatal.
Q: Does Predictive Self-Healing use DTrace or facilities that DTrace also uses?
A: Predictive Self-Healing uses a different event channel than DTrace and only shares facilities in that it relies on the robust and full-featured Solaris 10 OS kernel.
Q: Can DTrace emulate the functionality of iostat/vmstat/mpstat?
A: iostat/vmstat/mpstat are monitoring tools. You still want to use them, because they tell you that you have a problem. Then you can use the DTrace io, vminfo, and sysinfo providers (respectively) to dive deep. For example, say that iostat shows a lot of I/O to a device named "sd2." What's causing it? This one-liner counts I/O requests to sd2, keyed by the name of the process running when each request started:
dtrace -n 'io:::start /args[1]->dev_statname == "sd2"/ { @[execname] = count(); }'
See the io, vminfo, and sysinfo chapters in the AnswerBook for gobs of examples of using iostat/vmstat/mpstat along with DTrace to find and root-cause problems.
Q: Can DTrace look inside the Java process called from an application server to monitor specific Java functions?
A: Due to the complexities of the Java virtual machine, we can't currently trace Java functions. However, you can get Java stack backtraces using the ustack() action. Say, for example, that you want to know which Java activity is causing disk I/O. To answer that, you'd do something like this:
dtrace -n 'io:::start { @[ustack(50, 1000)] = count(); }'
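This aggregates on the call stack (up to 50 frames; the second argument reserves 1000 bytes of string space so that Java frames can be translated into readable names), and when you hit ^C, dtrace(1M) prints a table of the stack backtraces and their frequency counts. Check out the Solaris Dynamic Tracing Guide for more details on ustack(), aggregations, and the io provider.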
Q: Do I have to understand the kernel to make sense of kernel instrumentation?
A: No. We have developed instrumentation with well-defined semantics for CPU scheduling, process control, and I/O (the sched, proc, and io providers), with many more providers on the way. So you don't need to understand the kernel implementation to make use of DTrace.
Q: Are there any plans to provide a GUI wrapper for output, a la SE Toolkit? Or will TCL and kin do the job? If so, are there any special procedures required?
A: While one could provide a GUI wrapper, we're actually focused on larger issues. DTrace completely changes the data that you can gather from the system for purposes of visualization. We have some exciting work going on in this department, but it's all very early.
In the more immediate term, we will be providing Java bindings to the libdtrace API, allowing Java programs to act as DTrace consumers. A Tk binding should be a snap as well.
Q: What is "anonymous tracing"?
A: Anonymous tracing is a way to use DTrace without a running dtrace(1M) process. It is primarily useful for tracing during boot: you record the enabling with dtrace -A, and after the system boots you claim the traced data with dtrace -a. See the chapter on anonymous tracing in the AnswerBook for details.
Q: Do I need to be root to use DTrace?
A: By default, yes. However, with the new Solaris OS privilege model, you really just need the appropriate DTrace privileges (dtrace_proc, dtrace_user, and dtrace_kernel), which can be granted to non-root users.
Q: What level of expertise would I need to use DTrace to solve an application bottleneck problem that I am experiencing?
A: DTrace is like perl(1) in that once you know a little, you can start to do quite a bit, and each time you turn to the AnswerBook, you can do even more. We've worked hard on the DTrace AnswerBook to give it that same effect: spending just a few minutes looking at examples will give you the tools to start working on your application problem, and the more you learn about DTrace, the more you'll be able to do.
The short answer is that you can start solving application problems immediately. Try enabling all the function entry probes in a process (dtrace -n 'pid$target:::entry' -p <pid>), and you're already using DTrace to observe your application. Add a call to the trace() action (for example, trace(arg0)) and you can examine arguments. You'll find that you can iterate very quickly on your problem.
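A minimal sketch of that first step, assuming Solaris 10 and root (or DTrace) privileges: the hypothetical helpers below (entry_args_clause and entry_args, not part of DTrace) enable the entry probe of every function in one module of a running process and trace each function's first argument. Defaulting the module to a.out, the main executable, keeps the output more manageable than enabling every library function as well.

```shell
# Hypothetical helpers (not part of DTrace).
# entry_args_clause [MODULE]: print a D clause that fires on entry to every
# function in MODULE of the target process and traces the first argument.
# MODULE defaults to a.out, the pid provider's name for the executable itself.
entry_args_clause() {
    printf 'pid$target:%s::entry { trace(arg0); }' "${1:-a.out}"
}

# entry_args PID [MODULE]: attach to an existing process; press ^C to stop.
entry_args() {
    dtrace -p "$1" -n "$(entry_args_clause "$2")"
}
```

For example, entry_args 1234 traces the program's own functions, while entry_args 1234 libc.so.1 looks at the C library instead (1234 stands in for a real process ID).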
Q: Do you have to know D programming to use DTrace?
A: Not really, or at least not at first. For example, "dtrace -n xcalls" is a valid use of DTrace that involves absolutely no D, but we think you'll find that D is really easy to get into. To expand on that example, "dtrace -n 'xcalls { trace(execname); }'" traces the name of every application that induces a cross-call. Technically, this uses D, but "trace(execname)" is pretty simple.
Q: Can I run multiple copies of DTrace simultaneously? Also, can I run DTrace to follow a parent process like truss does?
A: You can run as many copies of dtrace(1M) as you like.
dtrace(1M) currently doesn't let you follow a process and its children the way truss -f does, but we're actively working on exactly that and hope to make it available in Solaris Express soon.
Q: We run an rdist application that consistently crashes our server. No useful messages are generated. Would DTrace be an appropriate tool for tracing this, or would it be better to use MDB?
A: If you mean that the Solaris OS itself crashes, that should never happen. Get the crash dump to your support provider, who will use MDB to analyze it. If "crashes our server" means that an application is crashing, then the answer is still probably to use MDB on the core file, but DTrace may be useful here if the crash is completely reproducible.
Q: To diagnose problems, would you recommend running DTrace in a production system?
A: Absolutely. Running in production environments is the design center of DTrace, and that, more than anything else, is what separates DTrace from every tracing tool that came before it on any system. We're so confident about running DTrace in production that when we last demonstrated it to a large group of customers here in Menlo Park, we did so on our local production NFS server, which serves more than 1,000 users. So yes, we recommend running DTrace in production.
Q: Can DTrace be used in a grid environment to look at the grid in whole?
A: DTrace operates on a single system. If you have multiple Zones (Solaris Grid Containers), you can use DTrace from the global zone to observe activity across every zone, including interactions between zones.
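As a minimal sketch, assuming Solaris 10 with zones configured and dtrace(1M) run from the global zone with root privileges, the hypothetical helpers below (not part of DTrace) count system calls per zone using the zonename built-in variable, which gives a quick picture of which zones are busiest.

```shell
# Hypothetical helpers (not part of DTrace).
# zone_syscalls_clause: print a D clause that counts system calls per zone,
# keyed by zonename, the D built-in variable naming the current zone.
zone_syscalls_clause() {
    printf '%s' 'syscall:::entry { @[zonename] = count(); }'
}

# zone_syscalls: run from the global zone; the per-zone counts print on ^C.
zone_syscalls() {
    dtrace -n "$(zone_syscalls_clause)"
}
```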
Q: Can I use DTrace to identify network problems like snoop and tcpdump do?
A: This is an area that we're actively working on. We're currently developing stable networking providers, the networking equivalents of the io, proc, and sched providers. You should see the first fruits of our labors in the coming months. Stay tuned.