
BigAdmin System Administration Portal
XPert Session - DTrace (Dynamic Tracing Framework)
Page 1 (1-10 of 18 questions)
Last Updated November 10, 2004
 

Q: What will DTrace allow me to do?

A: Some example usages include:

  • Examine the behavior of user programs and the Solaris OS and quickly identify the root causes of system and application bottlenecks
  • Highlight trends and patterns to tune systems for best performance
  • Track down performance problems across many layers of software
  • Locate the cause of aberrant behavior
  • Write reusable scripts for common or complex routines
  • Specify the data DTrace collects, the actions it takes, and the conditions under which it should take those actions


Q: Where can I find DTrace documentation for the details?

A: Early documentation is on the BigAdmin DTrace page.


Q: Do I need to recompile my apps to use DTrace?

A: Absolutely not -- you don't even need to restart them!


Q: Does DTrace introduce any performance hits?

A: When DTrace is disabled, there is no performance effect at all. The DTrace framework allows you to enable any of 30,000 or more probes. If you just enable a few, the hit will be relatively small; if you enable them all, it will create a noticeable impact, but the machine will still be fine.


Q: Does DTrace work on x86?

A: Of course! DTrace is fully functional on both SPARC and x86 platforms. And most of our development -- and about 99 percent of our public demos -- are on laptops running the Solaris OS, x86 Platform Edition.


Q: Is there a plan for a Sun Blueprints Book on DTrace or some special online Blueprints like "Best Practices for DTrace"?

A: We're considering a variety of ways to continue providing education on DTrace. For now, we've incorporated a ton of blueprint-style examples into the Solaris Dynamic Tracing Guide (our answerbook), which can be found on docs.sun.com or on our BigAdmin web page.

If there's something you find especially beneficial about a BluePrint or that you think is missing from the current documentation, it would be great to let us know over e-mail or on the DTrace BigAdmin forum. Is there a specific style of information you want? Is it having a smaller, more condensed quick-start guide? Or do you want longer case studies?

We'll use this feedback to figure out what additional documentation we should create, whether through BluePrints or some other format.


Q: How does DTrace differ from truss?

A: DTrace differs from truss(1) in two major ways: one is that the method by which it gathers information is significantly less invasive than truss, and the other is that DTrace offers system-wide introspection into many different areas of the system, not just processes. I'll expand on each of these in more detail.

truss(1) is a utility for tracing the system calls of one or more processes, and it uses the proc(4) filesystem to gather its information. /proc is designed for traditional debugging utilities, so it is based around the idea of stopping a process, reading or writing various bits of its state, and then resuming it. truss therefore has the effect of stopping the process on return from each system call it performs, reporting the result, and then continuing the process.

As a tracing framework, DTrace operates a bit differently in how data is gathered. DTrace performs dynamic instrumentation of various code paths in the system and permits data to be recorded into buffers, which are then copied out to its userland consumer on a regular basis for formatting. So while code flows do temporarily stop doing what they were doing and enter DTrace, there is no notion of having to "stop" a thread or process in a heavy-handed sort of way as /proc does (unless you use the stop() action, of course). So the probe effect of DTrace is typically much less than /proc-based tools such as truss(1).

The second major difference is that DTrace permits observability of basically everything in the system, not just the process model. And DTrace is programmable using our tracing language, D, so you can tell it to do basically anything, not just a set of behaviors configured by command-line switches. In fact, if you read Chapter 1 of the Solaris Dynamic Tracing Guide (our docs) you'll see that we implement a simple version of truss for reads and writes in a few lines of D.
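To give a feel for what that looks like, here is a sketch along the lines of the read/write example in Chapter 1 of the guide. The file name rw.d and the use of $1 for the target process ID follow that example as best we recall it, so treat the details as an approximation rather than a verbatim copy. The shell snippet only writes the D program to a file; the dtrace command is shown as a comment because it needs root on a Solaris 10 system.

```shell
# Write a small truss-like D program to rw.d (the file name is an assumption).
cat > rw.d <<'EOF'
/* Fire on entry to read(2) and write(2), but only for the process
   whose ID is passed as the first argument to the script. */
syscall::read:entry,
syscall::write:entry
/pid == $1/
{
	printf("%s(%d, 0x%x, %4d)", probefunc, arg0, arg1, arg2);
}

/* On return, arg1 holds the system call's return value. */
syscall::read:return,
syscall::write:return
/pid == $1/
{
	printf("\t\t = %d\n", arg1);
}
EOF

# Run as root, replacing <pid> with the process you want to watch:
#   dtrace -q -s rw.d <pid>
```

Unlike truss, nothing here stops the target process; the two clauses simply record data as the process passes through the probes.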

So while the DTrace proc and syscall providers let you look at similar sorts of things as /proc, you can program these things in any way you like and you can also combine that information with information from all of our other providers: I/O information, CPU scheduling, VM statistics, user function calls, and all the others. You can also use DTrace to look system-wide, per-process, per-thread, per-file, or any number of "views" you can create.
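As a minimal sketch of combining providers in one program, the following D script keeps two aggregations at once: system calls counted per process and call name, and bytes of I/O requested per process. The file name views.d and the aggregation names @calls and @bytes are made up for this illustration. As before, the snippet just writes the script to a file and shows the dtrace invocation as a comment.

```shell
# Write a two-provider D program to views.d (hypothetical file name).
cat > views.d <<'EOF'
/* View 1: system-wide system call counts, keyed by process name
   and system call name. */
syscall:::entry
{
	@calls[execname, probefunc] = count();
}

/* View 2: bytes of I/O requested, keyed by process name.
   Note: asynchronous I/O may be attributed to whichever thread
   happens to issue it, not the process that caused it. */
io:::start
{
	@bytes[execname] = sum(args[0]->b_bcount);
}
EOF

# Run as root; press Ctrl-C to stop and print both aggregations:
#   dtrace -s views.d
```

Changing the aggregation keys (for example to pid or tid) is all it takes to switch from a per-process view to a per-thread one.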

So DTrace gives you a general-purpose, programmable facility for answering basically any kind of question about the system. Meanwhile, truss(1) can still be a valuable part of your problem-solving toolbox when you have a particular question about a particular process's system-call behavior that can be adequately answered using truss's built-in capabilities.


Q: I don't know what's up with my application; it does not seem to be memory, I/O, or CPU intensive (based on vmstat, iostat & mpstat).
So where do I start to find the bottleneck(s) using DTrace?
I have briefly read through the sched, io, and pid provider chapters but I still seem to be lost. Do you have any specific advice for this lost soul?

A: The question is: why isn't your application bound by one of these physical resources? The ideal app (from a performance perspective) should be bound only by a physical resource -- allowing the performance of the app to be improved simply by improving the performance of the limiting physical resource. For most apps, this means that you want the app to be CPU-bound. So if your app isn't CPU-bound, the question becomes: why not? To answer this first question, just use the sched provider to figure out why your app is giving up the CPU. See the sched provider chapter for more details, but this is a good enabling with which to start:

# dtrace -n sched:::off-cpu'/execname == "my_app"/{@[ustack()] = count()}'

This should give you answers that prod new questions -- questions that can also be answered with DTrace, of course.
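One such follow-up question is how long the app stays off the CPU at each of those stacks, not just how often. A rough sketch, assuming the same placeholder process name "my_app" and a made-up file name offcpu.d: record a timestamp when a thread leaves the CPU and sum the elapsed time when it comes back on.

```shell
# Write an off-CPU time D program to offcpu.d (hypothetical file name).
cat > offcpu.d <<'EOF'
/* A thread of my_app is about to leave the CPU: remember when. */
sched:::off-cpu
/execname == "my_app"/
{
	self->ts = timestamp;
}

/* The same thread is back on CPU: charge the elapsed nanoseconds
   to its user stack, then clear the thread-local variable. */
sched:::on-cpu
/self->ts/
{
	@[ustack()] = sum(timestamp - self->ts);
	self->ts = 0;
}
EOF

# Run as root; press Ctrl-C to print total off-CPU nanoseconds per stack:
#   dtrace -s offcpu.d
```

The stacks with the largest totals are the places where the app spends most of its time waiting, which is usually where the next question comes from.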

PS: The capabilities of DTrace seem so awesome. If only I knew how to harness its power.

Thanks for your kind words -- and we hear you that we need better documentation surrounding DTrace "best practices"...


Q: Can DTrace be used for memory corruption and leak detection in user apps? If it can, should I throw away my Purify license(s)?

A: The better tool for that job is libumem(3LIB), a new memory allocator in Solaris 10. Check out the manpage for umem_debug(3MALLOC) and/or look at Adam's blog on the subject: http://blogs.sun.com/roller/page/ahl/?anchor=solaris_10_top_11_20

Unlike Purify, libumem is able to institute checks with a sufficiently small degradation in performance that you may choose to leave it enabled on your production systems. And using gcore and ::findleaks, you can find memory leaks in your app while it's running!
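A rough sketch of that workflow follows. The helper name run_with_umem and the program name ./my_app are made up for illustration; the environment variables are the ones described in umem_debug(3MALLOC), and you should check that manpage for the settings appropriate to your system. The session itself is shown in comments because it needs a live process on Solaris 10.

```shell
# Hypothetical helper: run any command with libumem preloaded and its
# debugging features switched on for that one invocation only.
run_with_umem() {
    LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default UMEM_LOGGING=transaction "$@"
}

# Example session (placeholders in angle brackets):
#   run_with_umem ./my_app &      # start the app under libumem
#   gcore <pid>                   # snapshot it into core.<pid> while it keeps running
#   mdb core.<pid>                # open the snapshot in the debugger
#   > ::findleaks                 # at the mdb prompt, list leaked allocations
```

Because the variables are set only for the wrapped command, the rest of your shell session keeps using the default allocator.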

And DTrace plays a role here too: using libumem, you may come up with a hypothesis for the source of a memory leak or memory corruption -- and you can then use DTrace to quickly investigate this hypothesis. Ironically, I recently did exactly this to debug a memory corruption problem in the dtrace(1M) command itself. I first used libumem and its debugging features to home in on the problem. This told me that we died while accessing a freed structure, and I could get a stack trace of where the free occurred. That suggested the containing structure itself may not have been cleaned up properly; using DTrace, I was able to investigate this hypothesis and observe that the containing structure was indeed being erroneously reused. Thanks to the libumem/DTrace tag-team, resolving this issue took almost less time than it took me to write this paragraph. There's nothing quite so satisfying as using your own tools to debug your own bugs... ;)


Q: I'm working as a security analyst at a bank, and I want to know how DTrace will help me with security. BSM is running on all the machines; does running both reduce performance?

A: DTrace isn't a security tool per se, so it is orthogonal to the type of functionality that BSM provides. You can certainly run both on your machine without fear of significantly impacting performance. DTrace uses solely dynamic instrumentation, so a system where DTrace isn't being used is the same as one where it isn't installed: the performance impact when disabled is effectively zero. When you do use DTrace, only the instrumentation you ask for is enabled, and since actions are dynamic as well, you pay a performance cost proportional to what you ask DTrace to do.

As a security-aware person, you may also wish to read the Security chapter of the Solaris Dynamic Tracing Guide, available on docs.sun.com or BigAdmin, where we describe DTrace's own security attributes. Access to DTrace is restricted, least-privilege bits can be used to grant limited access to particular users through the user_attr facility, and errant DTrace programs cannot crash the system.
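As a rough illustration of the user_attr mechanism, an entry of the following form in /etc/user_attr would grant a user the privileges for tracing their own processes. The user name jdoe is hypothetical; dtrace_proc and dtrace_user are privilege names described in the Security chapter (dtrace_kernel is the third and broadest), so consult that chapter before granting any of them.

```
jdoe::::defaultpriv=basic,dtrace_proc,dtrace_user
```

The new default privileges apply the next time the user logs in.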
