
Accelerating Application Performance With Dynamic Tracing (DTrace)
The Solaris 10 Operating System includes many new technologies as well as major improvements to existing features. All of the improvements are designed to help customers reduce cost, complexity, and risk. In the previous issue of EduConnection, we looked at one of these new technologies, Solaris Containers. In this issue, we discuss the Dynamic Tracing facility, DTrace.
DTrace
When systems exhibit nonfatal errors or subpar performance, the sheer complexity of the distributed software environment can make accurate diagnosis of the root cause extremely difficult. Diagnosing transient failures, such as an inability to support an anticipated user load or a consistent failure to meet service-level agreements during peak hours, requires deep visibility into system behavior. Traditional approaches to debugging transient failures and tuning performance have involved examining postmortem crash dump files and tuning by trial and error. These approaches are slow, and they may never resolve the problem if the root cause is not found. As a result, many applications languish at performance levels far below what they could achieve.
The new Solaris Dynamic Tracing (DTrace) facility is a powerful tool that can help developers quickly tune their applications for maximum performance and rapidly identify the root cause of system and application problems. Tasks that might take days or weeks with traditional approaches can often be accomplished in hours or minutes. DTrace is simple enough to be used by both entry-level and experienced developers, and offers substantial benefits that can help businesses reduce cost, complexity, and risk:
Cost: The time needed to resolve system or application performance bottlenecks can be reduced from days to hours, saving on labor costs. DTrace also enables businesses to save on hardware costs in two ways. First, improving application performance provides more room for growth without the expense of upgrading hardware. Second, because DTrace can be safely used on production systems, it virtually eliminates the need to deploy a separate test environment.
Complexity: DTrace's single view of the software stack greatly simplifies the tracing process, enabling developers to follow a thread as it crosses from user land into kernel space and back.
Risk: DTrace works with applications as is; there is no need to modify applications, install a debug utility, reboot the OS, or restart applications before, during, or after the DTrace session. Developers can even use DTrace to analyze or tune applications on the Solaris 10 Operating System and then deploy or redeploy those applications on an earlier version of the Solaris OS while retaining most of the benefits of the performance-tuning exercise.
The primary value for those developing business applications is the ability to deliver higher-quality solutions that offer greater performance and stability. For IT service delivery organizations, the value lies in greater utilization of system resources through better use of the available CPU cycles.
Enhanced Visibility
The Solaris DTrace facility provides dynamic instrumentation and tracing for both application and kernel activities, even those running in a Java Virtual Machine. It enables developers to explore the entire system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. It even allows the creation of custom scripts that dynamically instrument the system and provide immediate, concise answers to arbitrary questions formulated in DTrace's D programming language.
Tracing is accomplished by dynamically modifying the operating system kernel and user processes to record additional data at locations of interest, called probes. A probe is a location or activity to which DTrace can bind a request to perform a set of actions, like recording a stack trace, a timestamp, or the argument to a function. Probes are like programmable sensors scattered in key places throughout the Solaris OS. To explore an area in question, users can turn on the appropriate sensors and program them to record information of interest. Then, as each probe fires, DTrace will gather the data from the probes and report it. If no action is specified for a probe, DTrace simply tracks each time that the probe fires.
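The probe behavior described above can be pictured with a small toy model. The sketch below is written in Python purely for illustration; it is not DTrace or the D language, and the class names (Probe, Tracer) and probe names such as "read:entry" are hypothetical. It shows only the idea that an enabled probe with an action records data when it fires, while an enabled probe without an action is simply counted.

```python
from collections import Counter


class Probe:
    """Toy stand-in for a probe: a named point in the system that can fire."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.action = None  # optional callable bound by the user


class Tracer:
    """Collects data from enabled probes (a hypothetical model, not DTrace)."""

    def __init__(self):
        self.fire_counts = Counter()  # how often each enabled probe fired
        self.records = []             # data returned by user actions

    def enable(self, probe, action=None):
        probe.enabled = True
        probe.action = action

    def fire(self, probe, *args):
        if not probe.enabled:
            return  # disabled probes do nothing at all
        self.fire_counts[probe.name] += 1
        if probe.action is not None:
            self.records.append((probe.name, probe.action(*args)))


tracer = Tracer()
read_entry = Probe("read:entry")
write_entry = Probe("write:entry")
close_entry = Probe("close:entry")  # never enabled

# Record the byte count argument whenever read_entry fires.
tracer.enable(read_entry, action=lambda fd, nbytes: nbytes)
# No action: the tracer only notes that the probe fired.
tracer.enable(write_entry)

tracer.fire(read_entry, 3, 4096)
tracer.fire(write_entry, 1, 512)
tracer.fire(write_entry, 1, 128)
tracer.fire(close_entry, 5)

print(dict(tracer.fire_counts))
print(tracer.records)
```

In this model the probe that was never enabled leaves no trace, which mirrors the idea of turning on only the sensors of interest.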
DTrace probes come from a set of kernel modules called providers, each of which knows how to perform a particular type of instrumentation to activate probes. When DTrace is run, it invokes a compiler for its D language, which identifies the probes that the user has requested and gathers data from providers about the instrumentation needed to activate them. Providers maintain the system-level information about probes; users request the actions to be taken when a probe fires, and the DTrace utility performs the instrumentation for that request dynamically. For example, a user may request that DTrace publish the values of specific variables, and DTrace then executes the actions required to collect that data each time the probe fires.
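One way to picture how requested probes are matched against what providers publish is the following Python sketch. It relies on a detail not stated in this article: DTrace probe descriptions take the form provider:module:function:name, an empty field matches anything, shell-style wildcards are allowed, and shortened descriptions are interpreted from right to left. The list of published probes is a small illustrative sample, and the function match_probes is a hypothetical helper, not part of DTrace.

```python
from fnmatch import fnmatchcase

# Illustrative sample of (provider, module, function, name) tuples.
PUBLISHED = [
    ("syscall", "", "read", "entry"),
    ("syscall", "", "read", "return"),
    ("syscall", "", "write", "entry"),
    ("fbt", "genunix", "kmem_alloc", "entry"),
]


def match_probes(description, published=PUBLISHED):
    """Return the published probes selected by a probe description."""
    fields = description.split(":")
    if len(fields) > 4:
        raise ValueError("a probe description has at most four fields")
    # Shortened forms are read right to left, so pad on the left.
    patterns = [""] * (4 - len(fields)) + fields
    return [
        probe
        for probe in published
        if all(
            pattern == "" or fnmatchcase(value, pattern)
            for pattern, value in zip(patterns, probe)
        )
    ]


print(match_probes("syscall:::entry"))   # every syscall entry probe in the sample
print(match_probes("read:entry"))        # function "read", name "entry"
print(match_probes("fbt::kmem_*:"))      # wildcard on the function field
```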
Developers can use DTrace D programs to bind their own customized tracing actions to any of the more than 30,000 published probes within the Solaris kernel and to instrument any line of code in an application that runs on the Solaris OS. Figure 3 shows the different components of the DTrace facility, including providers, probes, DTrace kernel software, and the DTrace command.
Figure: The DTrace facility is architected to enable visibility into both user land and the Solaris kernel.
Reducing Risk
Stability and low overhead are hallmarks of this new utility because DTrace was designed from the beginning to run on production systems. Risk is reduced because users can dynamically turn probes on and off with no need to reboot or otherwise configure the operating system, disable or alter applications, or change user or client access. DTrace is also programmable, so analysis routines can be written and reused.
Safety is enhanced because the DTrace execution environment performs its own error handling and uses proven probes that already exist inside the Solaris OS. D program runtime errors such as dividing by zero or referencing invalid memory are managed directly by DTrace. When such an error occurs, DTrace simply reports the error and disables the instrumentation, allowing the developer to correct the mistake and try again. As a result, developers can never construct an unsafe program that would cause DTrace to inadvertently damage the Solaris kernel or one of the user processes running on the system. These safety features allow DTrace to be used in a production environment without fear of crashing or corrupting the system.
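The error-handling behavior described above can be mimicked in a few lines of Python. This is only an analogy under stated assumptions: the class SafeEnabling and the function bytes_per_call are hypothetical names, and real DTrace enforces safety inside its own execution environment rather than with exception handlers. The sketch shows the pattern of reporting a runtime error, disabling the instrumentation, and leaving the rest of the system untouched.

```python
class SafeEnabling:
    """Toy model: run a user action, and on error report it and disable."""

    def __init__(self, action):
        self.action = action
        self.enabled = True
        self.errors = []  # human-readable error reports

    def fire(self, *args):
        if not self.enabled:
            return None
        try:
            return self.action(*args)
        except (ZeroDivisionError, IndexError, KeyError) as exc:
            # Report the error and switch the instrumentation off
            # instead of letting it propagate into the traced system.
            self.errors.append(f"{type(exc).__name__}: {exc}")
            self.enabled = False
            return None


def bytes_per_call(total_bytes, calls):
    # A user action with a latent bug: fails when calls == 0.
    return total_bytes // calls


enabling = SafeEnabling(bytes_per_call)
first = enabling.fire(4096, 4)    # succeeds
second = enabling.fire(4096, 0)   # divide by zero: reported, then disabled
third = enabling.fire(4096, 2)    # ignored because the enabling is now off

print(first, second, third)
print(enabling.errors)
```

After the failure, the developer would correct the action and create a new enabling, which corresponds to fixing the D program and running it again.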
Virtually Eliminating the Performance Penalty for Tracing
Although DTrace is always available and ready for use, it has no impact on system performance when not being used. All of the instrumentation in DTrace is completely dynamic. Probes are enabled discretely only when they are specifically called out by the user. No instrumented code is present for inactive probes, so there is no performance degradation of any kind when DTrace is not in use. Once the DTrace command exits, all of the probes that were used are automatically disabled and their instrumentation is removed, returning the system to its exact original state.
DTrace instrumentation is also designed to be as efficient as possible. When DTrace is executed, the instrumentation for each probe is performed dynamically on the live running system. The system is not paused in any way and instrumentation code is added only for the probes that are enabled. As a result, the probe effect of using DTrace is limited to exactly what DTrace is asked to do; no extraneous data is traced.
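As a rough analogy for instrumentation that exists only while tracing is active, the Python sketch below temporarily wraps a function and then restores the original object on exit. It is a simplified illustration, not how DTrace modifies the kernel or user processes; the helper instrumented and the sample system namespace are hypothetical.

```python
from contextlib import contextmanager
import types


@contextmanager
def instrumented(namespace, func_name, log):
    """Wrap namespace.func_name while the block runs, then restore it."""
    original = getattr(namespace, func_name)

    def wrapper(*args, **kwargs):
        log.append((func_name, args))  # record only what was asked for
        return original(*args, **kwargs)

    setattr(namespace, func_name, wrapper)
    try:
        yield
    finally:
        # Remove the instrumentation, returning to the original state.
        setattr(namespace, func_name, original)


# A stand-in for the running system, with one function to trace.
system = types.SimpleNamespace(allocate=lambda size: bytearray(size))
untouched = system.allocate
log = []

system.allocate(8)                      # before tracing: nothing recorded
with instrumented(system, "allocate", log):
    system.allocate(16)
    system.allocate(32)
system.allocate(64)                     # after tracing: nothing recorded

print(log)
print(system.allocate is untouched)
```

Outside the with block the function is the same object as before, so calls made then carry no tracing cost in this model.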
Success With DTrace
DTrace has helped Sun improve the performance of both kernel functions in the Solaris OS and customer business applications. For example, for one customer's business-critical trading application, DTrace was run on a live system and the team was quickly able to pinpoint a bottleneck presented by the customer's use of a nonscalable memory allocator. Replacing the component with a more scalable version resulted in a 1000-percent performance increase.
Improvements of this scale not only dramatically reduce overall system cost, but they can also have a direct impact on the number of transactions processed each day, adding to top-line revenues.
DTrace provides every developer and customer running the Solaris 10 OS with a more powerful tool than any kernel developer has ever had to analyze performance. It gives them advanced observability into the systems they own so they can see how those systems work. With DTrace, the bottom line is higher-quality applications, lower costs, reduced downtime, and greater utilization of existing resources to improve ROI.
For more information on DTrace, visit www.sun.com/software/solaris/observability.jsp or contact education_news@sun.com. A short Webinar is also available, and your local Sun representative can contact you on request.