Traditional processor design has long emphasized the performance of a single hardware thread of execution, and focused on
providing high levels of instruction-level parallelism. These increasingly complex processor designs have been driven to very
high clock rates (frequencies), often at the cost of increased power consumption and heat production. Unfortunately, because of
memory latency, even the fastest single-threaded processors spend most of their time idle, waiting for memory.
Compounding this problem, many of today's complex commercial workloads are simply unable to take advantage of instruction-level
parallelism, and instead benefit from thread-level parallelism.
This Sun BluePrints article describes techniques that system architects, application developers, and performance analysts can use
to assess the scaling characteristics of an application. It also explains how to optimize an application for chip multithreading, in
particular for systems that use UltraSPARC T1 processors. This article discusses the following topics:
- Processor physical characteristics
- Performance characteristics
- Classes of commercial applications
- Assessing performance on UltraSPARC T1 processor-based systems
- Scaling applications with chip multithreading
- Tuning for general performance
- Accessing the modular arithmetic unit and encryption framework
- Minimizing floating-point operations and VIS instructions
This article has been updated from the original December 2005 publication to include important information about Cooltools, a set of
tools created to improve the ease of use of UltraSPARC T1 systems. These tools cover a wide range of tasks, including development,
debugging, tuning, and deployment of applications.
Note: This article is available in PDF format only.