Customer Snapshot: Education

Texas Advanced Computing Center (TACC)

Advanced Computing Center Uses Sun Technologies and Services to Build One of the World’s Fastest HPC Systems

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin provides one of the world’s most powerful computing resources to facilitate research in all areas of open science. The center’s employees help to manage a variety of high-performance computing (HPC) resources and services used by scientists around the world.

Customer Challenges

  • Build one of the most powerful HPC systems in the world
  • Support the widest portfolio of scientific applications
  • Provide for application scalability and ease of management

Solution

TACC collaborated with Sun, other universities, and the open-source community to build and support an HPC system with a $59 million grant from the National Science Foundation. Built on Sun technologies and an open-source software stack, the HPC system (called Ranger) includes 3,936 nodes that contain 123 TB of memory and has a peak speed of 579.4 teraflops.

Business Results

  • Created one of the world’s largest and most powerful general-purpose, open-access HPC systems
  • Provides more than 500 million processor hours each year in production
  • Attracts more than 1,000 users and more than 200 research projects

Story Details

Cancer, climate change, earthquakes, quantum mechanics, and near-space objects: these are just a few of the areas of study for which scientists need more computational power. In 2005, the Texas Advanced Computing Center (TACC) at The University of Texas at Austin decided to answer a call from the National Science Foundation (NSF) to build a high-performance computing (HPC) system for the TeraGrid to support all areas of open science. The TeraGrid, an NSF-sponsored initiative, uses coordinated policy, grid software, and high-performance network connections to integrate a distributed set of high-capability computational, data-management, and visualization resources and make research more productive.

The new system would have to significantly outperform the country’s existing systems, most of which ran at less than 10 teraflops. Along with reaching near-petascale performance, the architecture would also need to be manageable, scalable, and flexible enough to process and store data from an extremely diverse portfolio of scientific applications.


"The Sun Datacenter Switch integrates InfiniBand technology at an unprecedented scale, allowing people to run across the entire system to achieve exceptional capability and run applications at a larger scale than they’ve ever run before."
— Tommy Minyard, Associate Director of Advanced Computing Systems, TACC

After extensive research, TACC chose to build its system on Sun technologies. In addition to providing the necessary vision and the flexibility to work with an open-source software stack, Sun helped TACC design a solution that uses two foundational components of the Sun Constellation System, the world’s first open petascale computing environment. These components are the Sun Blade 6048 Modular System, with four-socket server modules based on quad-core AMD Opteron processors, and the Sun Datacenter Switch 3456, which provides 3,456 ports. “The port count of the Sun Datacenter Switch 3456 gave us the capability to build a much larger system that was also manageable,” says Jay Boisseau, director of TACC. “Combined with the four-socket Sun Blade server modules, we could put together a Linux-based, x86-instruction-set architecture that could fit in our datacenter, could scale up to deliver staggering computational power, and could house a huge amount of aggregate memory.”

The proposed architecture was a success. In August 2006, the NSF awarded TACC $59 million to build its system, called Ranger. Over the next two years, TACC worked with numerous people from the open-source community, universities, AMD, Mellanox Technologies, and Sun. “Key people from different places within Sun — a kind of uber HPC team — worked with us during the deployment phase and were really instrumental in pulling this thing off,” says Karl Schulz, associate director of HPC at TACC.

Eighty-two Sun Blade 6048 Modular Systems each house 48 Sun Blade server modules, for a total of 3,936 servers and 15,744 processors, making Ranger the most capable HPC resource for open-science research in the nation. Two Sun Datacenter Switch 3456 components connect to the blades through Sun Blade 6048 InfiniBand Switched Network Express Modules. System management tools such as Sun Grid Engine 6.2 and Sun xVM Ops Center software run on Sun Fire X4600 M2 servers. All metadata is stored on a Sun StorageTek 6540 array with more than 8 TB of capacity.
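The server and processor totals follow from simple multiplication. A minimal sketch of that arithmetic, using only counts stated in this snapshot (82 chassis, 48 server modules per chassis, four sockets per module, four cores per processor); the variable names are illustrative and not part of any Sun or TACC tool:

```python
# Counts taken from the text; names are illustrative only.
CHASSIS = 82              # Sun Blade 6048 Modular Systems
BLADES_PER_CHASSIS = 48   # Sun Blade server modules per chassis
SOCKETS_PER_BLADE = 4     # four-socket server modules
CORES_PER_SOCKET = 4      # quad-core AMD Opteron processors

servers = CHASSIS * BLADES_PER_CHASSIS
processors = servers * SOCKETS_PER_BLADE
cores = processors * CORES_PER_SOCKET

print(f"{servers:,} servers, {processors:,} processors, {cores:,} cores")
```

The core count is not quoted in the text; it is derived here from the quad-core processors mentioned earlier.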

Researchers who use Ranger have access to 1.7 PB of open storage supported by 72 Sun Fire X4500 servers, which run the Lustre parallel file system. Long-term archival storage is supported by a Sun StorageTek SL8500 modular library system and StorageTek T10000 tape drives that can scale to support up to 10 PB in one 10,000-slot library using Sun StorageTek QFS with the Sun StorageTek Storage Archive Manager software. All of Ranger’s Sun technologies are supported by two onsite consultants from Sun Professional Services and are protected by SunSpectrum Platinum support.

In February 2008, Ranger went into production. With 123 TB, it offers one of the largest memory systems in the world, and its peak speed of 579.4 teraflops also makes it one of the world’s fastest supercomputers for open-science research. Every year it is in production, Ranger will be able to contribute up to 500 million processor hours, which equates to an average dual-core desktop running nonstop for 100,000 years.
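The headline figures can be roughly cross-checked against the node count. The sketch below is a back-of-the-envelope estimate, not TACC's own accounting, and it rests on assumptions that this snapshot does not state: a 2.3 GHz processor clock, four floating-point operations per cycle per core, "TB" counted in binary units, and "processor hours" meaning core-hours.

```python
NODES = 3_936
CORES = NODES * 4 * 4   # four quad-core sockets per node (from the text)

# Memory per node, ASSUMING "123 TB" uses binary units (1 TB = 1,024 GB).
memory_per_node_gb = 123 * 1024 / NODES

# Peak speed. ASSUMPTIONS not stated in the text:
# 2.3 GHz clock and 4 floating-point operations per cycle per core.
CLOCK_GHZ = 2.3
FLOPS_PER_CYCLE = 4
peak_teraflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE / 1_000

# Upper bound on yearly hours, ASSUMING "processor hours" means core-hours.
HOURS_PER_YEAR = 24 * 365
max_core_hours = CORES * HOURS_PER_YEAR
share_delivered = 500_000_000 / max_core_hours

print(f"Memory per node: {memory_per_node_gb:.0f} GB")
print(f"Peak speed: {peak_teraflops:.1f} teraflops")
print(f"Core-hour ceiling per year: {max_core_hours:,}")
print(f"500 million hours as a share of the ceiling: {share_delivered:.0%}")
```

Under these assumptions the estimate works out to 32 GB per node, about 579.4 teraflops, and a ceiling of roughly 552 million core-hours per year, so 500 million hours would correspond to about 91 percent of the theoretical maximum.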

By July 2008, the HPC system was generating more than 200 TB of data per month. To help researchers better process the vast amounts of information, TACC has built a visualization subsystem for Ranger powered by eight Sun Fire servers and 32 NVIDIA Quadro Plex visual computing systems. “You can’t just pull terabytes of data down to your laptop or your home computer,” concludes Tommy Minyard, associate director of advanced computing systems at TACC. “So we will provide an end-to-end HPC facility where you can do your scientific computing, but also pre- and post-processing via remote visualization. I think that capability will really help separate TACC from other HPC centers and allow researchers to get an excellent return on investment.”

  
 
 