Sun Inner Circle: For Business & Technology Leaders

A Supercomputer in Your Datacenter


Sun brings production-ready HPC to the mainstream enterprise

High Performance Computing (HPC) has long been vital to the success of education and research institutions, but as the need for complex computing increases across a range of industries, more and more organizations are considering HPC a core tool. For example:

  • Engineers at manufacturing companies who need to run computationally complex simulations on a new design
  • Financial analysts who want to run more and faster “real-time” risk analyses to optimize profits
  • Geophysicists trying to identify productive oil and gas fields and determine the most cost-efficient way to extract resources
  • Life sciences scientists who need faster, more cost-effective DNA sequencing or drug discovery

How do you know if you're ready for HPC in a production environment? How can Sun help you get there?

Sun Inner Circle sat down with Bjorn Andersson, director of HPC and integrated systems at Sun, to learn how Sun is making HPC available to the enterprise datacenter, and to discuss the recent announcement of the Sun Constellation System, the world's first open petascale architecture, at the International Supercomputing (ISC) show in Germany this June.

INNER CIRCLE: What does Sun mean by production-ready HPC?

ANDERSSON: Production-ready HPC is a way to provide our customers with the supercomputing capabilities academic and scientific researchers traditionally have enjoyed — while cutting down on the headaches and costs of running these complex systems. It means providing the availability, reliability, and security that Sun customers expect — in an HPC architecture.

The objective is to make HPC simpler, more affordable, and more easily deployed by enterprises across a range of industries, with an architecture that may be extended from single racks to a large-scale supercomputer with incredible computational speeds.

IC: What does the architecture consist of?

ANDERSSON: Our approach is to view the whole cluster as a system, meaning that individual servers and the software they run, for example, are components of the larger system. Key components of our HPC architecture include Sun x64 servers and storage, a new Sun InfiniBand switch, the Solaris 10 Operating System, and Sun Grid Engine for workload management throughout a cluster. Couple this with our HPC Quick Start Services offering and our ability to factory-configure and deliver ready-to-run solutions, and you minimize the cost and time needed to get an HPC environment up and running.

IC: How long does it take to get a Sun HPC system up and running?

ANDERSSON: For high-end HPC systems, the industry standard is several months of gradually coming up to production mode. In the midrange, it's usually several weeks for a complete install. But Sun consistently beats these averages and is delivering production-ready HPC in record time.

On the high end, Sun installed a very large cluster serving over 10,000 students at Tokyo Tech about a year ago — the cluster was up and running in a month. In the midrange, Mississippi State University is a great example where Sun holds the record for an HPC installation — under two days from delivery trucks arriving to a 512-node cluster running in production.

IC: What are the signs that an enterprise is ready for HPC? Isn't this kind of computing really still the domain of academia and research?

ANDERSSON: Ultimately, business needs drive HPC. Modeling and solving more and more complex problems is part of doing business for a lot of organizations. I find it interesting that it's not just what many think of as the usual suspects — automotive, energy, and pharmaceuticals — looking at HPC these days. Financial services and insurance are more and more turning to HPC for complex analyses of risks and stock portfolios. The entertainment industry is looking to HPC for rendering animated movies. And for companies in every vertical, HPC provides a scalable platform to optimize business processes.

Once the business need is established, the enterprise needs to examine how HPC answers datacenter checklist items such as cost, scalability, and manageability — along with power and space requirements.

Traditionally, the barrier to adopting HPC has been fairly high. But with new standards-based HPC systems, companies in a variety of industries have HPC within their reach and can get started now. Because HPC gets products to market faster, it helps companies get ahead of the competition. It is fast becoming essential to their business and to their design and analysis processes, as crucial to their operation as a welding robot is to a car manufacturer.

 
IC: What kind of compute power is Sun offering in HPC?

ANDERSSON: It depends on the situation. We can do small installations, and we can push the limits of what is physically possible. Right now, the Texas Advanced Computing Center (TACC) at the University of Texas at Austin is deploying a cluster composed of the new Sun Constellation system and standard Sun components for over 500 TeraFLOPs of power — 500 trillion calculations a second.

A more likely candidate for a commercial datacenter might be a single rack or two of the servers TACC is using, or a Sun Fire X4600 M2 server with a quarter TB of memory in a compact 4U space. Both sorts of implementations are based on standard components and open interfaces that allow scaling from a small rack all the way up to petascale supercomputer performance.

IC: How does this kind of performance fit with what most enterprises are looking for in HPC?

ANDERSSON: Scalability is something Sun customers expect — and they can expect the same with HPC. It's important to understand that HPC is moving beyond its roots in education and research. Most companies investing in these systems want high performance and reliability at low cost, as well as systems that can be installed quickly and serviced like other IT investments. Fortunately, Sun has decades of experience in these areas.

IC: Are there power and cooling advantages to using Sun servers as the backbone of an HPC system?

ANDERSSON: Sun x64 and blade systems are extraordinarily energy efficient, with design features that help ensure CPU speed never has to be dialed down to stay within temperature envelopes, so Sun customers get both speed and energy efficiency. In addition, our blade servers provide new levels of energy efficiency over most rackmounted systems. For HPC environments, Sun provides a 48-blade-per-rack configuration, which, in conjunction with Sun Grid Engine workload management, can boost utilization up to 98 percent.

IC: How else does Sun reduce the complexity of running HPC environments inside the traditional datacenter?

ANDERSSON: Sun Grid Engine 6.1 helps reduce the time and expense of HPC management by distributing workloads across multiple machines and HPC grids. Essentially, Grid Engine balances the work to be done with the compute resources available so projects execute quickly without idling or overloading machines.

As for configurability, Sun Grid Engine's distributed resource manager (DRM) lets users plug in scripts and override behaviors as they wish. This DRM was one of the reasons that TACC chose Sun Grid Engine to manage its HPC infrastructure. Plus, the API for job submission, monitoring, and control is language-agnostic, which allows developers to write applications that integrate with a supercomputing grid and remain portable across other DRM systems.
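
As an editorial illustration of the balancing idea described above, here is a minimal Python sketch of greedy least-loaded dispatch. It is not Sun Grid Engine's scheduler, only a generic textbook heuristic; the function name `assign_jobs`, the host names, and the job runtimes are all hypothetical.

```python
import heapq

def assign_jobs(job_hours, hosts):
    """Greedy least-loaded dispatch: longest jobs first, each to the idlest host.

    Illustrative only; this is not how Sun Grid Engine schedules work.
    """
    # Min-heap of (accumulated_hours, host_name) so the idlest host pops first.
    heap = [(0.0, host) for host in sorted(hosts)]
    heapq.heapify(heap)
    placement = {host: [] for host in hosts}
    # Placing long jobs first tends to even out the final load.
    for job, hours in sorted(job_hours.items(), key=lambda kv: (-kv[1], kv[0])):
        load, host = heapq.heappop(heap)
        placement[host].append(job)
        heapq.heappush(heap, (load + hours, host))
    loads = {host: load for load, host in heap}
    return placement, loads

# Hypothetical workload: job name -> estimated runtime in hours.
jobs = {"crash_sim": 8, "risk_run": 5, "seismic": 4, "render": 3, "docking": 2}
placement, loads = assign_jobs(jobs, ["node1", "node2"])
print(placement)
print(loads)
```

In this toy workload both hosts end up with the same total runtime, so neither sits idle while the other is overloaded.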

IC: How does Sun Grid Engine manage multiple clusters?

ANDERSSON: Because organizations often end up requiring more compute power, it makes little sense to add HPC to the datacenter if manageability becomes increasingly difficult as a cluster scales. Grid Engine allows all clusters to be guided by a single master policy, which draws virtual lines between the machines. This can also help ensure that the most important projects continue to get priority as the need for HPC within an organization grows.

IC: How does Sun Grid Engine compare with competing products on the market?

ANDERSSON: It's a question of support and cost. There is no other productized open source equivalent, and the proprietary competition offers fewer features yet costs several times more than Sun Grid Engine. Grid Engine is licensed for large numbers of CPUs, while the proprietary competition sells its products on a per-core basis. Plus, some of the features in Grid Engine critical to enterprise and utility computing are not available in competing products. These features include making accounting information available via a SQL database and providing an overview of grid activities with simple queries.
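
To illustrate what a simple query against accounting data in a SQL database might look like, here is a small Python sketch using the standard-library `sqlite3` module. The table name `job_accounting`, its columns, and the sample rows are invented for illustration and are not the actual Sun Grid Engine accounting schema.

```python
import sqlite3

# In-memory database standing in for a grid accounting store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_accounting ("
    "job_id INTEGER PRIMARY KEY, project TEXT, owner TEXT, cpu_hours REAL)"
)
conn.executemany(
    "INSERT INTO job_accounting (project, owner, cpu_hours) VALUES (?, ?, ?)",
    [
        ("crash_sim", "alice", 120.0),
        ("crash_sim", "bob", 80.0),
        ("risk_model", "carol", 45.5),
    ],
)

# One aggregate query gives a per-project overview of consumed CPU time.
usage = conn.execute(
    "SELECT project, COUNT(*), SUM(cpu_hours) "
    "FROM job_accounting GROUP BY project ORDER BY project"
).fetchall()

for project, job_count, cpu_hours in usage:
    print(f"{project}: {job_count} jobs, {cpu_hours} CPU hours")
conn.close()
```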

 

IC: Where does the Solaris OS fit into the Sun HPC architecture?

ANDERSSON: Today, we deliver many education and research HPC environments using Linux, but Solaris is simply a better fit for a production environment. It's designed to manage the node complexity and latency found in a supercomputing environment. When you need to support blades with four processors, each with four cores, the complexity of managing all 16 cores increases.

When there were only two CPUs to worry about, there was a 50 percent chance that memory was going to be attached to the right CPU performing a task, which usually resulted in good performance. But with multiple processors and cores, all bets are off. With its memory placement optimization feature, Solaris ensures that the right processor takes care of the right job, and this increases HPC efficiency and decreases latency.
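
The 50 percent figure follows from simple arithmetic: if memory lands on a uniformly random CPU, the chance that it is local to the CPU running the task is 1/n for n CPUs. The Python sketch below works this through. The uniform-placement model and the latency figures (100 ns local, 150 ns remote) are assumptions chosen for illustration, not measurements of any Sun system.

```python
def local_hit_probability(sockets):
    """Chance that randomly placed memory is attached to the CPU running the task."""
    return 1.0 / sockets

def expected_latency_ns(sockets, local_ns=100.0, remote_ns=150.0):
    """Average memory latency under uniformly random placement (assumed figures)."""
    p_local = local_hit_probability(sockets)
    return p_local * local_ns + (1.0 - p_local) * remote_ns

for sockets in (2, 4):
    print(sockets, local_hit_probability(sockets), expected_latency_ns(sockets))
```

Under this model, moving from two to four processors halves the chance of a local hit. With perfectly local placement the average would stay at the local figure regardless of processor count, which is the gap that memory placement optimization aims to close.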

IC: How does Solaris compare to Linux in scalability in an HPC environment?

ANDERSSON: Solaris is battle-proven, with many years of demonstrated ability to scale on higher-end multiprocessor systems, and that really pays off on these new multicore systems. In HPC, it's very important to focus on the ratio of bandwidth to floating-point operations, to keep the processors fed with data with as little overhead as possible. For example, with its virtual memory capabilities, Solaris supports page sizes of up to 1 GB, while Linux is limited to an 8000-byte page size. This enables Solaris to more efficiently handle the amount of data that HPC applications expect. In installations such as TACC, the operating system controls the InfiniBand switch fabric managers, the compute nodes, and the storage. The Sun HPC architecture can easily run on Linux, but all the advanced features of Solaris 10 are available at no cost at OpenSolaris.org.
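
To see why page size matters, the Python sketch below counts how many pages are needed to map a given amount of memory. It takes the two page sizes quoted above as given, without independently verifying them, and uses the quarter terabyte of memory mentioned earlier for the Sun Fire X4600 M2 as the example. Treating 1 GB as 2^30 bytes is an assumption, and the function name `pages_needed` is hypothetical.

```python
GIB = 2 ** 30  # assumption: "1 GB" is read as 2**30 bytes

def pages_needed(memory_bytes, page_bytes):
    """Number of pages required to map memory_bytes (integer ceiling division)."""
    return -(-memory_bytes // page_bytes)

memory = 256 * GIB          # a quarter terabyte, as in the server example above
large_page = 1 * GIB        # large page size quoted in the interview
small_page = 8000           # small page size quoted in the interview

large_count = pages_needed(memory, large_page)
small_count = pages_needed(memory, small_page)
print(large_count, small_count)
```

Fewer, larger pages mean fewer address translations for the processor to track, which is one way a large page size can reduce overhead for memory-hungry applications.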

 

IC: Storage is a key component of any supercomputing cluster. How does the Sun HPC architecture stack up in this area?

ANDERSSON: One unique option for HPC storage is Sun's data server. The Sun Fire X4500 server with 1 TB disks can provide nearly half a petabyte of storage in a single rack, and at the TACC installation, this server will be used to deliver 1.7 petabytes of storage. That amount of storage can be exceeded, too. The European Organization for Nuclear Research is using over 100 Sun Fire X4500 servers to store over 2.5 petabytes of data. Many enterprises will be satisfied with one or two of these servers, each with 24 to 48 TB of data in a 4U rack space. In addition, Sun provides a complete storage solution, from very high performance cluster-connected storage to enterprise data and secure tape archive solutions.

I should also add that Solaris plays a big role as the operating system for storage, because in these enormous environments, it's just too difficult to put a petabyte or more of data onto a storage area network. For example, TACC is able to put storage directly on the InfiniBand network using Solaris as the platform to run this storage server.
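
The storage figures above can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes a standard 42U rack, 48 TB per 4U server (the upper figure quoted above), and 1 PB taken as 1000 TB. The rack height and the helper names are assumptions, not Sun specifications.

```python
import math

RACK_UNITS = 42      # assumed standard rack height
SERVER_UNITS = 4     # 4U per server, as quoted above
TB_PER_SERVER = 48   # upper per-server capacity quoted above

def servers_per_rack(rack_units=RACK_UNITS, server_units=SERVER_UNITS):
    """Whole servers that fit in one rack."""
    return rack_units // server_units

def rack_capacity_tb():
    """Raw capacity of one fully populated rack, in TB."""
    return servers_per_rack() * TB_PER_SERVER

def servers_for(target_pb):
    """Servers needed to reach target_pb petabytes (1 PB = 1000 TB)."""
    return math.ceil(target_pb * 1000 / TB_PER_SERVER)

print(servers_per_rack(), rack_capacity_tb(), servers_for(1.7))
```

Under these assumptions, ten servers fit in a rack for 480 TB, consistent with "nearly half a petabyte", and a 1.7-petabyte installation would need 36 such servers.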

IC: How does Sun envision scalability of its HPC architecture at the high end?

ANDERSSON: Right now, some of the largest supercomputers according to the Top 500 list are built with lots of relatively slow processors, small memory per node, and proprietary interconnects. Compare this with the Sun Constellation System, in which we're using the fastest industry standard processors available and have an industry leading amount of memory per node. Plus we use industry standard high bandwidth and low latency interconnects.

This enables us to take full advantage of industry-wide investments and ride the cost curve of commodity components, and it provides choice for our customers. We are bringing system-level innovation to market with the Sun Constellation System: innovations that have the complete cluster as the design point and are focused on true scaling, from a single rack at one TeraFLOP or below to well above one PetaFLOP, a more than 1000x scaling factor within the same compatible architecture.

IC: Why would an enterprise be interested in petascale HPC?

ANDERSSON: Petascale computing is at the bleeding edge of most companies' requirements today, but in a few years, it will almost inevitably be commonplace. The need to model and solve more complex problems is part of doing business for a lot of organizations, and I'm hard-pressed to see this momentum slowing down. If anything, competitive pressure between companies will accelerate this trend. I do think that a scalable path to more compute power is something that will be crucial to many enterprises in the future.