Fast Track to Solaris 10 Adoption: 64-bit Performance
Performance Issues
- I can recall that one of Sun's goals for the Solaris 10 OS was no more entries in /etc/system. Have you achieved this?
- Will any of the new trussing capabilities allow a process to be trussed twice? For example, right now a Sun Cluster managed process cannot be trussed while it is under cluster control.
- Have the SunPro Business Unit Compilers for the Solaris 10 OS been released yet? What kinds of improvements over GCC can be expected in both the 32 and 64-bit spaces?
- Does the Solaris 10 OS make any performance or scalability improvements? For example, the current OS has a limit of 1024 file descriptors. Will the Solaris 10 OS be able to break through that limitation?
- For an app like BLAST, which step gives you the best increase in performance? a) move 32-bit app from 32-bit AMD to 64-bit AMD b) convert the app itself from 32-bit to 64-bit?
- Intel recently announced the cancellation of its 4 GHz Pentium. Do you see 4 GHz as the clock-rate ceiling for single CPUs? If so, where does the AMD Opteron processor go after hitting that ceiling? If not, why is it different?
- Has Sun run any performance comparisons between the Solaris 10 OS/AMD Opteron processor and Wintel machines running Pro/ENGINEER or AutoCAD? What were the results?
- Will an application that can fit into the 8 MB of cache on a 1.2 GHz UltraSPARC III CPU run faster on an AMD64 Opteron, where the cache is smaller?
- What types of applications (CAD, Imaging, etc.) take real advantage of 64 bits?
- Since AMD64 is faster than SPARC, why not drop SPARC and put the savings into Solaris OS/services?
- Are there cases of a 32-bit app running slower on a 64-bit CPU?
- Are high-end servers coming with AMD64 and the Solaris 10 OS?
- Can you explain in a little more detail about the performance gains that AMD64 will have over SPARC64?
- Which has better performance in the Java environment, 64-bit SPARC or AMD64?
- Has any benchmarking been done for the Solaris 10 OS on AMD64 and Intel platforms?
Q: I can recall that one of Sun's goals for the Solaris 10 OS was no more entries in /etc/system. Have you achieved this?
A: Some mechanism for patching kernel variables will always remain. However, we have purposely eliminated some of the more common reasons to edit /etc/system, e.g., the shared memory and other IPC tunables. These are now modeled as resource controls that can be adjusted on the fly (see rctladm(1M)), or even managed across the network via the project(4) database.
Q: Will any of the new trussing capabilities allow a process to be trussed twice? For example, right now a Sun Cluster managed process cannot be trussed while it is under cluster control.
A: You still won't be able to truss a process twice; however, the issue with Sun Cluster will be addressed when Sun Cluster migrates to become a Service Management Facility (SMF) enabled service.
Q: Have the SunPro Business Unit Compilers for the Solaris 10 OS been released yet? What kinds of improvements over GCC can be expected in both the 32 and 64-bit spaces?
A: "SunPro Business Unit," or, officially, "Sun Studio 10 Software," will be released concurrently with the Solaris 10 OS, at the end of January 2005. Sun Studio 10 Software will have considerably better floating-point performance than gcc/g77, as Studio 9 already does, and it will continue to improve. It will also have features that gcc lacks: OpenMP 2.0, automatic parallelization and vectorization, linker scoping, performance libraries (libm and libsunperf), much better debugging support, a performance analyzer, and a full-featured IDE. Sun Studio 9 Software already improves over gcc in the 32-bit space; Studio 10 will go further and bring all that technology to 64 bits.
Q: Does the Solaris 10 OS make any performance or scalability improvements? For example, the current OS has a limit of 1024 file descriptors. Will the Solaris 10 OS be able to break through that limitation?
A: There are many performance improvements in networking, virtual memory, system calls, and filesystems. Some of that work was enabled by applying DTrace to the OS itself. The file descriptor hard limit is 65535 in the Solaris 10 OS. Other scalability improvements include further Memory Placement Optimization (MPO) work for NUMA systems, support for more physical memory on both SPARC and x86 machines, large pages, and more address space for 32-bit apps on 64-bit x86 machines.
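As an illustration of the file descriptor limit mentioned above, here is a minimal sketch (not Sun-supplied code) of how a process can inspect its own limits and raise its soft limit up to the hard limit. It uses Python's standard resource module, which wraps the POSIX getrlimit(2) and setrlimit(2) calls. The function name is hypothetical, and the values reported depend on how the system is configured.

```python
import resource


def raise_fd_soft_limit():
    """Try to raise this process's soft file descriptor limit to its hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to,
            # but not beyond, the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request (for example when the hard
            # limit is reported as unlimited); keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_fd_soft_limit())
```

Raising the hard limit itself is a privileged, system-level change and is outside the scope of this sketch.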
Q: For an app like BLAST, which step gives you the best increase in performance? a) move 32-bit app from 32-bit AMD to 64-bit AMD b) convert the app itself from 32-bit to 64-bit?
A: We have already tested a) and have seen a performance increase. For b), we predict a further improvement from the 64-bit ABI and its calling conventions.
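To tell which of the two cases a given process falls under, one can look at its data model. The sketch below is an illustration only, not part of Sun's test setup, and the function name is hypothetical. It reports the pointer and long sizes of the running interpreter: 4-byte pointers indicate a 32-bit (ILP32) process, while 8-byte pointers together with 8-byte longs indicate the LP64 model used by 64-bit UNIX systems.

```python
import ctypes
import struct


def data_model():
    """Return (model_name, pointer_bytes, long_bytes) for the current process."""
    pointer_bytes = struct.calcsize("P")        # size of a C pointer
    long_bytes = ctypes.sizeof(ctypes.c_long)   # size of a C long
    if pointer_bytes == 4:
        name = "ILP32"   # 32-bit int, long, and pointer
    elif long_bytes == 8:
        name = "LP64"    # 64-bit long and pointer
    else:
        name = "LLP64"   # 64-bit pointer with 32-bit long
    return name, pointer_bytes, long_bytes


if __name__ == "__main__":
    name, pointer_bytes, long_bytes = data_model()
    print(f"{name}: pointer={pointer_bytes} bytes, long={long_bytes} bytes")
```

A 32-bit binary moved unchanged to a 64-bit processor, as in case a), still reports ILP32; only a rebuild as in case b) changes the data model.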
Q: Intel recently announced the cancellation of its 4 GHz Pentium. Do you see 4 GHz as the clock-rate ceiling for single CPUs? If so, where does the AMD Opteron processor go after hitting that ceiling? If not, why is it different?
A: The short answer is that there is no "physically constrained" ceiling at 4 GHz. The AMD Opteron processor is currently at 2.4 GHz, with future products expected to ship in the 3.x GHz and perhaps 4.x GHz range. However, focusing solely on MHz is the wrong approach. There are many ways to improve performance.
Examples include micro-architectural enhancements to improve per-core performance; multi-core solutions to improve multi-threaded, multi-tasking, and other throughput-oriented applications; system architecture improvements (like the current AMD Opteron processor's integrated memory controller and HyperTransport interconnect) to address system bottlenecks; and power optimizations to improve cost and density. All that being said, MHz cannot be ignored.
Q: Has Sun run any performance comparisons between the Solaris 10 OS/AMD Opteron processor and Wintel machines running Pro/ENGINEER or AutoCAD? What were the results?
A: Our Pro/ENGINEER results on the Solaris 10 OS/Opteron are more than 15 percent faster than on the Solaris 8 OS running 32-bit. We expect 64-bit to be faster than 32-bit. The comparison to Wintel is ongoing.
Q: Will an application that can fit into the 8 MB of cache on a 1.2 GHz UltraSPARC III CPU run faster on an AMD64 Opteron, where the cache is smaller?
A: Yes, but your mileage may vary. The AMD Opteron has a fast memory subsystem and a high clock rate, so the result depends on what the application is doing.
Q: What types of applications (CAD, Imaging, etc.) take real advantage of 64 bits?
A: Both memory-intensive apps (database), because of increased memory capacity, and computationally intensive ones (media, CAD, imaging, simulation), because of relaxed register pressure and improved large integer multiply performance.
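The point about large integer multiplies can be illustrated with a small model. The sketch below is illustrative only, uses hypothetical names, and is not a measurement. It computes the low 64 bits of a 64-bit product the way 32-bit code has to, from three 32x32 partial products plus shifts and adds. A 64-bit processor produces the same result with a single multiply instruction.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_via_32bit_limbs(a, b):
    """Return (a * b) mod 2**64 using only 32x32 -> 64-bit partial products."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    low = a_lo * b_lo                      # contributes to bits 0..63
    cross = a_lo * b_hi + a_hi * b_lo      # contributes from bit 32 upward
    # a_hi * b_hi only affects bits 64 and above, so it is dropped here.
    return (low + ((cross & MASK32) << 32)) & MASK64


if __name__ == "__main__":
    x, y = 0x123456789ABCDEF0, 0x0FEDCBA987654321
    # Compare against Python's arbitrary-precision product, truncated to 64 bits.
    print(mul64_via_32bit_limbs(x, y) == (x * y) & MASK64)
```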
Q: Since AMD64 is faster than SPARC, why not drop SPARC and put the savings into Solaris OS/services?
A: Performance depends a great deal on workload type. SPARC has many performance advantages in scalability and multithreaded workloads. The new multi-core SPARC processors will demonstrate order-of-magnitude performance gains.
Q: Are there cases of a 32-bit app running slower on a 64-bit CPU?
A: No, none that we have seen.
Q: Are high-end servers coming with AMD64 and the Solaris 10 OS?
A: The SPARC product line will continue to range from 2P entry servers to high-end 100+P servers. The AMD Opteron product line includes workstations and servers up to 4P today, with larger systems planned for the future.
Q: Can you explain in a little more detail about the performance gains that AMD64 will have over SPARC64?
A: Our preliminary results show that 64-bit (AMD64) code runs 30-40 percent faster than 32-bit code on the Opteron processor, and the Opteron running 32-bit code is already faster than SPARC.
Q: Which has better performance in the Java environment, 64-bit SPARC or AMD64?
A: We predict AMD64 will have better JVM performance.
Q: Has any benchmarking been done for the Solaris 10 OS on AMD64 and Intel platforms?
A: We have seen great performance on AMD Opteron processor-based systems with the Solaris 10 OS. 64-bit BLAST (bio-science benchmark) results:
- The Sun Fire V40z server with a single AMD Opteron 850 processor running the 32-bit Solaris Operating System shows up to 42 percent better performance than a Dell 6650 server with a 3.0 GHz Xeon running Linux.
- The 64-bit Solaris 10 Operating System exhibits up to 31 percent faster performance than the 32-bit Solaris OS with only a recompilation of the application source code.
- The 64-bit Solaris 10 OS exhibits incredible scalability of 95 percent on a 4-way system.
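The percentages above can be turned into speedup and run-time figures with a little arithmetic. The sketch below rests on two assumptions that the results do not state explicitly: that "scalability" means parallel efficiency (speedup divided by processor count), and that "N percent faster" refers to throughput. The function names are illustrative.

```python
def speedup_from_efficiency(efficiency, cpus):
    """Speedup over one processor, given parallel efficiency as a fraction (0..1)."""
    return efficiency * cpus


def runtime_ratio(percent_faster):
    """New run time as a fraction of the old, if throughput rises by percent_faster."""
    return 1.0 / (1.0 + percent_faster / 100.0)


if __name__ == "__main__":
    print(f"4-way speedup at 95% efficiency: {speedup_from_efficiency(0.95, 4):.2f}x")
    print(f"run time after a 31% gain: {runtime_ratio(31):.3f} of the original")
```

Under these assumptions, 95 percent on a 4-way system corresponds to a speedup of about 3.8x over a single processor, and a 31 percent gain corresponds to about 76 percent of the original run time.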