BigAdmin System Administration Portal
Feature Article

Case Study: Web Application Tuning on Chip Multithreading Platforms

Wynne Wang, September 2009

Chip multithreading (CMT) is increasingly popular. A CMT processor can execute many threads simultaneously, which improves its efficiency by hiding wait latencies: while one thread waits, for example on memory, another thread can use the core.

Sun began studying CMT many years ago, and its CMT series includes the UltraSPARC T1, T2, and T2 Plus processors. This series is often regarded as the highest-throughput and most eco-responsible processor family ever created. Each chip in the series provides 32 or more virtual CPUs. As a result, a minor performance problem that goes unnoticed on a non-CMT platform can become a bottleneck in a CMT environment.

This article discusses such a case and possible solutions.

Introduction

A leading Web 2.0 independent software vendor (ISV) in China was looking for an IT solution to support its growing business. The company is based in Shanghai and provides online games for millions of players; at peak times, 1 million players are online simultaneously. Throughput and performance were the key considerations in choosing a new hardware platform.

The company had initially chosen x86/x64 servers running Linux. After learning about the benefits of CMT, they agreed to adopt a CMT platform, but first they wanted to run a performance test of the web blog system, which provides an information center with blog and portal features for the players. This system is accessed by millions of players a day.

The blog system was built on an open source stack consisting of Apache Struts, SpringSource Spring, Apache Tomcat, and Apache iBATIS. The company used a Sun Fire T2000 server as the hardware platform for the performance test. (The Sun Fire T2000 server is equipped with an UltraSPARC T1 processor, which is CMT-enabled. The chip has eight cores, and each core can execute four threads at the same time, so there are 32 virtual CPUs on the chip.)
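A quick way to check how many virtual CPUs the JVM can actually schedule threads on is Runtime.availableProcessors(). The sketch below (the class name VirtualCpuCheck is illustrative) should print 32 on a T2000 like the one described here, although it may report fewer if the process is confined to a subset of the CPUs.

```java
// Prints how many virtual CPUs the JVM can schedule threads on.
// On an UltraSPARC T1 (8 cores x 4 threads per core) this is expected to be 32.
final class VirtualCpuCheck {
    static int virtualCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Virtual CPUs visible to the JVM: " + virtualCpus());
    }
}
```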

After deploying the software stack, we used ApacheBench (ab) to generate the test load. At first, performance looked poor: CPU usage was only 10%. We found no evidence of a shortage of network bandwidth, a lack of memory, or heavy disk I/O, yet the average access time for the test case was 3.3 seconds, which seemed too long. The issue seemed to be that the blog application was not making full use of the CMT chip, so we needed to tune the Java technology-based application.
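ApacheBench reports, among other figures, the mean time per request for a given number of requests at a given concurrency level. The sketch below is a rough in-process analogue of that idea, not a replacement for ab: it runs `requests` calls on `concurrency` worker threads and averages their latencies. MiniBench and handleRequest() are hypothetical; handleRequest() simply sleeps for 5 ms in place of a real HTTP request to the blog system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class MiniBench {
    // Hypothetical stand-in for one HTTP request: 5 ms of simulated service time.
    static void handleRequest() throws InterruptedException {
        Thread.sleep(5);
    }

    // Runs `requests` calls on `concurrency` threads and returns the mean latency in ms.
    static double meanMillis(int requests, int concurrency) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Long>> latencies = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                latencies.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    handleRequest();
                    return System.nanoTime() - start; // latency of this one call
                }));
            }
            long totalNanos = 0;
            for (Future<Long> f : latencies) {
                totalNanos += f.get();
            }
            return totalNanos / 1_000_000.0 / requests;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("mean time per request: %.1f ms%n", meanMillis(200, 20));
    }
}
```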

Tuning JVM Options

First we checked the Java Virtual Machine (JVM) options: we switched to the server VM, chose a different garbage collection algorithm, increased the maximum heap size, and changed the young generation size.
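The exact flag values used in this test are not given; settings such as -server, -Xmx2g, -Xmn512m, and -XX:+UseParallelGC are illustrative assumptions of the kind of HotSpot options involved. Whatever flags are used, it is worth confirming that the running JVM actually picked them up. The sketch below (JvmOptionsReport is a hypothetical name) uses the standard java.lang.management API to print the VM name, which on HotSpot contains "Server VM" when the server engine is active, along with the JVM arguments, the maximum heap, and the active garbage collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmOptionsReport {
    // Options the JVM was launched with, e.g. [-server, -Xmx2g, ...] (illustrative).
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Effective maximum heap in megabytes, reflecting -Xmx if it was given.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("VM name: " + ManagementFactory.getRuntimeMXBean().getVmName());
        System.out.println("JVM arguments: " + inputArguments());
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        // One entry per active collector, which reveals the GC algorithm in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```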

Normally, these JVM changes improve performance, but this time we gained only about 10%: the average access time for the test case dropped from 3.3 seconds to 3 seconds.

Spring Framework Issue

Then we examined the application in detail. It had 240 threads to process HTTP requests, so there were enough workers; however, something was preventing those threads from running on the CPUs. So we used jstack to print the stack traces.

Note: The jstack output includes every thread's stack and state, such as RUNNABLE, WAITING, and BLOCKED, which is helpful when debugging multithreaded applications in a CMT environment.

Since the CPU idle time was almost 85%, we were interested in the threads that were not working.

#jstack JAVA_PROCESS_PID |grep BLOCKED |wc -l
#jstack JAVA_PROCESS_PID |grep WAITING |wc -l
#jstack JAVA_PROCESS_PID |grep RUNNABLE |wc -l

The results looked like this.

BLOCKED  WAITING  RUNNABLE
 198        39         11
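The same tally can be taken from inside the JVM with Thread.getAllStackTraces() and Thread.getState(). One difference worth knowing: grep WAITING on jstack output also matches TIMED_WAITING lines, whereas Thread.State keeps the two apart. In the sketch below, ThreadStateTally and its demo method are hypothetical; the demo parks one thread on a monitor so that the tally shows at least one BLOCKED thread.

```java
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateTally {
    // Counts every live thread in this JVM by state, roughly what
    // "jstack PID | grep STATE | wc -l" does from outside the process.
    static Map<Thread.State, Integer> tally() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    // Demo: hold a monitor, start a thread that needs it, and tally while it is BLOCKED.
    static Map<Thread.State, Integer> tallyWithOneBlockedThread() throws InterruptedException {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // Entered only after the owner releases the lock.
            }
        }, "demo-contender");
        Map<Thread.State, Integer> snapshot;
        synchronized (lock) {
            contender.start();
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1); // wait until the contender is stuck on the monitor
            }
            snapshot = tally();
        }
        contender.join();
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(tallyWithOneBlockedThread());
    }
}
```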

In this case, most threads were blocked. After some investigation, we found that most of them were blocked in the getMergedBeanDefinition() method, which returns a RootBeanDefinition for a given bean name, merging the bean's definition with its parent definition if it is a child bean definition.
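jstack shows each blocked thread's top frame together with a "waiting to lock" line, which is how a hot lock like this one stands out. The same grouping can be done in-process with ThreadMXBean.dumpAllThreads(). In this minimal sketch, GLOBAL_LOCK and contendedLookup() are hypothetical stand-ins for a contended framework method; the demo holds the lock while several workers pile up behind it, then counts the BLOCKED threads by the method at the top of their stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

final class BlockedHotspots {
    // Hypothetical global lock standing in for a contended framework lock.
    static final Object GLOBAL_LOCK = new Object();

    // Hypothetical method that every request thread must pass through.
    static void contendedLookup() {
        synchronized (GLOBAL_LOCK) {
            // A real method would do its lookup here while holding the lock.
        }
    }

    // Groups BLOCKED threads by the method at the top of their stack,
    // i.e. the method in which each one is waiting to enter a monitor.
    static Map<String, Integer> blockedByTopFrame() {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0) {
                String where = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(where, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Demo: hold GLOBAL_LOCK, start workers that call contendedLookup(),
    // and take a snapshot once all of them are BLOCKED on the lock.
    static Map<String, Integer> demo(int contenders) throws InterruptedException {
        Thread[] workers = new Thread[contenders];
        Map<String, Integer> snapshot;
        synchronized (GLOBAL_LOCK) {
            for (int i = 0; i < contenders; i++) {
                workers[i] = new Thread(BlockedHotspots::contendedLookup, "worker-" + i);
                workers[i].start();
            }
            for (Thread w : workers) {
                while (w.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
            }
            snapshot = blockedByTopFrame();
        }
        for (Thread w : workers) {
            w.join();
        }
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(4));
    }
}
```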

After we searched the knowledge base for the Spring framework, we found a similar bug, "AbstractBeanFactory.getMergedBeanDefinition has performance issues with lock contention when not going through getBean." (See Recommended References for documentation.)

According to the description, merged bean definitions were not cached correctly on some code paths, which caused lock contention. The description also says that the issue was fixed in Spring 2.5.2, so we downloaded the latest Spring library and tested again.
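A common remedy for this kind of contention is to consult a concurrent map first and take the lock only on a cache miss, re-checking the map once the lock is held. The sketch below shows that pattern in isolation; MergedDefinitionCache and the merger function are hypothetical, and this is not the code that Spring 2.5.2 actually ships.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Generic "check the cache first, lock only on a miss" pattern (hypothetical names).
final class MergedDefinitionCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Object mergeLock = new Object();
    private final Function<K, V> merger; // hypothetical expensive merge step; must not return null
    private int merges;                  // guarded by mergeLock: how often merger ran

    MergedDefinitionCache(Function<K, V> merger) {
        this.merger = merger;
    }

    V get(K key) {
        V cached = cache.get(key);       // fast path: cache hits take no lock
        if (cached != null) {
            return cached;
        }
        synchronized (mergeLock) {       // slow path: only on a miss
            cached = cache.get(key);     // re-check: another thread may have filled it
            if (cached == null) {
                cached = merger.apply(key);
                merges++;
                cache.put(key, cached);
            }
            return cached;
        }
    }

    int mergeCount() {
        synchronized (mergeLock) {
            return merges;
        }
    }
}
```

Once every definition has been merged, all callers stay on the lock-free fast path, so adding more threads no longer adds contention.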

With the updated Spring library, performance improved: the average access time for the test case dropped to 1 second.

iBATIS Workaround

Then we ran jstack again. This time we got a lot of WAITING threads.

After investigating the stack, it seemed there was another lock inside iBATIS, an open source relational mapping tool for Java developers. (Note: The iBATIS source code and documentation are distributed under Apache License 2.0.)

We dug into the source code and found the following:

// Original iBATIS ClassInfo code (excerpt)
private static final Map CLASS_INFO_MAP =
    Collections.synchronizedMap(new HashMap());   // lock 1: the map's internal lock

public static ClassInfo getInstance(Class clazz) {
  synchronized (clazz) {                          // lock 2: the Class object itself
    ClassInfo cache = (ClassInfo) CLASS_INFO_MAP.get(clazz);
    if (cache == null) {
      cache = new ClassInfo(clazz);
      CLASS_INFO_MAP.put(clazz, cache);
    }
    return cache;
  }
}

In the original code, CLASS_INFO_MAP is a synchronized map, and getInstance() also synchronizes on clazz, so every call acquires two locks. That is more locking than a read-mostly cache needs.

The synchronized (clazz) block was there to make the lookup-and-insert sequence on CLASS_INFO_MAP atomic, but this map only caches class information, so that atomicity seemed unnecessary: if two threads miss the cache at the same time, each builds a ClassInfo for the same class, and the second put() simply replaces the first with an equivalent entry. Meanwhile, the synchronized map and the lock around it had become a bottleneck that kept other threads waiting. So we commented out the synchronized (clazz) line.

A synchronizedMap is thread-safe, but it guards every operation with a single lock, so only one thread can access it at a time. This applies both to threads that retrieve values from the map and to threads that put new key-value pairs into it. Because each get and put only needs to be atomic on its own here, a map designed for concurrent access is a better fit.

Here we chose the ConcurrentHashMap class, which is a thread-safe implementation of a map that offers better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, simultaneous reads and writes can usually execute concurrently, and multiple simultaneous writes can often execute concurrently.
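A rough way to see the difference is to let several threads read the same pre-filled map at once. In the sketch below, MapReadContention is a hypothetical name, and the timings it prints are illustrative only: they depend heavily on the JVM, the hardware, and warm-up, so a harness such as JMH is a better choice for real measurements. The hit count it returns is deterministic, which makes the code easy to check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

final class MapReadContention {
    // Fills `map` with 100 entries, then lets `threads` readers each perform
    // `readsPerThread` get() calls at the same time. Returns the total number of hits.
    static long readAll(Map<Integer, Integer> map, int threads, int readsPerThread)
            throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            map.put(i, i);
        }
        AtomicLong hits = new AtomicLong();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    start.await(); // release all readers together
                    long local = 0;
                    for (int r = 0; r < readsPerThread; r++) {
                        if (map.get(r % 100) != null) local++;
                    }
                    hits.addAndGet(local);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown();
        done.await();
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        Map<Integer, Integer> synced = Collections.synchronizedMap(new HashMap<>());
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<>();
        for (Map<Integer, Integer> m : Arrays.asList(synced, concurrent)) {
            long t0 = System.nanoTime();
            long hits = readAll(m, threads, 1_000_000);
            System.out.printf("%s: %d hits in %d ms%n",
                    m.getClass().getSimpleName(), hits, (System.nanoTime() - t0) / 1_000_000);
        }
    }
}
```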

The modified code looked like this. (Note per Apache License: This source code has been modified.)

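// requires: import java.util.concurrent.ConcurrentHashMap;
private static final Map CLASS_INFO_MAP =
    new ConcurrentHashMap();   // reads generally proceed without locking

public static ClassInfo getInstance(Class clazz) {
  // synchronized (clazz) removed: each get and put is already thread-safe
  ClassInfo cache = (ClassInfo) CLASS_INFO_MAP.get(clazz);
  if (cache == null) {
    cache = new ClassInfo(clazz);
    CLASS_INFO_MAP.put(clazz, cache);
  }
  return cache;
}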
/*This source code has been modified*/
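Without the lock, concurrent cache misses for the same class may each create a ClassInfo, and the later put() wins. If every caller should receive one canonical instance, ConcurrentMap.putIfAbsent() (available since Java 5) closes that gap without reintroducing a lock. The sketch below is a variant rather than the code used in this test, and it uses hypothetical names (ClassInfoCache, Info) instead of the real iBATIS classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ClassInfoCache {
    // Hypothetical stand-in for iBATIS's ClassInfo; real code would reflect on the class.
    static final class Info {
        final Class<?> type;

        Info(Class<?> type) {
            this.type = type;
        }
    }

    private static final ConcurrentMap<Class<?>, Info> CACHE = new ConcurrentHashMap<>();

    static Info getInstance(Class<?> clazz) {
        Info cached = CACHE.get(clazz);  // lock-free fast path
        if (cached == null) {
            Info created = new Info(clazz);
            // putIfAbsent returns the existing value if another thread won the race,
            // so every caller ends up with the same instance.
            Info previous = CACHE.putIfAbsent(clazz, created);
            cached = (previous != null) ? previous : created;
        }
        return cached;
    }
}
```

On Java 8 and later, CACHE.computeIfAbsent(clazz, Info::new) expresses the same idea more compactly.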

We rebuilt iBATIS with the modified ClassInfo code and ran the test again.

Now the result looked good. The average access time for the test case was 0.3 second, roughly 10 times faster than the original 3.3 seconds. In addition, the stack trace showed most threads in the RUNNABLE state, and CPU usage had risen to 95%. The ISV was happy with this result and ordered more CMT systems.

Conclusion

For web applications in CMT environments, the large number of virtual CPUs can sometimes expose problems. Because Java technology scales well in multithreaded environments, an application can easily reach hundreds of threads, and code that serializes those threads on a shared lock may become a bottleneck in a CMT environment. With the Java concurrency library (java.util.concurrent) and tools such as jstack, developers can find such bottlenecks quickly and fix them.

About the Author

Wynne Wang is a technical consultant for the ISV engineering team at Sun. He provides software technical support and architecture design for partners. His focus includes the Solaris Operating System, databases, and Java technology.

Recommended References



Unless otherwise licensed, code in all technical manuals herein (including articles, FAQs, samples) is provided under this License.

