In this blog entry I am going to talk about a recent tuning exercise on an application running on the [tag]Java Hotspot VM[/tag] 1.4.2. It’s great that developers are adopting the latest JVM releases, but there are still a huge number of applications out in the wild running on older versions.
The application in question is a CRM-type application, serving about 200 concurrent users, increasing in bursts to 300+, and processing approximately 4.5 million customer contacts per year. Do the math: considering the application runs on a single JVM, it is handling a considerable, consistent load over the course of a working day.
Recently it was decided to determine the capacity of the current production application server. The rollout of another considerable application is imminent and the customer felt that it was prudent to determine whether or not the current infrastructure had sufficient capacity to run the new system.
To determine the current stress levels on the production application server, we installed [tag]Quest Software’s PerformaSure 4.4[/tag], a tool that we have used successfully in the past to determine application bottlenecks and gain invaluable insight into the runtime characteristics of our production applications.
Over the course of a week, we monitored the CRM application during the core business hours of 7am -> 6pm. It is worth noting that the users reported that they would routinely experience application slowdown during the course of any given day. PerformaSure provides a number of different browsers that give different viewpoints into the performance of an application. Normally, if you suspected that application slowdown was code related, you would focus on the Request Browser; this browser breaks down any recorded request layer by layer and component technology by component technology. However, since our focus is on tuning the VM and not the application code, we are going to concentrate on PerformaSure’s Metric Browser.
There are a number of generated graphs that we are going to look at to get a feel for how garbage collection is being managed, and its associated overhead, before tuning.
This first graph shows the percentage of free heap during the course of the day. At no point do we see the graph hit 0% available, so we can safely assume that any slowdown is not due to loitering objects:
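You don’t need a profiler to spot-check this kind of figure. As a minimal sketch (the class name is my own, not part of the application), the standard `java.lang.Runtime` API, available since well before 1.4.2, lets you sample the free-heap percentage from inside the JVM:

```java
// Sketch: sampling the free-heap percentage from inside the JVM using the
// standard java.lang.Runtime API (available on 1.4.2 and modern JVMs alike).
public class HeapSampler {
    public static double freeHeapPercent() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently committed by the VM
        long free = rt.freeMemory();   // unused portion of the committed heap
        return 100.0 * free / total;
    }

    public static void main(String[] args) {
        // A reading that never approaches 0% over the day is a good sign
        // that objects are not loitering.
        System.out.println("Free heap: " + freeHeapPercent() + "%");
    }
}
```

Logging this periodically from a background thread would give a rough equivalent of the PerformaSure free-heap graph.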
Next we want to look at the garbage collection rate broken down by generation. Remember that Sun’s JVMs (1.3+) are generational: the heap is broken into a ‘young generation’ and an ‘old generation’ according to object ‘age’. By default, a different GC algorithm manages each of the two generations; the young generation uses a Copying Collector and the old generation uses a Mark-Compact Collector. The specifics of these and the other GC algorithms mentioned in this blog can be found in Sun’s excellent Turbo-charging Java HotSpot Virtual Machine article.
The green line above represents the New Generation GC rate and the pink line represents the Old Generation GC rate. The graph is read as follows: the x-axis is time and the y-axis is GCs per second. In the above case we can see a rate of 300 being reached at one point, which can be a little confusing. The y-axis here is in thousandths of a GC per second, so during that time slice the minor GC rate was actually 0.3 collections per second (300/1000). The unit switches to count/second if your GC rate exceeds one minor GC per second.
Okay, what can we see from this graph? Well, remember our customer indicated that they sometimes experience slowdown in the application. During this particular day, the slowdown occurred at approximately 14:30 local time. You can use the Zonar functionality in PerformaSure to zoom into this particular time frame, but we can clearly see a radical change in Old Generation GC activity in, or around, this time period.
How can we tell how much impact this increased Old Generation GC activity had on the application? Well, if we correlate what we now know with another PerformaSure graph, the GC Overhead graph, we can determine the impact of the increased GC activity:
We can clearly see the overhead increasing in tandem with the Old Generation GC rate, eventually reaching a level of 100% overhead. Referring to the Active Session Count graph (see below), we can see that the increase in GC activity and its associated overhead coincides with another user community coming online, increasing the number of concurrent active sessions from approximately 200 to 250 and, depending on circumstances and workload, sometimes 300.
Now that we have diagnosed the cause of the slowdown, what can we do to fix it? Let’s start by looking at the JVM settings:
-Xms512M -Xmx1408M -XX:MaxPermSize=128M
Nothing out of the ordinary here. The application is simply allocated 512M of heap on startup, growing to a maximum of 1.4Gb, with a Permanent Generation size of 128M (the Permanent Gen is not of concern here, so we won’t be tweaking this value). As we confirmed earlier, the application does not have, or exhibit the signs of, loitering objects. So for now, the amount of memory allocated is not of concern. Of course it could be tweaked, but let’s get the application performing consistently before we do that.
What we want to focus on are the [tag]Garbage Collectors[/tag] used by the VM. I mentioned earlier that the New Gen uses a Copying Collector and the Old Gen uses a Mark-Compact Collector. So let’s play around with these and see what the results are. Now, it doesn’t make sense, nor is it wise, to blindly make changes to a VM’s GC model, so put some thought into any settings, and have an idea of their desired impact, before you apply them. The actual impact can later be confirmed using PerformaSure.
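If you don’t have a profiler attached, the JVM’s own GC logging is a cheap way to sanity-check the effect of any flag change. A hypothetical launch line might look like the following (the jar name is a placeholder, not the actual application; `-verbose:gc` and `-XX:+PrintGCDetails` are both supported on 1.4.2):

```shell
# Hypothetical launch line; crm-app.jar is a placeholder for your application.
# -verbose:gc prints one line per collection, and -XX:+PrintGCDetails breaks
# each line down by generation, so before/after comparisons are easy to make.
java -verbose:gc -XX:+PrintGCDetails \
     -Xms512M -Xmx1408M -XX:MaxPermSize=128M \
     -jar crm-app.jar
```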
The graphs that we have referred to are only a sample from about two weeks worth of recordings. So based on these and other recordings, I decided to change/add the following to the VM options:
-Xms1408M -Xmx1408M -XX:NewSize=448M -XX:MaxNewSize=448M -XX:SurvivorRatio=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:MaxPermSize=128M
That’s quite a lot of VM options. So what have we done here?
Firstly, we’ve set the min and max values for the overall heap and New Generation heap to the same value; -Xms1408M -Xmx1408M -XX:NewSize=448M -XX:MaxNewSize=448M. This means that the VM will not consume extra system resources when it needs to scale up the amount of heap allocated. This is not strictly necessary for what we are trying to achieve but is good practice in a production system.
Next we specify our Survivor Ratio; -XX:SurvivorRatio=128. This tells the VM to allocate a very, very small amount of New Gen heap to the To and From Survivor Spaces. This option ties in with the -XX:MaxTenuringThreshold=0 option, which controls how many times objects are copied between the survivor spaces before being promoted to the Old Generation. Together, these have the net effect that any object still live after a minor GC is promoted straight to the Old Generation.
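To put numbers on “very, very small”: HotSpot sizes the young generation as eden plus two survivor spaces, with SurvivorRatio being the ratio of eden to a single survivor space. A quick sketch of the arithmetic (class and method names are my own, for illustration):

```java
// Sketch: the approximate survivor-space size implied by the flags above.
// HotSpot relation: young = eden + 2 * survivor, and
// SurvivorRatio = eden / survivor, so survivor = young / (ratio + 2).
public class SurvivorSizing {
    public static long survivorBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long young = 448L * 1024 * 1024;           // -XX:NewSize=448M
        long survivor = survivorBytes(young, 128); // -XX:SurvivorRatio=128
        // Each survivor space gets only ~3.4M of the 448M young generation.
        System.out.println((survivor / (1024 * 1024)) + "M per survivor space");
    }
}
```

With survivor spaces that small and a tenuring threshold of 0, the young generation is effectively all eden, which is exactly the intent here.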
Next up, we specify two different collectors to be used by the New Generation and Old Generation respectively; -XX:+UseParNewGC -XX:+UseConcMarkSweepGC. These specify the use of a Parallel Collector in the New Gen and a Concurrent Mark-Sweep Collector in the Old Gen. The specific behaviour of these collectors can again be found in the Turbo-charging article.
Our final option of interest is -XX:CMSInitiatingOccupancyFraction=60. This option sets a threshold on the percentage of Old Gen memory that may be in use before a CMS collection is invoked.
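In absolute terms, given the flags above, that threshold works out as follows (again, class and method names are my own, for illustration):

```java
// Sketch: the absolute old-gen occupancy at which CMS starts a cycle,
// given the flags above. Old gen = total heap minus young generation.
public class CmsThreshold {
    public static long initiatingBytes(long oldGenBytes, int occupancyFraction) {
        return oldGenBytes * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long oldGen = (1408L - 448L) * 1024 * 1024;   // -Xmx1408M minus NewSize=448M = 960M
        long threshold = initiatingBytes(oldGen, 60); // -XX:CMSInitiatingOccupancyFraction=60
        // CMS begins a concurrent cycle once ~576M of the 960M old gen is in
        // use, leaving headroom for the cycle to finish before the heap fills.
        System.out.println((threshold / (1024 * 1024)) + "M");
    }
}
```

Starting the concurrent cycle early matters: if the old generation fills before CMS completes, the VM falls back to a full stop-the-world collection.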
So what has been the overall effect of our changes on the GC characteristics of our application?
In this recording we can see the usual increase in the number of active sessions, in fact in this example the session count increases to 321:
But instead of the normal dramatic increase in Old Gen GC activity, we see a brief spike before the Old Gen GC rate settles back into the uniform pattern that we had been seeing during the recording period:
And what about the GC overhead? Well from the below diagram we can see that the Old Gen GC activity now adds a consistent load of ~22% on the application, even during the burst of activity that we saw when the session count increased from ~200 to ~320 sessions:
What does this mean for the application?
What we have here is a trade-off. By specifying the CMS Collector for the Old Gen, our collector runs mostly concurrently with our application, instead of the stop-the-world approach of the default Mark-Compact Collector, where user requests would be put on hold during a collection. The average overhead has increased, but not to the point where our SLAs cannot be met.
The bottom line is that we now have an application that can scale to meet additional load. Where users were previously experiencing slowdown due to increased load, they now experience consistently good response times, almost irrespective of increases in load.
I would also like to mention that, now that we have consistent performance and GC activity during the course of a working day, we are continuing to tweak the JVM settings, more specifically the amount of heap allocated at runtime. I feel that 1.4Gb is excessive; based on the percentage of free heap available during the day, the expected load, and the GC activity, continued analysis should let us squeeze a bit more performance from the application.
So the next time your application experiences slowdown, don’t jump the gun and go tearing at the application code. Observe your application’s GC characteristics and determine whether or not it is in fact your application code that is the cause of the slowdown.
Following a comment by Dmitri, I thought it a good idea to post some graphs that give a more complete picture of the before and the after of this tuning effort.