35
votes

I am running an application that creates and discards large numbers of objects. The number of long-lived objects does grow slowly, but it is small compared to the number of short-lived objects. This is a desktop application with high availability requirements: it needs to run 24 hours a day. Most of the work is done on a single thread, which will use all the CPU it can get its hands on.

In the past we have seen the following under heavy load: the used heap space slowly goes up because the garbage collector collects less than the amount of memory newly allocated, and it eventually comes close to the specified max heap. At that point the garbage collector kicks in heavily and starts using a huge amount of resources to avoid going over the max heap size. This slows the application down (easily 10x as slow). Most of the time the GC then succeeds in cleaning up the garbage after a few minutes; otherwise it fails and throws an OutOfMemoryError. Neither outcome is really acceptable.

The hardware is a quad-core processor with at least 4GB of memory running 64-bit Linux, all of which we can use if needed. Currently the application makes heavy use of a single core, spending most of its time in a single thread. The other cores are mostly idle and could be used for garbage collection.

I have a feeling the garbage collector should be collecting more aggressively at an early stage, well before it runs out of memory. Our application does not have any throughput issues. Low pause times are a bit more important than throughput, but far less important than staying away from the max heap size. It is acceptable if the single busy thread runs at only 75% of its current speed, as long as that means the garbage collector can keep up with the allocation rate. In short, a steady decrease in performance is better than the sudden drop we see now.

I have read Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning thoroughly, so I understand the options well. However, I still find it hard to choose the right settings, as my requirements are a bit different from what is discussed in the paper.

Currently I am using the ParallelGC with the option -XX:GCTimeRatio=4. This works a bit better than the default setting for time ratio, but I have a feeling the GC is allowed to run more by that setting than it does.
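For reference, the arithmetic behind that flag can be sketched as follows. As far as I understand the tuning guide, -XX:GCTimeRatio=N sets a goal of 1 / (1 + N) of total time for garbage collection, so N=4 allows up to 20%. The class and method names below are made up for illustration, and the flag only expresses a goal, not a guarantee.

```java
// Sketch: the GC time goal implied by -XX:GCTimeRatio=N is 1 / (1 + N).
// GcTimeRatioGoal, gcFraction and ratioFor are hypothetical names, not a JVM API.
final class GcTimeRatioGoal {

    /** Fraction of total time the collector may use for a given ratio N. */
    static double gcFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("ratio must be >= 0");
        }
        return 1.0 / (1.0 + gcTimeRatio);
    }

    /** Largest integer N whose goal still allows at least the requested GC fraction. */
    static int ratioFor(double gcFraction) {
        if (gcFraction <= 0.0 || gcFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return (int) Math.floor(1.0 / gcFraction - 1.0);
    }

    public static void main(String[] args) {
        // Current setting from the question.
        System.out.println("GCTimeRatio=4 allows GC fraction " + gcFraction(4));
        // "75% of the current speed" corresponds to giving the GC 25% of the time.
        System.out.println("25% GC time corresponds to GCTimeRatio=" + ratioFor(0.25));
    }
}
```

By this reading, accepting the busy thread at 75% speed would correspond to roughly -XX:GCTimeRatio=3, which is only slightly more permissive than the current setting.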

For monitoring I am using jconsole and jvisualvm mostly.

I would like to know which garbage collection options you recommend for the above situation. Also, which GC debug output can I look at to understand the bottleneck better?

EDIT: I understand that a very good option here is to create less garbage, and this is something we are seriously considering. However, I would like to know how we can tackle this with GC tuning, as that is something we can do much more easily and roll out more quickly than changing large amounts of source code. I have also run the different memory profilers, so I understand what the garbage is used by, and thereby I know it consists of objects that could be collected.

I am using:

java version "1.6.0_27-ea"
Java(TM) SE Runtime Environment (build 1.6.0_27-ea-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b03, mixed mode)

With JVM parameters:

-Xmx1024M and -XX:GCTimeRatio=4 

Edit in reply to Matt's comments: Most memory (and CPU) goes towards constructing objects that represent the current situation. Some of these will be discarded right away as the situation changes rapidly; others will have a medium lifetime if no updates come in for a while.

4
Consider posting all of the VM arguments you're using now. - jtoberon
I'm using Java 6; the arguments are only -Xmx1024M and -XX:GCTimeRatio=4 (the JVM detects itself as a server and uses the parallel GC). The application will also run mostly on 200M (actually it seems to run a bit better, as it is triggered to clean up sooner and then has less of a flood of work). - Thirler
Can you define "medium life time" in "number of young collections" terms? It sounds like "if no updates come in for a while" implies that the rate of object allocation slows dramatically during this period, in which case the interval between young collections should extend correspondingly. If so, the same (small) MTT may be fine. - Matt

4 Answers

24
votes

You don't mention which build of the JVM you're running; this is crucial info. You also don't mention how long the app tends to run for (e.g. is it for the length of a working day? a week? less?).

A few other points

  1. If you are continually leaking objects into tenured because you're allocating at a rate faster than your young gen can be swept, then your generations are incorrectly sized. You will need to do some proper analysis of the behaviour of your app to be able to size them correctly; you can use visualgc for this.
  2. The throughput collector is designed to accept a single, large pause as opposed to many smaller pauses. The benefit is that it is a compacting collector and it enables higher total throughput.
  3. CMS exists to serve the other end of the spectrum, i.e. many more, much smaller pauses but lower total throughput. The downside is that it is not compacting, so fragmentation can be a problem. The fragmentation issue was improved in 6u26, so if you're not on that build then it may be upgrade time. Note that the "bleeding into tenured" effect you have remarked on exacerbates the fragmentation issue and, given time, this will lead to promotion failures (aka unscheduled full GC and the associated STW pause). I have previously written an answer about this on this question.
  4. If you're running a 64-bit JVM with >4GB RAM and a recent enough JVM, make sure you enable -XX:+UseCompressedOops; otherwise you're simply wasting space, as a 64-bit JVM occupies ~1.5x the space of a 32-bit JVM for the same workload without it (and if you're not on a 64-bit JVM, upgrade to get access to more RAM).

You may also want to read another answer I've written on this subject, which goes into sizing your survivor spaces and eden appropriately. Basically, what you want to achieve is:

  • eden big enough that it is not collected too often
  • survivor spaces sized to match the tenuring threshold
  • a tenuring threshold set to ensure, as much as possible, that only truly long lived objects make it into tenured

Therefore say you had a 6G heap, you might do something like 5G eden + 16M survivor spaces + a tenuring threshold of 1.
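As a rough sketch of how that example could be translated into flags: the young generation is eden plus two survivor spaces, and -XX:SurvivorRatio is the ratio of eden to one survivor space. The helper below is hypothetical (not part of any JVM tooling) and assumes the collector keeps the sizes you give it; adaptive sizing in the parallel collector may adjust them.

```java
// Sketch: derive young-generation flags from a desired eden / survivor layout.
// YoungGenSizing and flags() are illustrative names only.
final class YoungGenSizing {

    /**
     * @param edenMb            desired eden size in MB
     * @param survivorMb        desired size of ONE survivor space in MB
     * @param tenuringThreshold desired -XX:MaxTenuringThreshold value
     */
    static String flags(long edenMb, long survivorMb, int tenuringThreshold) {
        if (edenMb <= 0 || survivorMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long youngMb = edenMb + 2 * survivorMb;   // eden + "from" + "to"
        long survivorRatio = edenMb / survivorMb; // eden : one survivor space
        return "-Xmn" + youngMb + "m"
                + " -XX:SurvivorRatio=" + survivorRatio
                + " -XX:MaxTenuringThreshold=" + tenuringThreshold;
    }

    public static void main(String[] args) {
        // The 5G eden + 16M survivor spaces + threshold 1 example from above.
        System.out.println(flags(5 * 1024, 16, 1));
    }
}
```

The remainder of the 6G heap in that example (a bit under 1G) would be left for tenured.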

The basic process is

  1. allocate into eden
  2. eden fills up
  3. live objects in eden are copied into the "to" survivor space
  4. live objects in the "from" survivor space are either copied to the "to" space or promoted to tenured (depending on the tenuring threshold, the space available and the number of times they've been copied from one to the other)
  5. anything left in eden is swept away
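The steps above can be sketched as a toy model. This is a deliberately simplified illustration, not how HotSpot is implemented: the class, its fields and the exact promotion rule (promote when the age has reached the threshold, or when the "to" space is full) are assumptions made for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one minor collection. All names here are hypothetical.
final class MinorGcToyModel {

    /** A toy object: only size, age (number of copies survived) and liveness matter. */
    static final class Obj {
        final int size;
        final boolean live;
        int age;

        Obj(int size, boolean live) {
            this.size = size;
            this.live = live;
        }
    }

    final List<Obj> eden = new ArrayList<Obj>();
    final List<Obj> tenured = new ArrayList<Obj>();
    List<Obj> from = new ArrayList<Obj>();
    final int survivorCapacity;
    final int tenuringThreshold;

    MinorGcToyModel(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    /** Copies live objects out of eden and "from"; everything else is dropped. */
    void minorGc() {
        List<Obj> to = new ArrayList<Obj>();
        int used = 0;
        List<Obj> candidates = new ArrayList<Obj>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // garbage is never copied, so it costs nothing here
            }
            boolean oldEnough = o.age >= tenuringThreshold;
            boolean noRoom = used + o.size > survivorCapacity;
            if (oldEnough || noRoom) {
                tenured.add(o); // "noRoom" is the premature promotion to avoid
            } else {
                o.age++;
                to.add(o);
                used += o.size;
            }
        }
        eden.clear(); // anything left in eden is swept away
        from = to;    // the survivor spaces swap roles
    }

    public static void main(String[] args) {
        MinorGcToyModel heap = new MinorGcToyModel(10, 1);
        heap.eden.add(new Obj(4, true));
        heap.eden.add(new Obj(4, false));
        heap.eden.add(new Obj(8, true)); // does not fit next to the first survivor
        heap.minorGc();
        System.out.println("survivors=" + heap.from.size() + " tenured=" + heap.tenured.size());
    }
}
```

In this model, a survivor space that is too small pushes live objects straight into tenured regardless of their age, which is the "bleeding into tenured" effect described earlier.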

Therefore, given spaces appropriately sized for your application's allocation profile, it's perfectly possible to configure the system such that it handles the load nicely. A few caveats to this:

  1. you need some long running tests to do this properly (e.g. can take days to hit the CMS fragmentation problem)
  2. you need to do each test a few times to get good results
  3. you need to change 1 thing at a time in the GC config
  4. you need to be able to present a reasonably repeatable workload to the app otherwise it will be difficult to objectively compare results from different test runs
  5. this will get really hard to do reliably if the workload is unpredictable and has massive peaks/troughs

Points 1-3 mean this can take ages to get right. On the other hand, you may be able to make it good enough very quickly; it depends how much of a perfectionist you are!

Finally, echoing Peter Lawrey's point, you can save a lot of bother (albeit introducing some other bother) if you are really rigorous about object allocation.

4
votes

The G1GC algorithm, which became stable in Java 1.7, works well. You just have to specify the maximum pause time you can live with in your application, and the JVM will take care of everything else for you.

Key parameters:

-XX:+UseG1GC -XX:MaxGCPauseMillis=1000 

There are some more parameters to configure. If you are using a 4 GB heap, set the region size to 4 GB / 2048 regions, which is 2 MB:

-XX:G1HeapRegionSize=2m

If you have an 8-core CPU, fine-tune two more parameters:

-XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 

Apart from these parameters, leave the other parameters at their default values, for example

-XX:TargetSurvivorRatio etc.

Have a look at the Oracle website for more details about G1GC.

-XX:G1HeapRegionSize=n

Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.
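A minimal sketch of that rule, assuming the target is heap size / 2048, rounded down to a power of two and clamped to the 1MB to 32MB range. The rounding direction is my assumption, and the class below is illustrative only, not a JVM API.

```java
// Sketch: estimate a G1 region size from the heap size.
// G1RegionSize and regionSizeBytes are hypothetical names.
final class G1RegionSize {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    static long regionSizeBytes(long heapBytes) {
        if (heapBytes <= 0) {
            throw new IllegalArgumentException("heap size must be positive");
        }
        // Aim for about 2048 regions, but never go below the minimum region size.
        long target = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        // Round down to a power of two, then cap at the maximum region size.
        long powerOfTwo = Long.highestOneBit(target);
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsMb = {1024, 4096, 6144};
        for (long heapMb : heapsMb) {
            long regionMb = regionSizeBytes(heapMb * MB) / MB;
            System.out.println(heapMb + "M heap: -XX:G1HeapRegionSize=" + regionMb + "m");
        }
    }
}
```

Note that for the 1024M heap in the question, this rule bottoms out at the 1MB minimum.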

-XX:MaxGCPauseMillis=200

Sets a target value for desired maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.

-XX:ParallelGCThreads=n

Sets the value of the STW worker threads. Sets the value of n to the number of logical processors. The value of n is the same as the number of logical processors up to a value of 8.

If there are more than eight logical processors, sets the value of n to approximately 5/8 of the logical processors. This works in most cases except for larger SPARC systems where the value of n can be approximately 5/16 of the logical processors.
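The default described above can be approximated like this. The exact formula (8 plus 5/8 of the processors beyond the first 8) is my assumption about how "approximately 5/8" is computed, and it ignores the SPARC special case; the names are hypothetical.

```java
// Sketch: approximate the default -XX:ParallelGCThreads for a given CPU count.
// DefaultGcThreads and defaultParallelGcThreads are illustrative names only.
final class DefaultGcThreads {

    static int defaultParallelGcThreads(int logicalProcessors) {
        if (logicalProcessors < 1) {
            throw new IllegalArgumentException("need at least one processor");
        }
        if (logicalProcessors <= 8) {
            return logicalProcessors; // one STW worker per logical processor
        }
        // Beyond 8 processors, only 5/8 of the additional ones get a worker.
        return 8 + (logicalProcessors - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " logical processors: about "
                + defaultParallelGcThreads(cpus) + " parallel GC threads by default");
    }
}
```

On the quad-core machine from the question this gives 4, so the default already uses every core during stop-the-world phases.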

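-XX:ConcGCThreads=n

Sets the number of parallel marking threads. Sets n to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).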

Recommendations from Oracle:

When you evaluate and fine-tune G1 GC, keep the following recommendations in mind:

  1. Young Generation Size: Avoid explicitly setting young generation size with the -Xmn option or any other related option such as -XX:NewRatio. Fixing the size of the young generation overrides the target pause-time goal.

  2. Pause Time Goals: When you evaluate or tune any garbage collection, there is always a latency versus throughput trade-off. The G1 GC is an incremental garbage collector with uniform pauses, but also more overhead on the application threads. The throughput goal for the G1 GC is 90 percent application time and 10 percent garbage collection time.

Recently I replaced CMS with G1GC for a 4 GB heap with an almost equal division between young gen and old gen. I set -XX:MaxGCPauseMillis and the results are awesome.

3
votes

You can try reducing the new size. This will cause it to make more, smaller collections. However, it can cause these short-lived objects to be passed into tenured space. On the other hand, you can try increasing the NewSize, which means fewer objects will pass out of the young generation.

My preference, however, is to create less garbage so that the GC behaves in a more consistent manner. Instead of creating objects freely, try re-using or recycling them. You have to be careful that this doesn't cause more trouble than it's worth, but you can reduce the amount of garbage created significantly in some applications. I suggest using a memory profiler, e.g. YourKit, to help you identify the biggest hitters.
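A minimal sketch of what re-using objects can look like, assuming a single-threaded hot loop like the one in the question. The class and method names are invented for illustration; whether this actually helps depends on the application and should be confirmed with a profiler.

```java
// Sketch: the same computation with a fresh result object per step versus one
// reusable scratch buffer. ScratchReuse and its methods are hypothetical names.
final class ScratchReuse {

    /** Allocating style: returns a new array for every midpoint. */
    static double[] midpointAlloc(double[] a, double[] b) {
        return new double[] {(a[0] + b[0]) / 2, (a[1] + b[1]) / 2};
    }

    /** Recycling style: writes the midpoint into a caller-supplied buffer. */
    static void midpointInto(double[] a, double[] b, double[] out) {
        out[0] = (a[0] + b[0]) / 2;
        out[1] = (a[1] + b[1]) / 2;
    }

    /** Sums the x coordinates of consecutive midpoints, one allocation per step. */
    static double sumMidpointXAlloc(double[][] points) {
        double sum = 0;
        for (int i = 0; i + 1 < points.length; i++) {
            sum += midpointAlloc(points[i], points[i + 1])[0];
        }
        return sum;
    }

    /** Same result, but one scratch array is reused. Not safe to share across threads. */
    static double sumMidpointXReuse(double[][] points) {
        double[] scratch = new double[2];
        double sum = 0;
        for (int i = 0; i + 1 < points.length; i++) {
            midpointInto(points[i], points[i + 1], scratch);
            sum += scratch[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {2, 2}, {4, 0}};
        System.out.println(sumMidpointXAlloc(points) + " " + sumMidpointXReuse(points));
    }
}
```

The trade-off is the usual one: the recycling version produces less garbage per iteration, but the scratch buffer is mutable shared state, so it is easier to misuse.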

An extreme case is to create so little garbage that it doesn't collect all day (not even minor collections). This is possible for a server-side application (but may not be possible for a GUI application).

2
votes

The first VM options I'd try are increasing NewSize and MaxNewSize and using a concurrent collector (try -XX:+UseConcMarkSweepGC, which is designed to "keep garbage collection pauses short").

To confirm that the pauses you're seeing are due to GC, turn on verbose GC logging (-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps). More info about how to read these logs is available online.
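As a complement to the log flags, the same counters can be read from inside the process through the standard java.lang.management API. The sketch below is only an illustration (the class and method names are made up); it reports cumulative collection counts and times per collector, which is coarser than the verbose GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read cumulative GC counters via the platform MXBeans.
// GcStats, totalGcMillis and gcFractionOfUptime are hypothetical names.
final class GcStats {

    /** Total approximate time spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** GC time as a fraction of JVM uptime; a rough indicator only. */
    static double gcFractionOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("GC fraction of uptime: " + gcFractionOfUptime());
    }
}
```

Sampling these values periodically and logging the difference between samples would show whether the GC share of time creeps up as the heap fills.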

To understand the bottleneck, run the app in a profiler. Take a heap snapshot. Then, let the app do its thing for a while. Take another heap snapshot. In order to see what's taking up all the space, look for whatever there are a lot more of after the second heap snapshot. Visual VM can do this, but also consider MAT.

Alternatively, consider using -XX:+HeapDumpOnOutOfMemoryError so that you get a snapshot of the real problem and don't have to reproduce it in another environment. The heap dump that's saved can be analyzed with the same tools (MAT etc.).

However, you may be getting an OutOfMemoryError either because you have a memory leak or because you're running with too small a max heap size. The verbose GC logging should help you answer both of these questions.