2
votes

Why would a java heap allocation resizing cause an OOME?

We are seeing OutOfMemoryExceptions in our logs, and they seem to coincide with the java heap commit size growing from ~1G to ~2.4G. Despite the error message, it does not appear that we are running out of heap space. Aside from the exception being thrown (and heap dump generated), the resize seems to eventually succeed and the application continues on without a problem (with a ~2.4G heap commit size).

Here's an example of the log output:

INFO   | jvm 1    | 2013/08/16 12:08:05 | [GC [PSYoungGen: 328000K->2997K(339200K)] 645686K->320683K(1038272K), 0.0101580 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:09:14 | [GC [PSYoungGen: 331509K->3487K(338816K)] 649195K->322153K(1037888K), 0.0115600 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:09:59 | [GC [PSYoungGen: 331999K->2928K(340032K)] 650665K->322608K(1039104K), 0.0099300 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:10:48 | [GC [PSYoungGen: 333104K->2723K(339648K)] 652784K->323240K(1038720K), 0.0100130 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:28 | [GC [PSYoungGen: 332885K->3884K(340864K)] 653402K->325089K(1039936K), 0.0106250 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [GC [PSYoungGen: 23694K->463K(340352K)] 344899K->323656K(2437504K), 0.0070330 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [GC [PSYoungGen: 463K->0K(340608K)] 323656K->323592K(2437760K), 0.0044440 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [Full GC
INFO   | jvm 1    | 2013/08/16 12:11:40 |  [PSYoungGen: 0K->0K(340608K)] [PSOldGen: 323592K->323592K(699072K)] 323592K->323592K(1039680K) [PSPermGen: 159297K->159297K(262144K)], 1.2076900 secs] [Times: user=1.20 sys=0.00, real=1.21 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:40 | [GC [PSYoungGen: 0K->0K(340736K)] 323592K->323592K(2437888K), 0.0046330 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:40 | [Full GC
INFO   | jvm 1    | 2013/08/16 12:11:42 |  [PSYoungGen: 0K->0K(340736K)] [PSOldGen: 323592K->279953K(744512K)] 323592K->279953K(1085248K) [PSPermGen: 159297K->159062K(262144K)], 1.7593100 secs] [Times: user=1.75 sys=0.00, real=1.76 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:42 | java.lang.OutOfMemoryError: Java heap space
INFO   | jvm 1    | 2013/08/16 12:11:42 | Dumping heap to java_pid28908.hprof ...
INFO   | jvm 1    | 2013/08/16 12:11:48 | Heap dump file created [463314899 bytes in 6.037 secs]
INFO   | jvm 1    | 2013/08/16 12:12:36 | [GC [PSYoungGen: 331840K->6044K(352192K)] 611793K->285998K(2449344K), 0.0164060 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
INFO   | jvm 1    | 2013/08/16 12:13:28 | [GC [PSYoungGen: 352156K->6161K(364160K)] 632110K->286114K(2461312K), 0.0152330 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:14:47 | [GC [PSYoungGen: 364113K->6575K(374144K)] 644066K->288169K(2471296K), 0.0179930 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 

Notice that immediately prior to the OOME, the total committed heap oscillates between 1GB and 2.4GB. We can see that it was pretty steady at 1GB beforehand and pretty steady at 2.4GB afterwards.

Among the javaopts for this 1.6.0._24 JVM are:

  • -Xmx3072m
  • -XX:+HeapDumpOnOutOfMemoryError
  • -XX:-UseGCOverheadLimit
  • -verbose:gc
  • -Xss256k
  • -XX:MaxPermSize=256m
  • -server
  • -XX:+PrintGCDetails

The JVM is running 1.6.0._24. We can't change versions now, but will have a window to do so in the next month or two. If 1.6.0_45 is that much more stable, we will aim to switch to that. We're currently testing it.

The machine only has 4GB of total system ram. In addition, there is a small RAM-disk in use as well. I'm concerned that the Xmx setting is already too high for this environment.

This is puzzling us, as it doesn't seem like the heap usage is very big at the time of the exception. Why are we getting this OOME?

UPDATE: We are attempting to prevent this condition by setting the initial memory (Xms) equal to the maximum memory (Xmx). So far, these experiments have been promising though we've yet to introduce the change in production. It still doesn't explain why the OOME occurred in the first place, though it does seem to indicate that the OOME can be avoided without increasing the max heap size (or reducing the app memory footprint). So the mystery remains of why the heap resizing caused an OOME?

1
How reproducible is it? Can you tell us a bit about the area in the code where it happens?nanofarad
We can't reproduce it in our test environment so far, though we are still trying. It happens every few days in production. From what I can tell, it seems to be based on heap growth in general and not any particular piece of our code.bischoje

1 Answers

1
votes

For reading you logs, it appear you have a very large burst of activity, most like objects large enough to go straight into tenured/old gen. I still suggest you increase you max memory to see how your application behaves as the OOME is likely to give you confusing stats.


This suggests heavy early promotion. The "GC" is a minor collection which it appears that every object is needed, triggering a Full GC which finds some tenured objects which can be dropped. The GC works best when the young object die in the Eden space, but it appears most of your objects are dying in tenured space.

One way to test this is to make the max heap space much larger. If you can try a 24 GB heap or 80% of you main memory see how it behaves then. e.g. try -Xmx24g if you have 32 GB of memory. From these numbers, it appears you want an Eden size of at least 5 GB.

If this is not an option, I suggest you use a memory profiler to reduce the memory consumption by a factor of at least 3x.

I would check you have a recent version of Java 6 like update 45. There were significant performance improvement between update 18 and 26.