Minor GC pause times are increasing. What are the possible reasons?

Question

We are experiencing increases in the minor GC pause times. The application is a server written in Java, executing 500 transactions per seconds. After a CMS collection the pause times decreases but after hours of running pause times increases again. I've checked swap usage with vmstat, there isn't any swap input/output.

What are the possible reasons for increase in minor gc pause times?

Here is a part of the gc log file while the minor gc pause times are short: (0.01 seconds)

{Heap before GC invocations=123 (full 0):
 par new generation   total 306688K, used 275101K [0xfffffffe80010000, 0xfffffffe94cd0000, 0xfffffffe94cd0000)
  eden space 272640K, 100% used [0xfffffffe80010000, 0xfffffffe90a50000, 0xfffffffe90a50000)
  from space 34048K,   7% used [0xfffffffe92b90000, 0xfffffffe92df7718, 0xfffffffe94cd0000)
  to   space 34048K,   0% used [0xfffffffe90a50000, 0xfffffffe90a50000, 0xfffffffe92b90000)
 concurrent mark-sweep generation total 3243264K, used 71097K [0xfffffffe94cd0000, 0xffffffff5ac10000, 0xffffffff5ac10000)
 concurrent-mark-sweep perm gen total 262144K, used 30336K [0xffffffff5ac10000, 0xffffffff6ac10000, 0xffffffff6ac10000)
2012-07-29T20:00:19.790+0300: 472.781: [GC 472.782: [ParNew
Desired survivor size 17432576 bytes, new threshold 4 (max 4)
- age   1:    1068432 bytes,    1068432 total
- age   2:     296272 bytes,    1364704 total
- age   3:      68624 bytes,    1433328 total
- age   4:      53776 bytes,    1487104 total
: 275101K->2187K(306688K), 0.0111305 secs] 346199K->73301K(3549952K), 0.0127033 secs] [Times: user=0.13 sys=0.01, real=0.01 secs]
Heap after GC invocations=124 (full 0):
 par new generation   total 306688K, used 2187K [0xfffffffe80010000, 0xfffffffe94cd0000, 0xfffffffe94cd0000)
  eden space 272640K,   0% used [0xfffffffe80010000, 0xfffffffe80010000, 0xfffffffe90a50000)
  from space 34048K,   6% used [0xfffffffe90a50000, 0xfffffffe90c72dc0, 0xfffffffe92b90000)
  to   space 34048K,   0% used [0xfffffffe92b90000, 0xfffffffe92b90000, 0xfffffffe94cd0000)
 concurrent mark-sweep generation total 3243264K, used 71114K [0xfffffffe94cd0000, 0xffffffff5ac10000, 0xffffffff5ac10000)
 concurrent-mark-sweep perm gen total 262144K, used 30336K [0xffffffff5ac10000, 0xffffffff6ac10000, 0xffffffff6ac10000)
}

Below is a part of the gc log file while the minor gc pause times are long : (0.9 seconds)

{Heap before GC invocations=38526 (full 3):
 par new generation   total 306688K, used 280310K [0xfffffffe80010000, 0xfffffffe94cd0000, 0xfffffffe94cd0000)
  eden space 272640K, 100% used [0xfffffffe80010000, 0xfffffffe90a50000, 0xfffffffe90a50000)
  from space 34048K,  22% used [0xfffffffe90a50000, 0xfffffffe911cda60, 0xfffffffe92b90000)
  to   space 34048K,   0% used [0xfffffffe92b90000, 0xfffffffe92b90000, 0xfffffffe94cd0000)
 concurrent mark-sweep generation total 3243264K, used 1023755K [0xfffffffe94cd0000, 0xffffffff5ac10000, 0xffffffff5ac10000)
 concurrent-mark-sweep perm gen total 262144K, used 34997K [0xffffffff5ac10000, 0xffffffff6ac10000, 0xffffffff6ac10000)
2012-07-31T13:21:46.709+0300: 149360.928: [GC 149360.930: [ParNew
Desired survivor size 17432576 bytes, new threshold 4 (max 4)
- age   1:    1794816 bytes,    1794816 total
- age   2:    2864408 bytes,    4659224 total
- age   3:      72672 bytes,    4731896 total
- age   4:      86160 bytes,    4818056 total
: 280310K->7605K(306688K), 0.9073290 secs] 1304066K->1031418K(3549952K), 0.9100161 secs] [Times: user=9.34 sys=0.15, real=0.91 secs]
Heap after GC invocations=38527 (full 3):
 par new generation   total 306688K, used 7605K [0xfffffffe80010000, 0xfffffffe94cd0000, 0xfffffffe94cd0000)
  eden space 272640K,   0% used [0xfffffffe80010000, 0xfffffffe80010000, 0xfffffffe90a50000)
  from space 34048K,  22% used [0xfffffffe92b90000, 0xfffffffe932fd490, 0xfffffffe94cd0000)
  to   space 34048K,   0% used [0xfffffffe90a50000, 0xfffffffe90a50000, 0xfffffffe92b90000)
 concurrent mark-sweep generation total 3243264K, used 1023813K [0xfffffffe94cd0000, 0xffffffff5ac10000, 0xffffffff5ac10000)
 concurrent-mark-sweep perm gen total 262144K, used 34997K [0xffffffff5ac10000, 0xffffffff6ac10000, 0xffffffff6ac10000)
}

JVM args:

-d64  -server  \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-Xloggc:./logs/gc_"$DATE".log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+PrintGC \
-XX:PrintCMSStatistics=2 \
-XX:+PrintTenuringDistribution \
-XX:+PrintHeapAtGC \
-XX:+PrintGCTaskTimeStamps \
-XX:PermSize=256m -XX:MaxPermSize=256m \
-XX:+CMSClassUnloadingEnabled \
-XX:-OmitStackTraceInFastThrow \
-XX:ParallelGCThreads=16 \
-XX:ParallelCMSThreads=12 \
-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE \
-Xms3500m -Xmx3500m -Xss192k

Regarding checking swap with vmstat - paging should never be an issue with a JVM, or we'd be talking about delays of a different magnitude altogether. The gc times above differ only by a rough factor of 10. — Henk Langeveld
What JVM version are you using? Older version of JVM have bug with -XX:ParallelCMSThreads > 1. This bug prevents references (weak, soft, etc) to be correctly processed without full GC. Your logs indeed seems strange. I would recommend add -XX:+PrintReferenceGC to verify reference processing time. If you application actively using weak or soft references it could be a cause. — Alexey Ragozin
@ilkinulas 6u25 is prety old I would recommend you to remove -XX:ParallelCMSThreads and add -XX:-CMSConcurrentMTEnabled. — Alexey Ragozin

Peter Lawrey Peter Lawrey · Accepted Answer · 2012-08-03T08:10:28

Having a large eden space is surprisingly cheap, what is expensive is copying the objects which are retained.

In your shorter times you retained (the second size)

346199K->73301K, 0.0127033 secs

In your longer times you retained much more

1304066K->1031418K(3549952K), 0.9100161 secs

i.e. in the first case the GC took longer because you kept much more data.

The first thing I try in these cases is to increase the eden size, hoping that more objects will die before a collection occurs.

Minor GC pause times are increasing. What are the possible reasons?

2 Answers