When G1 decides it needs to start doing mixed collections, it aggressively shrinks our Eden space from 10g to about 1g.
{Heap before GC invocations=294 (full 0):
garbage-first heap total 20480000K, used 18304860K [0x00000002de000000, 0x00000002de804e20, 0x00000007c0000000)
region size 8192K, 1363 young (11165696K), 11 survivors (90112K)
Metaspace used 37327K, capacity 37826K, committed 38096K, reserved 1083392K
class space used 3935K, capacity 4081K, committed 4096K, reserved 1048576K
2016-03-31T20:57:31.002+0000: 7196.427: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 717225984 bytes, new threshold 1 (max 1)
- age 1: 41346816 bytes, 41346816 total
7196.427: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 144693, predicted base time: 48.88 ms, remaining time: 951.12 ms, target pause time: 1000.00 ms]
7196.427: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1352 regions, survivors: 11 regions, predicted young region time: 20.72 ms]
7196.427: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1352 regions, survivors: 11 regions, old: 0 regions, predicted pause time: 69.60 ms, target pause time: 1000.00 ms]
7196.494: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 789 regions, reclaimable: 4703761904 bytes (22.43 %), threshold: 5.00 %]
, 0.0673540 secs]
[Parallel Time: 60.1 ms, GC Workers: 18]
[GC Worker Start (ms): Min: 7196427.8, Avg: 7196428.1, Max: 7196428.2, Diff: 0.4]
[Ext Root Scanning (ms): Min: 7.3, Avg: 7.5, Max: 7.7, Diff: 0.4, Sum: 134.2]
[Update RS (ms): Min: 28.2, Avg: 28.8, Max: 29.9, Diff: 1.7, Sum: 518.4]
[Processed Buffers: Min: 41, Avg: 57.7, Max: 80, Diff: 39, Sum: 1039]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 3.7]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 22.1, Avg: 23.1, Max: 23.4, Diff: 1.3, Sum: 416.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 2.5]
[GC Worker Total (ms): Min: 59.5, Avg: 59.7, Max: 60.0, Diff: 0.5, Sum: 1075.1]
[GC Worker End (ms): Min: 7196487.7, Avg: 7196487.8, Max: 7196487.9, Diff: 0.2]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.9 ms]
[Other: 5.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.5 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 2.3 ms]
[Eden: 10.6G(10.6G)->0.0B(848.0M) Survivors: 88.0M->152.0M Heap: 17.5G(19.5G)->7128.3M(19.5G)]
Heap after GC invocations=295 (full 0):
garbage-first heap total 20480000K, used 7299344K [0x00000002de000000, 0x00000002de804e20, 0x00000007c0000000)
region size 8192K, 19 young (155648K), 19 survivors (155648K)
Metaspace used 37327K, capacity 37826K, committed 38096K, reserved 1083392K
class space used 3935K, capacity 4081K, committed 4096K, reserved 1048576K
}
[Times: user=1.09 sys=0.00, real=0.07 secs]
2016-03-31T20:57:31.070+0000: 7196.495: Total time for which application threads were stopped: 0.0699324 seconds, Stopping threads took: 0.0003462 seconds
This is after it's been running with 10-11g of Eden for 60 or more collections.
Here are the appropriate JVM GC parameters we're running with
-Xms20000m -Xmx20000m
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=1000
-XX:GCTimeRatio=99
-XX:InitiatingHeapOccupancyPercent=35
-XX:MaxTenuringThreshold=1
-XX:G1ConcRefinementThreads=6
-XX:ConcGCThreads=18
-XX:ParallelGCThreads=18
-XX:+PrintGCDetails"
-XX:+PrintGCDateStamps"
-XX:+PrintHeapAtGC"
-XX:+PrintTenuringDistribution"
-XX:+PrintGCApplicationStoppedTime"
-XX:+PrintPromotionFailure"
-XX:+PrintAdaptiveSizePolicy"
According to page 55 of this presentation, it needs to resize Eden so that max pause target accounts for the entire heap, not just to the new generation. Why is the collector being so aggressive?
Average young generation pause times are between 50-150ms for a heap size of 10g. If the presentation is correct (I haven't found anything else to support the statement), I would expect shrinking by half (20g heap), not 10x.
PrintAdaptiveSizePolicy
parameter is enabled (added to question). In the presentation, issue #3, it states "Must account for old regions, not only young, Currently G1 shrinks the young generation". Ideally, I wouldn't expect the generation to shrink at all. We've set our pause goal to 1000ms and collections are much quicker than that. Having to collect twice the garbage (which is already marked) shouldn't take more than twice the time. – Savior