0
votes

When using the G1 GC in a server with XMX of 8 GB, Full GC failure happens after running some days.

Try to adjust the JVM GC arguments several around, and print out all the GC details but still can not identify the root cause

JVM arguments:

java -Xms8g -Xmx8g 
-XX:+CrashOnOutOfMemoryError 
-XX:+AlwaysPreTouch 
-XX:-UseBiasedLocking 
-XX:MaxTenuringThreshold=15 
-Xss256k 
-XX:SurvivorRatio=6 
-XX:+UseTLAB 
-XX:GCTimeRatio=4 
-XX:+ScavengeBeforeFullGC 
-XX:G1HeapRegionSize=8M 
-XX:ConcGCThreads=8 
-XX:G1HeapWastePercent=10 
-XX:+AggressiveOpts 
-XX:MaxMetaspaceSize=256m 
-XX:+UseG1GC 
-XX:InitiatingHeapOccupancyPercent=35 
-XX:+DisableExplicitGC 
-Xloggc:/var/tmp/prod/query/Portfolio/PORTFOLIO-QRY-A-Instance1/query-gc.log
-verbose:gc 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps 
-XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 
-XX:GCLogFileSize=100M 
-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/var/tmp/prod/query/xxx.log-XX:NewSize=3g 
-XX:MaxNewSize=5g 
-server

The last GC details:

2019-01-25T00:25:28.998+0800: 399236.910: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.3400461 secs]
[Eden: 1080.0M(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->7936.0M(8192.0M)] [Times: user=2.22 sys=0.01, real=0.34 secs] 2019-01-25T00:25:29.338+0800: 399237.251: [GC concurrent-root-region-scan-start] 2019-01-25T00:25:29.338+0800: 399237.251: [GC concurrent-root-region-scan-end, 0.0000869 secs] 2019-01-25T00:25:29.338+0800: 399237.251: [GC concurrent-mark-start] 2019-01-25T00:25:29.419+0800: 399237.332: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.2033834 secs] [Eden: 208.0M(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.67 sys=0.02, real=0.21 secs] 2019-01-25T00:25:29.624+0800: 399237.537: [GC pause (G1 Evacuation Pause) (young), 0.0076649 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.07 sys=0.00, real=0.01 secs] 2019-01-25T00:25:29.632+0800: 399237.545: [GC pause (G1 Evacuation Pause) (young), 0.0072213 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.06 sys=0.00, real=0.01 secs] 2019-01-25T00:25:29.640+0800: 399237.553: [GC pause (G1 Evacuation Pause) (young), 0.0032099 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.02 sys=0.01, real=0.00 secs] 2019-01-25T00:25:29.645+0800: 399237.557: [GC pause (G1 Evacuation Pause) (young), 0.0041076 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.03 sys=0.00, real=0.00 secs] 2019-01-25T00:25:29.649+0800: 399237.562: [GC pause (G1 Evacuation Pause) (young), 0.0027963 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.01 sys=0.00, real=0.01 secs] 2019-01-25T00:25:29.653+0800: 399237.566: [GC pause (G1 Evacuation Pause) (young), 0.0027614 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->8144.0M(8192.0M)] [Times: user=0.02 sys=0.00, real=0.00 secs] 2019-01-25T00:25:29.656+0800: 399237.569: [Full GC (Allocation Failure) 8144M->4016M(8192M), 10.6192450 secs] [Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->4016.1M(8192.0M)], [Metaspace: 83995K->83979K(1126400K)] [Times: user=15.73 sys=0.00, real=10.62 secs] 2019-01-25T00:25:40.277+0800: 399248.190: [GC concurrent-mark-abort] Heap garbage-first heap total 8388608K, used 4268154K [0x00000005c0000000, 0x00000005c0802000, 0x00000007c0000000) region size 8192K, 20 young (163840K), 0 survivors (0K) Metaspace
used 84034K, capacity 85146K, committed 86732K, reserved 1126400K
class space used 8833K, capacity 9090K, committed 9420K, reserved 1048576K


Our server is 32G 16 core, any advice or suggestion will be very appreciated!

1
"Our server is 32G", -Xmx8g - seems like the answer is obvious, no?apangin

1 Answers

2
votes

It's complicated.

In the logs, I found that the doing full GC due to the old generation is too small.

"Allocation Failure" showed that direct allocation to old generation failed.

But, from the logs I also found that the minor GCs before full GC is too frequent and free a little bit space. There was 7 minor GC between 25:29.338 25:29.653。 Only first minor GC freed some space.

The worst problem is “[Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B”. It seems that all objects are allocated in old generation.From the log "[Eden: 0.0B(3072.0M)->0.0B(3072.0M) Survivors: 0.0B->0.0B Heap: 8144.0M(8192.0M)->4016.1M(8192.0M)]", I think that this full GC freed 4000M+ old generation space. It's very weird. Your application needs at least 4G old generation and almost all objects in old generation are reclaimed. It means that the old objects is not old enough. Most of them are promoted prematurely. Or much of them are huge objects.

I try to give you some advices...

  1. -XX:InitiatingHeapOccupancyPercent=35 is too small. It makes minor GC more frequent and the objects may be promoted prematurely. So the old generation will be full soon. You can set XX:InitiatingHeapOccupancyPercent bigger. Or you can think about adaptive IHOP.
  2. The old generation is too small. Some application need much more old generation space then young generation space. Somebody say that double or triple young generation space size is good.

  3. MaxTenuringThreshold=15 can set to the value that bigger than 20.