0
votes

I encountered a strange problem in Production which occurred only once after more than 5 years of operations. We found that the PSPermGen and 'real' duration were far higher than the user+sys timings out of the blue. On 4th Feb 2016, PSPermGen took 38.96 seconds when it was 0.2-0.3 seconds on previous runs and there was no memory freed. Real time took 40 seconds when user+sys were 0.3-0.4 seconds, which is abnormally high compared to the previous runs. Full GC were not seen to be occurring frequently, and neither have I observed any GC related errors. CPU usage was < 1%, and memory usage was < 20% through the duration.

INFO | jvm 1 | 2016/02/04 02:56:56 | [GC [PSYoungGen: 88312K->936K(86144K)] 261016K->174219K(260928K), 0.4070215 secs] [Times: user=0.75 sys=0.00, real=0.42 secs] INFO | jvm 1 | 2016/02/04 02:57:08 | [GC INFO | jvm 1 | 2016/02/04 02:57:11 | [PSYoungGen: 85703K->96K(102400K)] 258985K->175019K(278784K), 3.1902546 secs] [Times: user=7.14 sys=0.05, real=3.18 secs] INFO | jvm 1 | 2016/02/04 02:57:13 | [GC [PSYoungGen: 102368K->755K(123520K)] 277291K->175700K(299904K), 0.0266474 secs] [Times: user=0.00 sys=0.00, real=0.02 secs] INFO | jvm 1 | 2016/02/04 02:57:20 | [GC [PSYoungGen: 123507K->128K(144768K)] 298452K->175419K(321152K), 0.0021357 secs] [Times: user=0.01 sys=0.01, real=0.02 secs] INFO | jvm 1 | 2016/02/04 02:57:28 | [GC [PSYoungGen: 143616K->977K(144704K)] 318907K->176319K(321088K), 0.0035773 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] INFO | jvm 1 | 2016/02/04 02:57:28 | [Full GC STATUS | wrapper | 2016/02/04 02:58:03 | JVM appears hung: Timed out waiting for signal from JVM. Restarting JVM. INFO | jvm 1 | 2016/02/04 02:58:08 | [PSYoungGen: 977K->0K(144704K)] [PSOldGen: 175341K->35127K(174784K)] 176319K->35127K(319488K) [PSPermGen: 38992K->38992K(65536K)], 38.9595115 secs] [Times: user=0.33 sys=0.14, real=40.07 secs]

The last few full GCs performed were:

INFO | jvm 1 | 2016/02/02 12:59:49 | [Full GC [PSYoungGen: 433K->0K(98944K)] [PSOldGen: 176749K->38591K(174784K)] 177183K->38591K(273728K) [PSPermGen: 39380K->39380K(65536K)], 0.2982459 secs] [Tim es: user=0.30 sys=0.00, real=0.30 secs] 21189 INFO | jvm 1 | 2016/02/03 03:15:16 | [Full GC [PSYoungGen: 2873K->0K(125696K)] [PSOldGen: 171318K->33804K(174784K)] 174191K->33804K(300480K) [PSPermGen: 39940K->38366K(65536K)], 0.3317803 secs] [T imes: user=0.34 sys=0.00, real=0.34 secs] 47151 INFO | jvm 1 | 2016/02/03 12:16:51 | [Full GC [PSYoungGen: 1406K->0K(171776K)] [PSOldGen: 174936K->38003K(174784K)] 176342K->38003K(346560K) [PSPermGen: 38642K->38642K(65536K)], 0.2988259 secs] [T imes: user=0.30 sys=0.00, real=0.30 secs]

I'm running Java 6 update 45 on Windows 2003 x86 with around 8GB and 4 CPUs. Disk space was more than enough. We configure the JVM with initial heap memory of 256 MB and max memory of 512 MB. The application runs as a Windows Service via Tanuki Wrapper.

It currently doesn't seem like the long GC pause was due to memory, CPU, disk space, etc. What am I missing here?

1
This can happen if JVM process starts swapping out.apangin
@apangin thanks for the suggestion. How do I tell if the JVM process is indeed swapping out? It doesn't appear like there is an issue with memory on the JVM or OS. Is there a way I can simulate this?louis xie
You can use PerfMon to monitor swapping. Though there is no single counter for per-process page file usage, swapping activity can be detected by the combination of Memory: Pages/sec, Process: Page Faults, Process: I/O bytes counters.apangin
@apangin what could I do to remediate the JVM swapping issue? My application has a low heap size, memory consumption was normal, and cpu usage was low. I have not encountered this issue the last 5 years and it hasn't recurred since the last 1 month it happened. There were no allocation failures during minor or full GC as shown in the logs above.louis xie
If the system has plenty of memory, you may try to disable page file altogether. Or you may force full GC, say, every hour, to touch heap pages to decrease chances they are swapped out. Or you may play with SetProcessWorkingSetSize / VirtualLock from native code, though this will be tricky.apangin

1 Answers

0
votes

I just found an article at http://www.evanjones.ca/jvm-mmap-pause.html which suggest the problem could lie with disk IO, and it could be resolved by setting -XX:+PerfDataDisableSharedMem. However, another link here: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2015-March/014324.html suggest this would affect monitoring tools such as jcmd and jps