I encountered a strange problem in Production which occurred only once after more than 5 years of operations. We found that the PSPermGen and 'real' duration were far higher than the user+sys timings out of the blue. On 4th Feb 2016, PSPermGen took 38.96 seconds when it was 0.2-0.3 seconds on previous runs and there was no memory freed. Real time took 40 seconds when user+sys were 0.3-0.4 seconds, which is abnormally high compared to the previous runs. Full GC were not seen to be occurring frequently, and neither have I observed any GC related errors. CPU usage was < 1%, and memory usage was < 20% through the duration.
INFO | jvm 1 | 2016/02/04 02:56:56 | [GC [PSYoungGen: 88312K->936K(86144K)] 261016K->174219K(260928K), 0.4070215 secs] [Times: user=0.75 sys=0.00, real=0.42 secs] INFO | jvm 1 | 2016/02/04 02:57:08 | [GC INFO | jvm 1 | 2016/02/04 02:57:11 | [PSYoungGen: 85703K->96K(102400K)] 258985K->175019K(278784K), 3.1902546 secs] [Times: user=7.14 sys=0.05, real=3.18 secs] INFO | jvm 1 | 2016/02/04 02:57:13 | [GC [PSYoungGen: 102368K->755K(123520K)] 277291K->175700K(299904K), 0.0266474 secs] [Times: user=0.00 sys=0.00, real=0.02 secs] INFO | jvm 1 | 2016/02/04 02:57:20 | [GC [PSYoungGen: 123507K->128K(144768K)] 298452K->175419K(321152K), 0.0021357 secs] [Times: user=0.01 sys=0.01, real=0.02 secs] INFO | jvm 1 | 2016/02/04 02:57:28 | [GC [PSYoungGen: 143616K->977K(144704K)] 318907K->176319K(321088K), 0.0035773 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] INFO | jvm 1 | 2016/02/04 02:57:28 | [Full GC STATUS | wrapper | 2016/02/04 02:58:03 | JVM appears hung: Timed out waiting for signal from JVM. Restarting JVM. INFO | jvm 1 | 2016/02/04 02:58:08 | [PSYoungGen: 977K->0K(144704K)] [PSOldGen: 175341K->35127K(174784K)] 176319K->35127K(319488K) [PSPermGen: 38992K->38992K(65536K)], 38.9595115 secs] [Times: user=0.33 sys=0.14, real=40.07 secs]
The last few full GCs performed were:
INFO | jvm 1 | 2016/02/02 12:59:49 | [Full GC [PSYoungGen: 433K->0K(98944K)] [PSOldGen: 176749K->38591K(174784K)] 177183K->38591K(273728K) [PSPermGen: 39380K->39380K(65536K)], 0.2982459 secs] [Tim es: user=0.30 sys=0.00, real=0.30 secs] 21189 INFO | jvm 1 | 2016/02/03 03:15:16 | [Full GC [PSYoungGen: 2873K->0K(125696K)] [PSOldGen: 171318K->33804K(174784K)] 174191K->33804K(300480K) [PSPermGen: 39940K->38366K(65536K)], 0.3317803 secs] [T imes: user=0.34 sys=0.00, real=0.34 secs] 47151 INFO | jvm 1 | 2016/02/03 12:16:51 | [Full GC [PSYoungGen: 1406K->0K(171776K)] [PSOldGen: 174936K->38003K(174784K)] 176342K->38003K(346560K) [PSPermGen: 38642K->38642K(65536K)], 0.2988259 secs] [T imes: user=0.30 sys=0.00, real=0.30 secs]
I'm running Java 6 update 45 on Windows 2003 x86 with around 8GB and 4 CPUs. Disk space was more than enough. We configure the JVM with initial heap memory of 256 MB and max memory of 512 MB. The application runs as a Windows Service via Tanuki Wrapper.
It currently doesn't seem like the long GC pause was due to memory, CPU, disk space, etc. What am I missing here?
PerfMon
to monitor swapping. Though there is no single counter for per-process page file usage, swapping activity can be detected by the combination ofMemory: Pages/sec
,Process: Page Faults
,Process: I/O bytes
counters. – apangin