13 votes

We have a Java process running on Solaris 10 serving about 200-300 concurrent users. The administrators have reported that the memory used by the process increases significantly over time. It reaches 2 GB in a few days and never stops growing.

We have dumped the heap and analysed it using the Eclipse Memory Analyzer (MAT), but weren't able to see anything out of the ordinary there. The heap itself was very small.

After adding memory-stat logging to our application, we found a discrepancy between the memory usage reported by the "top" utility, which the administrators use, and the usage reported by MemoryMXBean and the Runtime class.
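The logging is roughly along the lines of the sketch below (the class name and the exact label formatting are illustrative, not our actual code); it just reads the figures exposed by Runtime and MemoryMXBean:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryStatsLogger {
    private static final long MB = 1024 * 1024;

    public static void logMemoryStats() {
        // Figures from the Runtime class
        Runtime rt = Runtime.getRuntime();
        System.out.println("Runtime total: " + rt.totalMemory() / MB + "MB"
                + ", free: " + rt.freeMemory() / MB + "MB"
                + ", max: " + rt.maxMemory() / MB + "MB");

        // Figures from MemoryMXBean: heap and non-heap as managed by the JVM
        MemoryMXBean mxBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mxBean.getHeapMemoryUsage();
        MemoryUsage nonHeap = mxBean.getNonHeapMemoryUsage();
        System.out.println("Heap committed: " + heap.getCommitted() / MB + "MB"
                + ", used: " + heap.getUsed() / MB + "MB"
                + ", max: " + heap.getMax() / MB + "MB");
        System.out.println("Non-heap committed: " + nonHeap.getCommitted() / MB + "MB"
                + ", used: " + nonHeap.getUsed() / MB + "MB");
    }

    public static void main(String[] args) {
        logMemoryStats();
    }
}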

Here is the output from both.

Memory usage information 

From the Runtime class
Free memory: 381MB
Allocated memory: 74MB
Max memory: 456MB
Total free memory: 381MB

From MemoryMXBean.
Heap Committed: 136MB
Heap Init: 64MB
Heap Used: 74MB
Heap Max: 456MB
Non Heap Committed: 73MB
Non Heap Init: 4MB
Non Heap Used: 72MB

Current idle threads: 4
Current total threads: 13
Current busy threads: 9
Current queue size: 0
Max threads: 200
Min threads: 8
Idle Timeout: 60000

  PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME   CPU COMMAND
99802 axuser   115   59    0 2037M 1471M sleep  503:46 0.14% java

How can this be? The top command reports so much more usage; I was expecting RES to be close to heap + non-heap.

pmap -x, however, reports most of the memory as [ heap ]:

Address     Kbytes       RSS       Anon     Locked Mode   Mapped File
*102000         56         56         56       - rwx----    [ heap ]
*110000       3008       3008       2752       - rwx----    [ heap ]
*400000    1622016    1621056    1167568       - rwx----    [ heap ]
*000000      45056      45056      45056       - rw-----    [ anon ]

Can anyone please shed some light on this? I'm completely lost.

Thanks.

Update

This does not appear to be an issue on Linux.

Also, based on Peter Lawrey's response, the "heap" reported by pmap is the native heap, not the Java heap.

Are there any native libraries used in this application? - JimmyJames
The "heap" reported by pmap is likely to be the native heap not the Java heap. What resources could the application be using in native space? - Peter Lawrey
One thing to consider is whether your database transactions are getting cleaned up. You likely have a connection pool and if you keep creating statements without cleaning them up, they might still be hanging around in the native space. - JimmyJames
Despite the name, this is just the "method area" e.g. the PermGen or MetaSpace. docs.oracle.com/javase/8/docs/api/java/lang/management/… it doesn't include the stacks, GUI components, shared libraries, direct memory, nor other native memory. - Peter Lawrey
I suppose that should work assuming the variables haven't been reassigned or set to null before calling that. On a side note, if you are using a connection pool, close will not actually close the connection. It will just return it to the pool. But that shouldn't be an issue if you clean up the statements reliably. - JimmyJames
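Regarding the statement-cleanup comment above, a minimal sketch of the try-with-resources pattern being suggested (the DAO class, the query and the table name are hypothetical, and the pool is whatever DataSource the application already uses):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class UserDao {
    private final DataSource pool; // e.g. the application's existing connection pool

    public UserDao(DataSource pool) {
        this.pool = pool;
    }

    public int countUsers() throws SQLException {
        // try-with-resources closes the ResultSet, PreparedStatement and Connection
        // in reverse order. With a pool, "closing" the Connection just returns it to
        // the pool, but the Statement and ResultSet are genuinely released.
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT COUNT(*) FROM users");
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getInt(1) : 0;
        }
    }
}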

2 Answers

2 votes

I have encountered a similar problem and found a resolution:

Solaris 11
JDK10
REST application using HTTPS (Jetty server)
There was a significant increase of the C heap (observed via pmap) over time

I decided to do some stress tests with libumem, so I started the process with

UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1

and stressed the application with HTTPS requests. After a while I connected to the process with mdb. In mdb I ran the command ::findleaks, and it reported this as a leak:

libucrypto.so.1`ucrypto_digest_init

So it seems that the JCA (Java Cryptography Architecture) implementation OracleUcrypto has some issues on Solaris.

The problem was resolved by updating the $JAVA_HOME/conf/security/java.security file: I changed the priority of OracleUcrypto to 3 and of the SUN implementation to 1

security.provider.3=OracleUcrypto
security.provider.2=SunPKCS11 ${java.home}/conf/security/sunpkcs11-solaris.cfg
security.provider.1=SUN

After this the problem disappeared.

This also explains why there is no problem on Linux, since different implementations of the JCA providers are in play there.
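To confirm the resulting provider order at runtime, you can list the registered providers with the standard java.security.Security API (a small sketch):

import java.security.Provider;
import java.security.Security;

public class ListProviders {
    public static void main(String[] args) {
        // Prints the installed JCA providers in priority order,
        // so you can verify that SUN now comes before OracleUcrypto.
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            System.out.println((i + 1) + ": " + providers[i].getName());
        }
    }
}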

1 vote

In garbage-collected environments, holding on to unused references effectively amounts to a memory leak, because it prevents the GC from doing its job. It's really easy to accidentally keep references around.

A common culprit is hashtables that are never pruned. Another is arrays or vectors that are logically cleared (by resetting the use index to 0) but whose slots above the use index still point to objects.
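A classic illustration is a hand-rolled stack whose pop only decrements the index: the popped object stays reachable through the array until the slot is nulled out (illustrative code, not from the application in question):

public class LeakyStack {
    private final Object[] elements = new Object[100];
    private int size = 0;

    public void push(Object e) {
        elements[size++] = e;
    }

    // Leaky version: the array slot above 'size' still references the popped
    // object, so the GC cannot reclaim it even though it is logically gone.
    public Object popLeaky() {
        return elements[--size];
    }

    // Fixed version: clearing the slot drops the obsolete reference.
    public Object pop() {
        Object e = elements[--size];
        elements[size] = null;
        return e;
    }
}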