
I'm trying to trace and identify the root cause of a memory leak in our very small and simple Spring Boot application.

It uses the following:

- Spring Boot 2.2.4
- azure-servicebus-jms-spring-boot-starter 2.2.1
- MSSQL
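For reference, a minimal Maven fragment matching those dependencies might look roughly like this (the MSSQL JDBC driver coordinates are my assumption, since the post only says "MSSQL"):

    <!-- Spring Boot 2.2.4 as parent -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.4.RELEASE</version>
    </parent>

    <dependencies>
        <!-- Azure Service Bus JMS starter; pulls in the Qpid JMS (AMQP) client under the hood -->
        <dependency>
            <groupId>com.microsoft.azure</groupId>
            <artifactId>azure-servicebus-jms-spring-boot-starter</artifactId>
            <version>2.2.1</version>
        </dependency>
        <!-- MSSQL JDBC driver (assumed coordinates, version managed by the Boot parent) -->
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
        </dependency>
    </dependencies>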

Function: The app only consumes an Azure Service Bus queue, stores the data, and sends it on to another destination. It is a small app, so it starts comfortably with 64 MB of memory, even though I give it up to 256 MB via the Xmx option. An important note: the queue is consumed using Spring's default transacted mode with a dedicated JmsTransactionManager, which is actually an inner TM of a ChainedTransactionManager, together with a DB TM and an additional outbound JMS TM. Both JMS ConnectionFactory objects are created as CachingConnectionFactory.
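For context, a rough sketch of how such a setup is typically wired; the bean names, connection URIs, and the order of the TMs in the chain are placeholders/assumptions, not taken from the actual app:

    import javax.sql.DataSource;

    import org.apache.qpid.jms.JmsConnectionFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.transaction.ChainedTransactionManager;
    import org.springframework.jdbc.datasource.DataSourceTransactionManager;
    import org.springframework.jms.connection.CachingConnectionFactory;
    import org.springframework.jms.connection.JmsTransactionManager;

    @Configuration
    public class TransactionConfig {

        // Inbound Service Bus factory wrapped in Spring's CachingConnectionFactory
        @Bean
        public CachingConnectionFactory inboundConnectionFactory() {
            return new CachingConnectionFactory(
                    new JmsConnectionFactory("amqps://<namespace>.servicebus.windows.net")); // placeholder URI
        }

        // Outbound Service Bus factory, same pattern
        @Bean
        public CachingConnectionFactory outboundConnectionFactory() {
            return new CachingConnectionFactory(
                    new JmsConnectionFactory("amqps://<other-namespace>.servicebus.windows.net")); // placeholder URI
        }

        // One local transaction manager per resource
        @Bean
        public JmsTransactionManager inboundJmsTxManager() {
            return new JmsTransactionManager(inboundConnectionFactory());
        }

        @Bean
        public JmsTransactionManager outboundJmsTxManager() {
            return new JmsTransactionManager(outboundConnectionFactory());
        }

        @Bean
        public DataSourceTransactionManager dbTxManager(DataSource dataSource) {
            return new DataSourceTransactionManager(dataSource);
        }

        // ChainedTransactionManager wrapping all three local TMs; this is the TM
        // that gets set on the jms:message-driven-channel-adapter
        @Bean
        public ChainedTransactionManager chainedTxManager(DataSource dataSource) {
            return new ChainedTransactionManager(
                    outboundJmsTxManager(), dbTxManager(dataSource), inboundJmsTxManager());
        }
    }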

Behavior:

Once the app is started it seems OK. There is no traffic, so in the log I can see it opening and closing transactions while checking the queue (jms:message-driven-channel-adapter).

However, after some time, still with no traffic and not a single message consumed, the memory starts climbing, as monitored via JVVM.

There is an error thrown:

--2020-04-24 11:17:01.443 - WARN 39892 ---   [er.container-10] o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener invoker failed for destination 'MY QUEUE NAME HERE' - trying to recover. Cause: Heuristic completion: outcome state is rolled back; nested exception is org.springframework.transaction.TransactionSystemException: Could not commit JMS transaction; nested exception is javax.jms.IllegalStateException: The Session was closed due to an unrecoverable error.

... and after several minutes it reaches the max of the heap, and from that moment on it fails with an OutOfMemoryError in the thread that opens JMS connections.

--2020-04-24 11:20:04.564 - WARN 39892 ---   [windows.net:-1]] i.n.u.concurrent.AbstractEventExecutor   : A task raised an exception. Task: org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840@1ed8f2b9
-
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
        at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
        at org.apache.qpid.proton.engine.impl.ByteBufferUtils.newWriteableBuffer(ByteBufferUtils.java:99)
        at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.init_buffers(TransportOutputAdaptor.java:108)
        at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.pending(TransportOutputAdaptor.java:56)
        at org.apache.qpid.proton.engine.impl.SaslImpl$SwitchingSaslTransportWrapper.pending(SaslImpl.java:842)
        at org.apache.qpid.proton.engine.impl.HandshakeSniffingTransportWrapper.pending(HandshakeSniffingTransportWrapper.java:138)
        at org.apache.qpid.proton.engine.impl.TransportImpl.pending(TransportImpl.java:1577)
        at org.apache.qpid.proton.engine.impl.TransportImpl.getOutputBuffer(TransportImpl.java:1526)
        at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:994)
        at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:985)
        at org.apache.qpid.jms.provider.amqp.AmqpProvider.lambda$close$3(AmqpProvider.java:351)
        at org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840.run(Unknown Source)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:835)

HeapDumps:

I took a couple of heap snapshots during this whole process and looked at what grows. I can see a suspicious amount of ConcurrentHashMap/String/byte[] objects.

Does anyone have a clue/hint as to what could be wrong in this setup and these libraries: Spring Boot, Apache Qpid (used under the hood of the Azure JMS dependency), etc.? Many thanks.


Update #1: I have clear evidence that the problem is either in Spring or in the Azure Service Bus starter library, not necessarily in the Qpid client used underneath. I would say the starter library has the bug rather than Spring, but that is just my guess. This is what the failing setup looks like:

  1. There are two JMS destinations and one DB, each having its own transaction manager.
  2. There is a ChainedTransactionManager wrapping the above three TMs.
  3. A Spring Integration app connects to the Azure Service Bus queue via jms:message-driven-channel-adapter, with the transaction manager from point 2 set on this component (see the configuration sketch after this list).
  4. Start the app. No traffic on the queue is needed; after 10 minutes the app will crash with an OutOfMemoryError. Within those 10 minutes I watch the log at debug level, and the only thing happening is the opening and closing of transactions by the ChainedTransactionManager. Also, as written in the comments, another important condition is the third JMS transaction manager: with 2 TMs it works and is stable, with 3 it crashes.
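For illustration, the adapter wiring described in point 3 corresponds roughly to the following Spring Integration XML; the queue name, channel, and bean names are placeholders, and transaction-manager references the ChainedTransactionManager bean wrapping the three local TMs:

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:int="http://www.springframework.org/schema/integration"
           xmlns:jms="http://www.springframework.org/schema/integration/jms"
           xsi:schemaLocation="
               http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd
               http://www.springframework.org/schema/integration https://www.springframework.org/schema/integration/spring-integration.xsd
               http://www.springframework.org/schema/integration/jms https://www.springframework.org/schema/integration/jms/spring-integration-jms.xsd">

        <int:channel id="inboundChannel"/>

        <!-- Inbound adapter on the Service Bus queue, driven by the chained TM -->
        <jms:message-driven-channel-adapter id="serviceBusInboundAdapter"
                connection-factory="inboundConnectionFactory"
                destination-name="MY_QUEUE_NAME"
                channel="inboundChannel"
                transaction-manager="chainedTxManager"/>

    </beans>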
Update: When I remove transactions and set auto ack, the problem is resolved. This leads to the hypothesis that the transaction manager and its logic across all three local TMs is causing this issue. Possibly it could still be in Qpid code that is executed when transactions are used, or, lastly, some mistake in my code, i.e. in creating and setting the TM. (???) – JavaDude
There is no evidence here that it is related to Qpid JMS, so until you can find out whose map objects are growing it will be hard to say more. There are no known leaks in the client at this time. – Tim Bish
As I wrote in the comment above, I found out it is related to transactions. The ChainedTransactionManager, when set on the inbound channel adapter, causes this problem. Furthermore, I realized the error behavior does not occur in another very similar app which has a JMS transaction manager and a DataSource TM. However, the app which suffers from the problem has a chained TM with three resources: JMS1, DB, JMS2. Both JMS TMs are connected to their own CachingConnectionFactory, created as a wrapper of org.apache.qpid.jms.JmsConnectionFactory. The objects which grow unexpectedly and occupy memory are those byte[] – JavaDude

1 Answer


Additional research and the steps taken identified the most likely root cause: Spring's CachingConnectionFactory class. Once I removed it and used only the native factory types, the problem went away and the memory consumption profile is very different and healthy.

I have to say I created the CachingConnectionFactory using the standard constructor and did not configure its behavior any further. However, these Spring defaults clearly lead to a memory leak, at least in my experience.
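For illustration, the change amounted to something like this (URI and bean names are placeholders); instead of wrapping the Qpid factory in CachingConnectionFactory, the native factory is used directly:

    import org.apache.qpid.jms.JmsConnectionFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jms.connection.JmsTransactionManager;

    @Configuration
    public class ConnectionFactoryConfig {

        // Before (leaking): new CachingConnectionFactory(new JmsConnectionFactory("amqps://..."))
        // After (stable): the native Qpid JmsConnectionFactory used directly
        @Bean
        public JmsConnectionFactory inboundConnectionFactory() {
            return new JmsConnectionFactory("amqps://<namespace>.servicebus.windows.net"); // placeholder URI
        }

        @Bean
        public JmsTransactionManager inboundJmsTxManager() {
            return new JmsTransactionManager(inboundConnectionFactory());
        }
    }

Keep in mind that without CachingConnectionFactory the listener container and any JmsTemplate will open and close sessions more frequently, so this trades the leak for more connection churn rather than being a drop-in replacement.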

In the past I had a memory leak with ActiveMQ which had to be resolved by using CachingConnectionFactory, and now I have a memory leak with Azure Service Bus when using CachingConnectionFactory... strange :) In both cases I see that as bugs, because memory management should be correct regardless of whether caching is involved or not.

Marking this as my answer.

Tested case: The problem occurs when receiving and sending messages, each with its own TM, and both JMS connection factories are of type CachingConnectionFactory. In the end I also tested the app with the inbound connection factory of type CachingConnectionFactory and the outbound one just the native type... no memory leak in that case either.