1
votes

I'm using ActiveMQ 5.9 with Camel 2.10.3, and under load (during a performance test) I'm experiencing some problems, it seems that the broker is stuck trying to close a connection, and I can't understand the reason.

The configuration of the JMS system is the following: there are two broker (configured in failover mode) and many client nodes that acts both as consumer both as producer of some queues (let's take one for example: 'customer_update queue'.

I'm using PooledConnectionFactory with default configuration, 'CACHE_CONSUMER' cache level and 10 max concurrent consumer per each client node.

Broker configuration is the following: tcp://0.0.0.0:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600

Here's the thread holding lock on the broker, and never releasing it:

"ActiveMQ Transport: tcp:///10.128.43.206:38694@61616" #5774 daemon prio=5 os_prio=0 tid=0x00007f2a4424e800 nid=0
xaba4 waiting on condition [0x00007f29fe397000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000004fd008fb0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.
java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.activemq.broker.TransportConnection.stop(TransportConnection.java:983)
at org.apache.activemq.broker.TransportConnection.processAddConnection(TransportConnection.java:699)
        - locked <0x000000050401eed0> (a java.lang.Object)
at org.apache.activemq.broker.jmx.ManagedTransportConnection.processAddConnection(ManagedTransportConnection.java:79)
at org.apache.activemq.command.ConnectionInfo.visit(ConnectionInfo.java:139)
at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:292)
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:149)
at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
at java.lang.Thread.run(Thread.java:748)

I have more than 500 other threads on the broker that look like this:

"ActiveMQ Transport: tcp:///10.128.43.206:52074@61616" #2962 daemon prio=5 os_prio=0 tid=0x00007f2a440c9000 nid=0xa01f waiting for monitor entry [0x00007f29fc768000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.activemq.broker.TransportConnection.processAddConnection(TransportConnection.java:696)
        - waiting to lock <0x000000050401eed0> (a java.lang.Object)
        at org.apache.activemq.broker.jmx.ManagedTransportConnection.processAddConnection(ManagedTransportConnection.java:79)
        at org.apache.activemq.command.ConnectionInfo.visit(ConnectionInfo.java:139)
        at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:292)
        at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:149)
        at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
        at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
        at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
        at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
        at java.lang.Thread.run(Thread.java:748)

First error I see on the broker is this one:

2018-05-16 16:36:59,336 [org.apache.activemq.broker.TransportConnection.Transport:856] 
WARN  - Transport Connection to: tcp://10.128.43.206:48747 failed: java.io.EOFException

On the client node that is referenced in the broker (10.128.43.206), I see this logs, it seems that the node is trying to reconnect, but right after it's disconnected again, and this happens again and again.

2018-05-16 16:36:59,322 [org.apache.activemq.transport.failover.FailoverTransport:856] WARN  - Transport (tcp://10.128.43.169:61616) failed, reason:  java.io.IOException, attempting to automatically reconnect
2018-05-16 16:36:59,322 [org.apache.activemq.transport.failover.FailoverTransport:856] WARN  - Transport (tcp://10.128.43.169:61616) failed, reason:  java.io.IOException, attempting to automatically reconnect
2018-05-16 16:36:59,375 [org.apache.activemq.transport.failover.FailoverTransport:856] INFO  - Successfully reconnected to tcp://10.128.43.169:61616
2018-05-16 16:36:59,375 [org.apache.activemq.transport.failover.FailoverTransport:856] INFO  - Successfully reconnected to tcp://10.128.43.169:61616
2018-05-16 16:36:59,375 [org.apache.activemq.TransactionContext:856] INFO  - commit failed for transaction TX:ID:52374-1526300283331-1:1:898
javax.jms.TransactionRolledBackException: Transaction completion in doubt due to failover. Forcing rollback of TX:ID:52374-1526300283331-1:1:898
        at org.apache.activemq.state.ConnectionStateTracker.restoreTransactions(ConnectionStateTracker.java:231)
        at org.apache.activemq.state.ConnectionStateTracker.restore(ConnectionStateTracker.java:169)
        at org.apache.activemq.transport.failover.FailoverTransport.restoreTransport(FailoverTransport.java:827)
        at org.apache.activemq.transport.failover.FailoverTransport.doReconnect(FailoverTransport.java:1005)
        at org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:136)
        at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
        at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-05-16 16:37:00,091 [org.apache.activemq.transport.failover.FailoverTransport:856] INFO  - Successfully reconnected to tcp://10.128.43.169:61616
2018-05-16 16:37:00,091 [org.apache.activemq.transport.failover.FailoverTransport:856] INFO  - Successfully reconnected to tcp://10.128.43.169:61616
2018-05-16 16:37:00,112 [org.apache.activemq.transport.failover.FailoverTransport:856] WARN  - Transport (tcp://10.128.43.169:61616) failed, reason:  java.io.IOException, attempting to automatically reconnect

At the end, broker reach maxConnections available (1000), and needs to be restarted.

Could it be due to the fact that one client node is acting both as a consumer and a producer, using same connection pool, generating some kind of deadlock?

Do you have some suggestions?

Thanks

Giulio

1

1 Answers

-1
votes

Most probably I was affected by this issue:

https://issues.apache.org/jira/browse/AMQ-5090

Updating to ActiveMQ 5.10.0 and Camel 2.13.1 solved the issue (system is much more stable, even during performance tests).

Thanks Giulio