For scaling reasons, we recently switched from a single ActiveMQ broker to a network of brokers. For the most part everything works exactly as intended, but we have encountered one odd issue after a fresh deployment of the brokers during the day.
First, the tech stack: we are using ActiveMQ 5.12.1 and Camel 2.13.4 for the integration between the Java method and a JMS endpoint. The broker side is a network of brokers, currently consisting of three members, with the following configuration:
<broker useJmx="${activemq.expose.jmx}" persistent="false"
        brokerName="${activemq.brokerName}" xmlns="http://activemq.apache.org/schema/core">
    <sslContext>
        <amq:sslContext keyStore="${activemq.broker.keyStore}"
                        keyStorePassword="${activemq.broker.keyStorePassword}"
                        trustStore="${activemq.broker.trustStore}"
                        trustStorePassword="${activemq.broker.trustStorePassword}" />
    </sslContext>
    <systemUsage>
        <systemUsage>
            <memoryUsage>
                <memoryUsage limit="${activemq.memoryUsage}" />
            </memoryUsage>
            <tempUsage>
                <tempUsage limit="${activemq.tempUsage}" />
            </tempUsage>
        </systemUsage>
    </systemUsage>
    <destinationPolicy>
        <policyMap>
            <policyEntries>
                <policyEntry queue=">" enableAudit="false">
                    <networkBridgeFilterFactory>
                        <conditionalNetworkBridgeFilterFactory
                            replayWhenNoConsumers="true" />
                    </networkBridgeFilterFactory>
                </policyEntry>
            </policyEntries>
        </policyMap>
    </destinationPolicy>
    <networkConnectors>
        <networkConnector name="queues"
                          uri="static:(${activemq.otherBrokers})"
                          networkTTL="2" dynamicOnly="true"
                          decreaseNetworkConsumerPriority="true"
                          conduitSubscriptions="false">
            <excludedDestinations>
                <topic physicalName=">" />
            </excludedDestinations>
        </networkConnector>
        <networkConnector name="topics"
                          uri="static:(${activemq.otherBrokers})"
                          networkTTL="1" dynamicOnly="true"
                          decreaseNetworkConsumerPriority="true"
                          conduitSubscriptions="true">
            <excludedDestinations>
                <queue physicalName=">" />
            </excludedDestinations>
        </networkConnector>
    </networkConnectors>
    <transportConnectors>
        <transportConnector
            uri="${activemq.protocol}${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
            updateClusterClients="true" rebalanceClusterClients="true" />
        <transportConnector
            uri="${activemq.websocket.protocol}${activemq.websocket.host}:${activemq.websocket.port}?needClientAuth=true"
            updateClusterClients="true" rebalanceClusterClients="true" />
    </transportConnectors>
</broker>
with the following placeholder values:
activemq.tcp.port=9000
activemq.protocol=ssl://
activemq.brokerName=activemq-server1.com
activemq.expose.jmx=true
activemq.otherBrokers=ssl://server2.com:9000,ssl://server3.com:9000
activemq.websocket.port=9001
activemq.websocket.protocol=stomp+ssl://
activemq.websocket.host=server1.com
activemq.memoryUsage=1gb
activemq.tempUsage=1gb
On the client side, the following Camel configuration is used:
<bean id="xxx.activemq.redeliveryPolicy" class="org.apache.activemq.RedeliveryPolicy">
    <property name="maximumRedeliveries" value="0" />
</bean>
<bean id="xxx.activemq.jmsConnectionFactory" class="org.apache.activemq.ActiveMQSslConnectionFactory">
    <property name="trustStore" value="${activemq.broker.trustStore}" />
    <property name="trustStorePassword" value="${activemq.broker.trustStorePassword}" />
    <property name="keyStore" value="${activemq.broker.keyStore}" />
    <property name="keyStorePassword" value="${activemq.broker.keyStorePassword}" />
    <property name="brokerURL" value="${activemq.broker.url}" />
    <property name="redeliveryPolicy" ref="xxx.activemq.redeliveryPolicy" />
</bean>
<bean id="xxx.activemq.jmsConfiguration" class="org.apache.activemq.camel.component.ActiveMQConfiguration">
    <property name="receiveTimeout" value="6000" />
    <property name="connectionFactory" ref="xxx.activemq.pooledConnectionFactory" />
</bean>
<bean id="xxx.activemq.pooledConnectionFactory"
      class="org.apache.activemq.pool.PooledConnectionFactory"
      init-method="start" destroy-method="stop">
    <property name="maxConnections" value="8" />
    <property name="idleTimeout" value="0" />
    <property name="timeBetweenExpirationCheckMillis"
              value="10000" />
    <property name="connectionFactory"
              ref="xxx.activemq.jmsConnectionFactory" />
</bean>
<bean id="xxx.activemq.jms.abstractComponent" abstract="true"
      class="org.apache.activemq.camel.component.ActiveMQComponent">
    <property name="configuration"
              ref="xxx.activemq.jmsConfiguration" />
    <property name="connectionFactory"
              ref="xxx.activemq.pooledConnectionFactory" />
    <property name="allowNullBody" value="true" />
    <property name="transferException" value="true" />
    <property name="defaultTaskExecutorType"
              value="#{T(org.apache.camel.component.jms.DefaultTaskExecutorType).ThreadPool}" />
    <property name="requestTimeout" value="5000" />
</bean>
<bean id="xxx.activemq.jms.queue"
      parent="xxx.activemq.jms.abstractComponent">
    <property name="concurrentConsumers" value="2" />
    <property name="maxConcurrentConsumers" value="2" />
</bean>
with a connection URL of
activemq.broker.url=failover:(ssl://server1.com:9000,ssl://server2.com:9000,ssl://server3.com:9000)?randomize=true
The request/reply EIP is achieved by having the producer set a corresponding JMSReplyTo header, with Camel defaulting to the InOut exchange pattern using temporary queues.
Before the deployment, all messaging was working as intended. Afterwards, however, we started to get timeouts on the producer side for some request/reply queues. The following entries showed up in the logs:
On the producer side:
Caused by: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 5000 millis due reply message with correlationID: Camel-ID-xxx-intranet-phs-49404-1457684675710-8-11 not received on destination: temp-queue://ID:xxx.intranet.phs-41986-1457684806758-1:3:1.
Exchange[Message: BeanInvocation public abstract xxx.xxx.rapi.dto.RemoteDTO xxx.xxx.xxx.facade.RemoteFacade.findRemoteDTO(java.lang.String,java.lang.Long) with [xxx, 31333]]]
and on the consumer side:
Caused by: javax.jms.InvalidDestinationException: Cannot publish to a deleted Destination: temp-queue://ID:xxx.intranet.phs-41986-1457684806758-1:3:1
We have since done some research and found that the problem shows up whenever an arbitrary broker of the network is shut down, and then only for those producers that had a temp queue open for a reply when the shutdown hit and that fail over to a new broker. The problem then persists for the affected producer until it is restarted; once it rejoins after a restart, everything is back to normal. The problem is also described on grokbase, as well as in two topics here: activemq-failover-with-temporary-queues-on-a-network-of-brokers and activemq-how-to-handle-broker-failovers-while-using-temporary-queues. We tried the one solution given in activemq-how-to-handle-broker-failovers-while-using-temporary-queues, setting the cache timeout, but it made no difference. The other suggested option, turning off advisory listening for clients, is not really an option in our setup, since we want to use features such as cluster rebalancing (rebalanceClusterClients) to make it easier to add brokers to the network at runtime.
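For reference, turning off advisory listening on the client would, as far as we understand it, come down to the watchTopicAdvisories property of the ActiveMQ connection factory. A minimal sketch against our connection factory bean, shown only to make clear which option we mean (whether this is exactly what the linked answer intends is our assumption):

```xml
<!-- Sketch only: the same connection factory bean as in our client
     configuration, with advisory listening switched off. -->
<bean id="xxx.activemq.jmsConnectionFactory" class="org.apache.activemq.ActiveMQSslConnectionFactory">
    <!-- SSL store and redelivery properties unchanged from the configuration shown earlier -->
    <property name="brokerURL" value="${activemq.broker.url}" />
    <!-- With this set to false the client no longer subscribes to advisory topics -->
    <property name="watchTopicAdvisories" value="false" />
</bean>
```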
We have also found some JIRA issues on the Camel and ActiveMQ side, such as CAMEL-3193, that describe this issue and were apparently fixed in versions older than the ones we are running, so we are quite puzzled. We are currently considering switching from temporary queues to exclusive reply queues to address this, but first wanted to ask whether we are simply missing some configuration somewhere.
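To make the alternative concrete, this is roughly what we have in mind for exclusive reply queues, using the replyTo and replyToType options of the Camel JMS endpoint. It is an untested sketch: the direct:findRemote endpoint and the queue names remote.request and remote.reply.client1 are made up for illustration, and using our component bean id as the URI scheme is an assumption about how the endpoint would be addressed.

```xml
<!-- Sketch only: endpoint and queue names are hypothetical. -->
<camelContext xmlns="http://camel.apache.org/schema/spring">
    <route>
        <from uri="direct:findRemote" />
        <!-- replyTo names a fixed reply queue instead of a temp queue;
             replyToType=Exclusive tells Camel that this producer is the
             only consumer on that queue, so the name must be unique per producer. -->
        <to pattern="InOut"
            uri="xxx.activemq.jms.queue:queue:remote.request?replyTo=remote.reply.client1&amp;replyToType=Exclusive" />
    </route>
</camelContext>
```

Since the reply queue is a regular named queue, it would not be deleted when the producer's connection fails over, which is the behaviour we are hoping for; each producer instance would need its own reply queue name.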
If you need any additional information please just ask!