I have set up ActiveMQ 5.4.1 (the JVM on the machine is 1.5 and cannot be updated) in a master/slave configuration. There are two AMQ instances, I1 and I2, running on separate ports (61616 and 61617) and sharing a common KahaDB. Both instances are started together, and whichever acquires the lock on KahaDB becomes the master. The slave fails to acquire the lock and polls every 10 seconds to check whether the master has released it. This part works without any issues.
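
To illustrate the lock polling described above, here is a rough sketch using a plain java.nio file lock. It is not ActiveMQ's actual locking code: the class SharedLockSketch and its methods are made up for illustration, and the 10-second interval simply mirrors the polling period mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical illustration only; not taken from the ActiveMQ code base.
class SharedLockSketch {

    // Returns the lock if it could be acquired, or null if someone else holds it.
    static FileLock tryAcquire(RandomAccessFile file) throws IOException {
        try {
            // tryLock() returns null when another process holds the lock.
            return file.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown when the lock is already held inside this same JVM.
            return null;
        }
    }

    // Polls until the lock becomes available (the "slave" side of the setup).
    static FileLock waitForLock(RandomAccessFile file, long pollMillis)
            throws IOException, InterruptedException {
        FileLock lock = tryAcquire(file);
        while (lock == null) {
            Thread.sleep(pollMillis);
            lock = tryAcquire(file);
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        File lockFile = File.createTempFile("shared-store", ".lock");
        lockFile.deleteOnExit();
        RandomAccessFile handle = new RandomAccessFile(lockFile, "rw");
        FileLock lock = waitForLock(handle, 10000L); // 10 s, as in the setup above
        System.out.println("Acquired lock, acting as master: " + lock.isValid());
        lock.release();
        handle.close();
    }
}
```

Whichever process gets a non-null lock first plays the master; the other keeps looping in waitForLock until the lock is released.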

Both the producer and the consumer use the failover protocol, with this connection string:
failover:(tcp://I1:61616,tcp://I2:61617)?initialReconnectDelay=10000

The consumer code is as follows:

    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(
            "failover:(tcp://I1:61616,tcp://I2:61617)?initialReconnectDelay=10000");
    Connection connection = connectionFactory.createConnection();
    connection.start();
    connection.setExceptionListener(this);
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    Destination destination = session.createQueue("TEST.FOO");
    MessageConsumer consumer = session.createConsumer(destination);
    Message message = consumer.receive(10000);
    while (message != null) {
        // Process message and read next
        message = consumer.receive(10000);
    }

If one instance goes down while the consumer is inside the while loop above, and the other instance takes over, the AMQ consumer automatically logs the following info message:
    Transport failed, attempting to automatically reconnect due to: java.io.EOFException
    java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:268)
        at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:192)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:184)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:172)
        at java.lang.Thread.run(Thread.java:619)

However, the consumer does not reconnect to the instance that is up, and every subsequent receive call returns null.

If the consumer code is re-run, the same connection string, failover:(tcp://I1:61616,tcp://I2:61617)?initialReconnectDelay=10000, automatically selects the instance that is up. The only case in which it does not reconnect automatically is when an instance goes down while the consumer is consuming.

Is there something I'm missing that would make the failover protocol automatically connect to the running instance while the consumer is consuming?

1 Answer

There is a bug in 5.4.1 that affects initialReconnectDelay.

AMQ-3049: https://issues.apache.org/jira/browse/AMQ-3049

In any case, you should upgrade to the latest release, 5.4.3, so that you have the most fixes available for 5.4.x.
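
Separately from that bug, keep in mind that consumer.receive(10000) returns null whenever the 10-second timeout expires without a message. A loop that stops on the first null will therefore exit if a reconnect takes longer than one timeout. The following is only a sketch of one way to tolerate that, not a confirmed fix for the problem in the question. The Receiver interface and the drain method are hypothetical stand-ins (in real code the call would be consumer.receive(10000)), and the code sticks to Java 5 syntax since the JVM cannot be updated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; not part of ActiveMQ or the JMS API.
class TolerantReceiveLoop {

    // Stand-in for a timed receive such as consumer.receive(10000):
    // returns a message, or null if the timeout expired.
    interface Receiver {
        String receiveOnce();
    }

    // Keeps polling until maxEmptyPolls consecutive receives return null,
    // so a single timeout during a failover does not end the loop.
    static List<String> drain(Receiver receiver, int maxEmptyPolls) {
        List<String> processed = new ArrayList<String>();
        int emptyPolls = 0;
        while (emptyPolls < maxEmptyPolls) {
            String message = receiver.receiveOnce();
            if (message == null) {
                emptyPolls++;           // timeout: count it and try again
            } else {
                emptyPolls = 0;         // got a message: reset the counter
                processed.add(message); // "process" the message
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Simulated receive results; each null models one expired timeout.
        final Iterator<String> simulated =
                Arrays.asList("m1", null, "m2", null, null, null).iterator();
        List<String> result = drain(new Receiver() {
            public String receiveOnce() {
                return simulated.hasNext() ? simulated.next() : null;
            }
        }, 3);
        System.out.println(result); // prints [m1, m2]
    }
}
```

In the simulated run, the single null between m1 and m2 is ignored, and the loop gives up only after three nulls in a row. How many empty polls to allow is a judgment call that depends on how long a failover is expected to take.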