We have been running into the issue below with RabbitMQ and have been manually restarting the servers every weekend as a workaround.

Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.

We have gone through other popular posts on the topic, e.g. here and here.

Our network is not highly reliable and occasional blips are expected, but when it comes back up I would have expected the one partitioned node of the 4-node RabbitMQ cluster to rejoin the rest of the cluster, as is the case with the 4 Tomcat nodes installed on the same servers.

  1. The nodes on each side of the partition continue to run independently, but that doesn't seem like a graceful recovery from the failure of one node.
  2. We didn't have much luck with rabbitmqctl commands such as rabbitmqctl cluster_status. They sporadically caused the RabbitMQ process to hang, and we then needed a sudo kill on the RabbitMQ process.

We are at the point of evaluating a move to Kafka or any other message broker that handles network partitions well.

Any thoughts on how to avoid the manual RabbitMQ restarts, or on Kafka's ability to handle this situation, are highly appreciated.

Have you tried the federation (or shovel) plugin instead of a cluster? Is it suitable for your application? - Gabriele Santomaggio
Unfortunately it's not suitable! - Javaboy

1 Answer

I think Kafka with replication should be able to handle network partitions quite easily, as long as the number of brokers cut off is smaller than the replication factor of your topic (that is, the consumers and producers can always reach at least 1 broker for the topics they're operating on).
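To make that condition concrete, here is a minimal Python sketch. It is not Kafka code: the function name, the broker ids and the replica assignment are all made up for illustration, and it deliberately ignores in-sync replicas, `min.insync.replicas` and leader election. It only checks whether every partition still has at least one replica on a reachable broker.

```python
from typing import Dict, List, Set


def unavailable_partitions(assignment: Dict[int, List[int]],
                           unreachable: Set[int]) -> List[int]:
    """Return the partitions whose replicas all sit on unreachable brokers."""
    return [
        partition
        for partition, replicas in sorted(assignment.items())
        if all(broker in unreachable for broker in replicas)
    ]


# Hypothetical topic: 3 partitions, replication factor 2, brokers 1..4.
assignment = {0: [1, 2], 1: [2, 3], 2: [3, 4]}

# One broker cut off (fewer than the replication factor): nothing is lost.
print(unavailable_partitions(assignment, {1}))     # prints []

# Two brokers cut off (equal to the replication factor): partition 0 is gone.
print(unavailable_partitions(assignment, {1, 2}))  # prints [0]
```

Losing fewer brokers than the replication factor can never take out every replica of a partition, which is why the first call returns an empty list whatever the assignment looks like.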

To avoid backpressure in the clients while ZooKeeper discovers the partition and propagates the information to the producers and consumers, you may want to set a short ZK heartbeat interval (yes, you'll need ZK, and a cluster of it too, since you absolutely don't want your whole ZK cluster partitioned).

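Fair warning though: using a cluster of Kafka brokers will drop the FIFO aspect of your message queue. Kafka only preserves order within a single partition of a topic, so once a topic is spread over several partitions you can no longer expect consumers to read messages in the same order the producers wrote them, which you could expect from a single RabbitMQ queue. That can be pretty disturbing if you rely on it.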
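A small simulation of the ordering point, as a sketch only: messages with the same key land in the same partition and keep their relative order, while the order across keys depends on how the partitions are read. Everything here is illustrative. The helper names are invented, and CRC32 is just a stand-in hash (Kafka's default partitioner uses murmur2).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages: List[Message], num_partitions: int) -> Dict[int, List[Message]]:
    """Append each message to the log of the partition chosen by its key."""
    logs: Dict[int, List[Message]] = defaultdict(list)
    for key, value in messages:
        logs[partition_for(key, num_partitions)].append((key, value))
    return dict(logs)


def consume(logs: Dict[int, List[Message]]) -> List[Message]:
    """One possible read order: drain each partition in turn."""
    consumed: List[Message] = []
    for partition in sorted(logs):
        consumed.extend(logs[partition])
    return consumed


def values_for(key: str, messages: List[Message]) -> List[str]:
    return [value for k, value in messages if k == key]


produced = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
]
consumed = consume(produce(produced, num_partitions=3))

print(consumed)  # may differ from the produced order across keys
print(values_for("order-1", consumed))  # per-key order is preserved
```

If strict global ordering matters, the usual trade-off is a topic with a single partition, or choosing keys so that everything that must stay ordered shares one key.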