4
votes

I'm trying to provide disaster recovery between two data centers for a RabbitMQ deployment. The secondary data center is passive until the primary DC goes down.
Federation of queues is inappropriate because a federated queue wouldn't move messages until the consumers in the secondary DC go active. That shouldn’t happen unless the primary DC is unavailable, at which point those messages are inaccessible. I’ve considered creating an extra queue in the primary DC that would receive a copy of each message, and then using Federation or Shovel to copy those messages to the secondary. The issue then becomes removing the duplicate message from the secondary DC when the “original” in the primary DC is processed.
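To make the duplicate-removal problem concrete, here is a minimal in-memory sketch in plain Python, with no RabbitMQ client involved. It assumes every message carries a unique ID and that the primary DC forwards a "processed" notice for each consumed message; both are assumptions, and `SecondaryMirror`, `on_copy`, and `on_processed` are hypothetical names, not RabbitMQ APIs. As far as I know a RabbitMQ queue cannot delete an arbitrary message by ID, so something like this would have to live in a separate component that holds the copies in the secondary DC.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryMirror:
    """Hypothetical stand-in for the component holding copies in the secondary DC."""
    pending: dict = field(default_factory=dict)        # message_id -> body, in arrival order
    early_notices: set = field(default_factory=set)    # "processed" notices that beat their copy

    def on_copy(self, message_id: str, body: str) -> None:
        # A copy of a message arrived over the WAN link (e.g. via Shovel).
        if message_id in self.early_notices:
            # The primary already processed the original, so drop the copy.
            self.early_notices.discard(message_id)
            return
        self.pending[message_id] = body

    def on_processed(self, message_id: str) -> None:
        # The primary reports that the original message was consumed.
        if self.pending.pop(message_id, None) is None:
            # The notice overtook the copy; remember it for later.
            self.early_notices.add(message_id)

    def failover_backlog(self) -> list:
        # What secondary consumers would see if the primary went down now.
        return list(self.pending.values())


mirror = SecondaryMirror()
mirror.on_copy("m1", "order 1")
mirror.on_copy("m2", "order 2")
mirror.on_processed("m1")        # primary consumer finished m1
mirror.on_processed("m3")        # notice arrives before its copy
mirror.on_copy("m3", "order 3")  # late copy is discarded
print(mirror.failover_backlog())  # ['order 2']
```

Even in this toy version, the copy and the "processed" notice can arrive in either order, and anything processed in the primary but not yet reported before a failure would still be redelivered in the secondary, so consumers would need to tolerate duplicates.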
Mirroring the queue to a node in the secondary DC would work, except that RabbitMQ clustering isn't suited to a WAN because of latency. Has anyone else faced this scenario? Thanks.

1 Answer

0
votes

You quite eloquently explain the issues with using Federation and Shovel to try to solve DR with RabbitMQ. Rabbit isn't really designed to move data efficiently over a WAN.

Moving data across a WAN always presents problems for messaging solutions. For instance, IBM MQ has multi-instance queue managers for HA, but needs a SAN for DR, which becomes expensive in both product cost and processing time.

Another free product like Rabbit that you could use is Solace. It has HA and DR replication built in. It can manage the active/passive DR scenario you describe by moving each message across the WAN asynchronously in near real time. As soon as you're ready to move application traffic to the backup DC, you can activate the backup instance and start consuming messages. It automatically "removes the duplicate message" as it is consumed from the active side.