4
votes

I'm trying to get RabbitMQ configured behind an F5 load balancer. I have a working RabbitMQ node with the default node name of rabbit@%computername%. It listens on all network interfaces (0.0.0.0:5671, the AMQP SSL port), and it's working fine. However, all client applications that connect to it currently use the specific machine name, e.g. "%computername%". To take advantage of the load balancer's fault tolerance, I want to update all my client applications to use the load-balanced name instead, e.g. connect using HostName = "balancedname.mycompany.com" instead of "%computername%". However, when I update my client applications to connect to the load-balanced name, the connection fails. How can I get this to work?

I'm a novice at F5, and I did notice that the pool members' addresses are IP addresses. Should these be the node names instead of the IPs? Is that even possible, seeing as the node name can be completely arbitrary and doesn't necessarily map to anything that's network-resolvable? I'm in a hosting situation where I don't have write access to the F5, so trying these things out is a bit tricky.

I haven't found very much information at all on load balancing a RabbitMQ setup. I do understand that all RabbitMQ queues only really exist on one node, and I've set up the F5 in an active-passive mode so that traffic will always route to the primary node unless it goes down.

Update 1: It seems that this issue came back to bite me here. I'm using EXTERNAL authentication with an SSL certificate. Since clients were connecting using the load-balanced name instead of the node name, and the load-balanced name was NOT the name used to create the certificate, the connection was rejected. I ended up regenerating the certificate with the load-balanced name, but that wasn't enough; I also had to add an entry in the Windows hosts file to map 127.0.0.1 and ::1 to the load-balanced DNS name.
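To illustrate why the original certificate was rejected, here is a simplified Python sketch of the name check a TLS client performs. It is not what .NET or RabbitMQ actually run; real TLS libraries do this for you and handle more cases. The function name `name_matches` and the host name "node1" are made up for the example.

```python
from typing import Iterable


def name_matches(hostname: str, cert_names: Iterable[str]) -> bool:
    """Return True if hostname is covered by one of the certificate's names.

    Simplified: exact match, or a wildcard that replaces exactly the
    left-most label (e.g. *.mycompany.com).
    """
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # Drop the first label of the host and compare the remainder.
            _, _, host_rest = host.partition(".")
            if host_rest and host_rest == pattern[2:]:
                return True
    return False


# A certificate issued for the machine name does not cover the balanced name.
print(name_matches("balancedname.mycompany.com", ["node1"]))
# A certificate issued for the balanced name does.
print(name_matches("balancedname.mycompany.com", ["balancedname.mycompany.com"]))
```

The client compares the name it dialed, not the name of whichever node the F5 happens to pick, so the certificate has to carry the load-balanced name.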

Update 2: Update 1 fixes connections only for client applications running on the app server that is part of the load balancer pool; remote clients still don't work. The inner exception says "The certificate chain was issued by an authority that is not trusted". RabbitMQ + SSL is hard, and adding load balancing makes it even harder.
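The "not trusted" error means the remote client has no copy of the CA that signed the server certificate. As a rough illustration in Python (not the .NET client from the question), a client that trusts a private CA and still verifies the host name could be set up like this. The function name and all file paths are hypothetical.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS client context that verifies the server certificate.

    ca_file:   PEM file with the CA certificate(s) to trust. If omitted,
               the system's default trust store is used instead.
    cert_file: optional client certificate (for certificate-based login).
    key_file:  private key belonging to cert_file.
    """
    # SERVER_AUTH contexts require a valid chain and check the host name.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Usage sketch (paths are placeholders, so this part is left commented out):
# import socket
# ctx = make_client_context("cacert.pem", "client-cert.pem", "client-key.pem")
# raw = socket.create_connection(("balancedname.mycompany.com", 5671))
# tls = ctx.wrap_socket(raw, server_hostname="balancedname.mycompany.com")
```

With a self-signed CA, every remote client machine needs that CA file (or needs the CA installed in its trust store); the machine that generated the certificates already has it, which is probably why local clients worked first.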

1 Answer

0
votes

I'm answering my own question in the hope that it will save folks some time. In my scenario, I needed clients to connect to a load-balanced address like myrabbithost.mycompany.com, and the F5 to direct traffic to one node as long as it's up and fail over to the secondary node if it's down. I had already configured security and was authenticating to RabbitMQ using self-signed certificates. Those certificates had common names specific to each host, which was the problem. To work with .NET, the common name on the certificate must match the server name being connected to (myrabbithost.mycompany.com in my case). I had to do the following:

  • Generate new server and client certificates on the RabbitMQ servers with common names of myrabbithost.mycompany.com
  • Generate new certificates for the clients to use when connecting, so that they can use SSL authentication
  • Still on the RabbitMQ servers, concatenate the cacert.pem files of the individual certificate authorities so that clients can authenticate to any node using a client certificate generated by any node. When I modified rabbit.config to use "all.pem" instead of "cacert.pem", clients were able to connect, but it broke the management UI, so I modified the rabbitmq_management settings in rabbit.config to specify the host-specific cacert.pem file and it started working again.
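The concatenation step can be done with any tool that appends text files. Here is a minimal Python sketch; the function name and the directory layout in the usage comment are made up, and only the file names cacert.pem and all.pem come from my setup.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(ca_files, bundle_path):
    """Concatenate several PEM CA certificates into one bundle file.

    Returns the number of input files written to the bundle.
    """
    parts = []
    for ca_file in ca_files:
        text = Path(ca_file).read_text(encoding="ascii").strip()
        if PEM_HEADER not in text:
            raise ValueError(f"{ca_file} does not look like a PEM certificate")
        parts.append(text)
    # One certificate block after another, each ending with a newline.
    Path(bundle_path).write_text("\n".join(parts) + "\n", encoding="ascii")
    return len(parts)


# Usage sketch (hypothetical paths, one cacert.pem copied from each node):
# build_ca_bundle(["node1/cacert.pem", "node2/cacert.pem"], "all.pem")
```

The bundle is what the RabbitMQ SSL listener points at, while the management plugin keeps pointing at the host-specific cacert.pem as described above.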

To set up high availability, I created a RabbitMQ cluster, but ran into some problems there as well. In addition to copying the Erlang cookie from the primary node to the secondary node at C:\Windows and C:\users\myusername, I had to kill the epmd.exe process via Task Manager because the rabbitmqctl join_cluster command was failing with a "node down" error. The epmd.exe process survives RabbitMQ stoppages, and it can cause rabbitmqctl.bat to report spurious errors like "node down" even when the node is not down.
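Since a cookie mismatch is an easy mistake to make, a quick check that all the copies are identical can save some head-scratching. This is a small Python sketch, not part of RabbitMQ; the function name is made up, and the paths in the usage comment assume the cookie file is named .erlang.cookie in the two locations mentioned above.

```python
from pathlib import Path


def cookies_match(cookie_paths):
    """Return True if every listed cookie file has byte-identical content."""
    contents = {Path(p).read_bytes() for p in cookie_paths}
    # Exactly one distinct value means all copies agree.
    return len(contents) == 1


# Usage sketch (run on each node; the user name is a placeholder):
# cookies_match([
#     r"C:\Windows\.erlang.cookie",
#     r"C:\users\myusername\.erlang.cookie",
# ])
```

This only compares the copies on one machine; the cookie also has to be the same on every node in the cluster.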