We are having MSMQ issues in a load-balanced, high-volume environment using NServiceBus.
Our environment looks as follows: 1 F5 load balancer distributes web traffic round-robin to 6 application servers. Each of these 6 servers uses Bus.Send to send to a single queue on a remote machine that resides on a cluster.
The event throughput during normal usage is approximately 5-10 events per second per server, so 30-60 events per second across the entire environment, depending on load.
The issue we're seeing is that 1 of the application boxes is able to send messages to the cluster queue, but the other 5 are not. On the 5 boxes experiencing the failure, the outgoing queue to the cluster is inactive.
There are also a large number of messages in the transactional dead-letter queue. When we purge that queue, the outgoing queue connects to the cluster; however, messages then pile up as unacknowledged in the outgoing queue. The unacknowledged count keeps growing until the messages move into the transactional dead-letter queue again and the outgoing queue changes state back to inactive.
Interestingly, when we perform this purge operation, a different box becomes the 'good box'. So we're pretty sure that the issue is not one bad box; rather, only 1 box at a time can reliably maintain a connection to the cluster queue.
Has anybody come across this before?