
I am trying to get an overview of how NServiceBus scales out in various scenarios, but I have failed to find sufficient information out there.

Generally, it seems that scaling out NServiceBus (and the underlying transport) is different in a pub/sub scenario than in a send/receive scenario. Furthermore, it seems to make a difference whether the message-emitting system(s) are scaled out or the "absorbing" systems are scaled out.

Consider a setting with MSMQ as transport layer and NServiceBus on top, and assume that we will use the queue, MyQueue, as communication channel from an emitting system to an absorbing system (either via send/receive or pub/sub). What would a scale-out setup look like in the following scenarios? I have tried coming up with some answers below, but am not sure about them.

  • Emitting system unscaled; absorbing system unscaled: This is the trivial setup with one single endpoint at both ends, and one single queue at the receiver.

    Subscription storage at both systems: MSMQ

  • Emitting system unscaled; absorbing system scaled: A single node at the absorbing system, the endpoint, is "promoted" to Distributor, and a number of worker nodes get messages from that single distributor node.

    Does it make a difference if the absorbing system subscribes or receives?

    And can the distributor be scaled out as well?

    The documentation suggests using individual queue names, but isn't it messy if the sending/publishing systems have to care about whether the receiving system is scaled out or not? From the sender's perspective, the receiver's scale-out should be transparent, I think.

    Subscription storage at both systems: MSMQ? Where is the subscription stored? At the distributor?

  • Emitting system scaled; absorbing system unscaled: Multiple nodes at the emitting system can send to the unscaled absorbing system, just like in the unscaled/unscaled scenario above. But can multiple nodes at the emitting system publish to the same absorbing system, or should they not? How is publishing from multiple nodes handled?

    Subscription storage at both systems: MSMQ?

  • Emitting system scaled; absorbing system scaled: Is this just a combination of the above two cases for both the send/receive and pub/subscribe patterns?

    Subscription storage at both systems: MSMQ?

In short, it would be nice to get an overview of the recommended NServiceBus setup for the above four cases.

Are you limited to MSMQ? NServiceBus supports other transports as well. (Udi Dahan)

No, we are not limited to MSMQ as such, but I like the distributed (i.e. brokerless) nature of MSMQ. Also, since MSMQ ships with Windows, it matches our customers' desire to prioritize standard Microsoft components. Would you recommend something else, though? (someName)

There is the SQL Server transport if you want another option based on standard Microsoft software. That scales out by using the competing consumer pattern, avoiding the need for an explicit distributor completely. (janovesk)

@UdiDahan - the transport support in NServiceBus is pretty poor in my opinion. It's amazing to see that you don't support MQ out of the box. In an enterprise environment this is a major issue. (Sean)

@Sean I'm not sure that commenting here is the best place for us to have this discussion, so I've created a GitHub issue to make this more visible and accessible: github.com/Particular/NServiceBus/issues/2815 (Udi Dahan)

2 Answers


First of all it's important to realize that the distributor component in NServiceBus is ONLY relevant if you are scaling out the MSMQ transport. The other supported transports are brokers and use the competing consumer pattern to achieve the same result.

Second, you need to understand that NServiceBus makes a distinction between a logical autonomous component/endpoint and the physical deployment of it. When you send, publish or subscribe, you use the LOGICAL address of the endpoint ONLY. The logical address is often the same as the physical one, but it does not have to be. There can be only one logical endpoint address for a service, but multiple physical deployments of it.

In a non-scaled scenario, they are the same. Let's say ServiceA and ServiceB are deployed to machine Machine1. The logical endpoint address and physical address for ServiceA would be ServiceA@Machine1. The logical address and physical address for ServiceB would be ServiceB@Machine1.

In a scaled out scenario, this would be different. Let's say you physically deploy ServiceA and ServiceB to Machine1 and Machine2. You also deploy two distributors for ServiceA and ServiceB to a third machine, MachineX. ServiceA now has one logical endpoint address, ServiceA@MachineX, but two physical addresses: ServiceA@Machine1 and ServiceA@Machine2. ServiceB also has one logical endpoint address, ServiceB@MachineX, and two physical addresses: ServiceB@Machine1 and ServiceB@Machine2.

Enter the distributor. It is a simple load balancer. It takes incoming messages from the logical endpoint of a service and forwards them to one of the physical endpoints/workers. It will do this in a round robin-ish way to distribute the load.
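The distributor's round-robin-ish forwarding can be sketched in a few lines. This is a minimal illustration of the idea, not the real NServiceBus implementation; the class and the addresses are made up for the example.

```python
from collections import deque

class Distributor:
    """Sketch of a round-robin dispatcher: messages arriving at a service's
    LOGICAL address are forwarded to its physical workers in turn."""

    def __init__(self, logical_address, workers):
        self.logical_address = logical_address   # e.g. "ServiceB@MachineX"
        self.workers = deque(workers)            # physical deployments

    def dispatch(self, message):
        # Forward to the next physical worker, then rotate the queue
        # so the load is spread evenly across workers.
        worker = self.workers[0]
        self.workers.rotate(-1)
        return worker

d = Distributor("ServiceB@MachineX",
                ["ServiceB@Machine1", "ServiceB@Machine2"])
assert [d.dispatch(m) for m in ("m1", "m2", "m3")] == \
    ["ServiceB@Machine1", "ServiceB@Machine2", "ServiceB@Machine1"]
```

Senders never see this rotation; they only ever address ServiceB@MachineX.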

The trick to making this work is always to use the logical endpoint address of the component you want to reach when you send, subscribe or publish something. Never directly use the physical address of individual endpoint deployments.

Let's look at a simple send. ServiceA wants to send something to ServiceB. One of the physical endpoints of ServiceA, on Machine1, will create a message and put it into the logical address for ServiceB, ServiceB@MachineX. The message contains both the physical and the logical endpoint address of the sender. The distributor for ServiceB on MachineX will pick up the message and hand it off to either ServiceB@Machine1 or ServiceB@Machine2. That's it. That's all the distributor does.
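The addressing information carried by such a message can be sketched as below. The field names are illustrative only (they are not NServiceBus header names); the point is that a message records both the logical and the physical address of its sender, while its destination is always a logical address.

```python
from dataclasses import dataclass

@dataclass
class TransportMessage:
    """Sketch of the addressing info a message carries (illustrative names)."""
    body: str
    reply_to_logical: str    # e.g. "ServiceA@MachineX" - the sender's logical address
    reply_to_physical: str   # e.g. "ServiceA@Machine1" - the deployment that sent it
    destination: str         # always a LOGICAL address, e.g. "ServiceB@MachineX"

msg = TransportMessage(
    body="do something",
    reply_to_logical="ServiceA@MachineX",
    reply_to_physical="ServiceA@Machine1",
    destination="ServiceB@MachineX",  # the distributor's queue, not a worker's
)
```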

Subscriptions work the same way. Say ServiceA wants to subscribe to one of ServiceB's events. One of the physical endpoints of ServiceA sends a subscription message to the logical endpoint address for ServiceB, ServiceB@MachineX. The message essentially contains the following "ServiceA would like to subscribe to event XXX". The distributor picks up the subscription message and hands it over to either ServiceB@Machine1 or ServiceB@Machine2. The lucky worker looks at the message and stores the fact that logical ServiceA would like to subscribe to event XXX. Note that it stores the logical endpoint address and NOT the endpoint address of the physical deployment of ServiceA that happened to send the subscription message. The subscription storage belongs to the logical ServiceB endpoint and is shared between all the physical deployments of it. This rules out MSMQ as storage. The supported stores are NHibernate or RavenDB. (A bit different for transports with native pub/sub-support, but that is beside the point.)

Publish is similar. ServiceB wants to publish the XXX event. One of the physical deployments of ServiceB looks into the subscription database for subscriptions to event XXX. It sees that logical endpoint address at ServiceA@MachineX is interested in that event. It then puts the event on that queue. The distributor for ServiceA on MachineX picks up the message and distributes it to one of the workers.
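The subscribe and publish steps above can be sketched together. This is a conceptual illustration, not the NServiceBus API: the key points are that the store is shared by all physical workers of the logical endpoint, and that it records logical subscriber addresses only.

```python
class SubscriptionStore:
    """Sketch of the shared subscription storage for a logical endpoint.
    Every physical worker reads and writes the same store, which is why
    per-machine MSMQ storage cannot be used and NHibernate/RavenDB is."""

    def __init__(self):
        self._subs = {}  # event type -> set of LOGICAL subscriber addresses

    def subscribe(self, event_type, logical_address):
        # Only the logical address is stored, never the physical address
        # of the worker that happened to send the subscription message.
        self._subs.setdefault(event_type, set()).add(logical_address)

    def publish(self, event_type):
        # Publishing means enqueueing the event on each subscriber's
        # LOGICAL queue; the subscriber's own distributor fans it out.
        return sorted(self._subs.get(event_type, set()))

store = SubscriptionStore()
# The subscription message physically came from ServiceA@Machine1,
# but only the logical address ServiceA@MachineX is recorded.
store.subscribe("XXX", "ServiceA@MachineX")
assert store.publish("XXX") == ["ServiceA@MachineX"]
```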

The distributor itself cannot be scaled out, so it becomes a single point of failure if you don't make it highly available. The recommendation is to use a Windows Failover Cluster for the distributors and their MSMQ.

There is also the concept of the MasterNode. Look at it as a Distributor and a Worker running in the same physical process, to optimize resource usage on the machines hosting the Windows Failover Cluster. All the above still applies to the MasterNode.


You have the ability to add a Worker node to any NServiceBus endpoint. A choice that has to be made is whether the Master node also processes messages. Depending on your scenario, you may choose either. The Distributor role is used in the scenario where the Master node does not process messages but acts purely as a dispatcher.

The primary difference between Send and Publish is the dependency on subscription storage. When publishing, you require an external storage mechanism, and this storage must be queried on every Publish in case there are new subscribers. This has an overhead associated with it. When publishing, subscribers should subscribe via the Master node, which will determine whether it processes the messages itself based on the configuration described above.

When scaling out a Publisher, the Master and Workers will both talk to the subscription storage and perform the publishing in a round-robin-like fashion. The Master node is aware of whether the Workers are busy, and the load is distributed accordingly.

If the messages being exchanged are not of interest to other endpoints, you may consider sticking with point-to-point communication (Send) to eliminate the overhead associated with publishing.