Should event hubs be split on message type?

Question

I am considering using Azure Event Hubs for a project I am currently working on. We use Service Bus queues for commands today, with one queue per message type.

Would it make sense to have several Event Hubs or is it better to use one hub for several message types?

What are your current reasons for using one queue per message type? Is it because of the volume of events you're sending, or to make it easier to process the different message types? How many different message types do you currently have? — Dominic Betts
Not all processes should process all message types (commands). With one queue per message type, each process can subscribe to the queues/message types it cares about. This makes sense for the commands, but what we are thinking of using Event Hubs for is events. These events could potentially be processed by every process, as opposed to the commands, which should only be processed once. — Jonas Røineslien
In that case it sounds like you could use a single Event Hub. Each event consumer (process in your terminology) could use its own Consumer Group, which gives it its own view of the event stream, and process the events appropriately (for more details see msdn.microsoft.com/en-us/library/azure/dn836025.aspx). Note that a single Event Hub supports up to 20 Consumer Groups. — Dominic Betts

cacsar · Accepted Answer · 2015-03-20T10:40:09

This is a question full of tradeoffs and exercising judgement about what systems you expect to build now and in the future and how they might use the different event types.

Below is an excerpt from guidance Jay Kreps has given for designing systems on top of Apache Kafka. It applies well to Event Hubs too, with the major exceptions of the short retention period and the limit on the number of consumer groups.

Let’s begin with pure event data—the activities taking place inside the company. In a web company these might be clicks, impression, and various user actions. FedEx might have package deliveries, package pick ups, driver positions, notifications, transfers and so on.

These type of events can be represented with a single logical stream per action type. For simplicity I recommend naming the Avro schema and the topic the same thing, e.g. PageViewEvent. If the event has a natural primary key you can use that to partition data in Kafka, otherwise the Kafka client will automatically load balance data for you.

...

We experimented at various times with mixing multiple events in a single topic and found this generally lead to undue complexity. Instead give each event it’s own topic and consumers can always subscribe to multiple such topics to get a mixed feed when they want that.

I generally agree with this advice (and you should read that entire blog post if you're designing a system on Event Hubs/Kafka/Kinesis). Subscribers needing to ignore messages they aren't interested in is not only annoying, it becomes problematic later if one of the event types starts to dominate the combined stream.
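
To make the cost of filtering concrete, here is a minimal pure-Python sketch. It does not use the Event Hubs SDK; the event names, the counts, and the `count_wasted_reads` helper are all made up for illustration. It counts how many events a consumer that cares about only one type has to read and throw away from a combined stream.

```python
def count_wasted_reads(stream, wanted_type):
    """Return (processed, discarded) for a consumer that wants one event type."""
    processed = 0
    discarded = 0
    for event_type, _payload in stream:
        if event_type == wanted_type:
            processed += 1
        else:
            # The event was still read and deserialized before being dropped.
            discarded += 1
    return processed, discarded


# Hypothetical combined stream: page views dominate, orders are rare.
combined = [("PageViewEvent", i) for i in range(990)]
combined += [("OrderPlacedEvent", i) for i in range(10)]

processed, discarded = count_wasted_reads(combined, "OrderPlacedEvent")
print(processed, discarded)
```

With these assumed numbers the order consumer handles 10 events and discards 990. With one stream per event type it would read only the 10 it cares about.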

But having multiple streams and combining them together does have costs, and they need to be weighed in making a decision. I've listed some that come to mind.

You lose ordering between events of different types from the same source unless you spend the effort to add it back.
If you want to commit progress on the different topics together, you need to manage that coordination yourself.
If you are partitioning the event streams on a primary key shared between the topics and want the partitions in each topic to travel together, you can't use the high level clients like EventProcessorHost as partitions can end up autobalanced to different processes.
A consumer with one thread per partition ends up multiplying the needed number of threads by the number of topics. Probably not an issue unless you have expensive structures that can't be shared.
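
As a rough illustration of the first point, one way to add ordering back is to have the producer stamp every event with a per-source sequence number and merge the per-type streams on it. The sketch below is plain Python, not Event Hubs code. The `seq` field and the `merge_by_sequence` helper are assumptions for the example, and it assumes each stream is already ordered by `seq`.

```python
import heapq
from operator import itemgetter


def merge_by_sequence(*streams):
    """Merge streams that are each already sorted by their 'seq' field."""
    return list(heapq.merge(*streams, key=itemgetter("seq")))


# Hypothetical per-type streams from the same source.
page_views = [
    {"seq": 1, "type": "PageViewEvent"},
    {"seq": 4, "type": "PageViewEvent"},
]
clicks = [
    {"seq": 2, "type": "ClickEvent"},
    {"seq": 3, "type": "ClickEvent"},
]

merged = merge_by_sequence(page_views, clicks)
print([event["seq"] for event in merged])  # [1, 2, 3, 4]
```

A live consumer would also have to buffer and wait for slower streams before emitting anything, which is part of the effort this tradeoff refers to.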

In my own deployment we use different event hubs for different event types even though we currently use the same code to process them all. This is simply because I expect to add new components that only care about certain event types. I hope this helps, and at worst I've told you to go look at the guidance for Kafka since the principle's the same and it's been around longer.
