CQRS + EventSourcing scalability

Question

I'm trying to use CQRS and Event Sourcing in my new project. I'm following the approach that Greg Young suggested several years ago (Mark Nijhof's implementation - http://cre8ivethought.com/blog/2009/11/12/cqrs--la-greg-young/), and I have some concerns about the scalability of this solution.

Some points were mentioned in the article by Mark Nijhof, but my problem is the Denormalizer part, which is responsible for updating the reporting database. I want to make this part asynchronous, so that control returns immediately after the events are published to the bus. Our idea is to implement the Denormalizer as a standalone web service (WCF) that processes the incoming events and updates the reporting database on a timer, in batches of commands. That looks like a potential bottleneck, so we also want to scale it out at this point with a cluster. But in a cluster we can't control the order in which updates reach the reporting database (or we would have to implement some strange, and I guess buggy, logic that checks object versions in the reporting DB). Another problem is the reliability of the solution: in case of a failure we will lose the updates held in the Denormalizer, because we do not persist them anywhere. So now I'm looking for a solution to this problem (Denormalizer scalability). Any thoughts are welcome!

Jonathan Oliver · Accepted Answer · 2011-04-13T12:50:07

To start, you'll definitely want to have the denormalizer hosted in a separate process. From there you can have the domain publish the events it raises to your messaging infrastructure. One easy strategy to help speed up denormalization is to break things apart by message/event type. In other words, you could create a separate queue for each message type and then have the denormalizer subscribe (using a message bus) to the corresponding events. The advantage of this is that you don't have messages stacking up one behind the other; everything starts to run in parallel. The only place where you might have some contention is on tables that are updated in response to multiple event types. Even so, you've now distributed the load among many endpoints.
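To make the queue-per-event-type idea concrete, here is a minimal in-process sketch in Python. It is only an illustration: `InMemoryBus`, the event names, and the dictionary "tables" are all made up for this example, and a real setup would put a durable message bus between the domain and the denormalizer rather than threads in one process.

```python
import queue
import threading
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for a message bus (hypothetical, not a real library)."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, event_type, handler):
        # One queue and one worker per event type (one handler per type here).
        q = queue.Queue()
        self._queues[event_type] = q
        threading.Thread(target=self._drain, args=(q, handler), daemon=True).start()

    def publish(self, event_type, payload):
        # Returns immediately; the denormalizer catches up asynchronously.
        self._queues[event_type].put(payload)

    def wait_until_idle(self):
        # Only needed for this demo: block until every queue is drained.
        for q in self._queues.values():
            q.join()

    @staticmethod
    def _drain(q, handler):
        while True:
            payload = q.get()
            try:
                handler(payload)
            finally:
                q.task_done()


# Hypothetical read-model "tables" kept in memory for the demo.
customers = {}
order_counts = defaultdict(int)
lock = threading.Lock()  # guards the shared read model


def on_customer_renamed(event):
    with lock:
        customers[event["id"]] = event["name"]


def on_order_placed(event):
    with lock:
        order_counts[event["customer_id"]] += 1


bus = InMemoryBus()
bus.subscribe("CustomerRenamed", on_customer_renamed)
bus.subscribe("OrderPlaced", on_order_placed)

bus.publish("CustomerRenamed", {"id": 1, "name": "Alice"})
bus.publish("OrderPlaced", {"customer_id": 1})
bus.publish("CustomerRenamed", {"id": 1, "name": "Alice Smith"})
bus.publish("OrderPlaced", {"customer_id": 1})
bus.wait_until_idle()
```

Because each event type has a single FIFO queue and a single worker, events of the same type are still applied in the order they were published, while different types proceed in parallel. Ordering across different types is not guaranteed in this sketch.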

As long as you're using some kind of messaging infrastructure you won't lose the event messages when attempting to denormalize. Instead, after a certain number of failed retries the message will be considered "poison" and moved to an error queue. Simply monitor the error queue for problems. Once a message is in the error queue you can check your logs to see why it's there, fix the problem, and then move it back.
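The retry-then-error-queue behaviour can be sketched roughly as follows. This is a simplified in-memory model, not the API of any particular bus: `process`, `return_to_source`, `MAX_RETRIES`, and the two handlers are hypothetical names, and real infrastructure would persist both queues and usually space the retries out.

```python
from collections import deque

MAX_RETRIES = 3  # assumed limit; real buses make this configurable


def process(inbox, error_queue, handler, max_retries=MAX_RETRIES):
    """Drain the inbox, parking messages that keep failing in error_queue."""
    while inbox:
        message = inbox.popleft()
        for _attempt in range(max_retries):
            try:
                handler(message)
                break  # handled successfully, go on to the next message
            except Exception as exc:
                last_error = exc
        else:
            # Every attempt failed: treat the message as poison.
            error_queue.append({"message": message, "error": repr(last_error)})


def return_to_source(error_queue, inbox):
    """After the underlying problem is fixed, move parked messages back."""
    while error_queue:
        inbox.append(error_queue.popleft()["message"])


read_model = {}  # hypothetical reporting "table"


def buggy_handler(event):
    read_model[event["id"]] = event["name"]  # KeyError if "name" is missing


def fixed_handler(event):
    read_model[event["id"]] = event.get("name", "(unknown)")


inbox = deque([
    {"id": 1, "name": "Alice"},
    {"id": 2},  # fails in buggy_handler on every attempt
    {"id": 3, "name": "Carol"},
])
error_queue = deque()

process(inbox, error_queue, buggy_handler)
parked = len(error_queue)  # the failing message is parked, not lost

# "Fix the problem, then move it back":
return_to_source(error_queue, inbox)
process(inbox, error_queue, fixed_handler)
```

The point of the design is that a failing message never blocks the ones behind it and is never dropped; it waits in the error queue, together with the recorded error, until someone replays it.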

One other consideration is that Mark Nijhof's example is somewhat old. There are a number of CQRS frameworks available as well as mountains of advice in the DDD/CQRS Google Group.
