5
votes

I'm building a system that's very data-centric. I have large hierarchical datasets but no business rules. The output of the system comes from some calculations done on the data and a number of reports. I need to have a complete audit trail (for regulatory reasons) and be able to run the calculations against a dataset from any point in its past.

For these reasons I thought an event-sourced system using CQRS was the way to go. All the examples I've seen revolve around creating aggregates to do ES. The problem is that, because each piece of data is one large related set, I'd have a small number of massive aggregates. The alternative seemed to be splitting the set up into its parts and calling each one an aggregate. But, in order to do any calculation I would have to load hundreds of thousands of aggregates.

My question is: does anyone have experience of CQRS + ES systems that are data-centric, and what might that look like?

Is there a better way to store the history of a dataset without using ES?

Thanks for any help.

2
Hard to answer without having any details of how big the data set is, how the calculations happen, etc. – Tomasz Jaskuλa
It's an asset management system. Each Asset has 100k+ pieces of equipment. Each Asset also has a number of projects related to it. Each project has a hierarchy of 1k+ items for each piece of equipment in the asset. The calculations are run against a project and need the equipment (100k+ items) plus all the data held against each one in the project. – Colin

2 Answers

9
votes

But, in order to do any calculation I would have to load hundreds of thousands of aggregates.

Language check: aggregates only exist in the write model (C). Calculations and reports come out of the read model (Q). You aren't, after all, changing/appending to the event history when you report on it.

It's an asset management system. Each Asset has 100k+ pieces of equipment.

That sounds a bit like an inventory tracking system. Greg Young has remarked that "in most inventory systems there are no commands."

Because the "book of record" is the real world, not the model, "commands" don't make sense -- the model isn't allowed to reject reality. Without commands, aggregates go away; there are no business rules to enforce. Just events that announce changes to the real world.

The basic pattern of CQRS+ES still works, which is to say that you write a history of events into your persistence layer (that's your audit trail), and publish events out of this record, so that your other projections can update.
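
A minimal sketch of that pattern, assuming an in-memory log and made-up event names (`EquipmentRecorded`, `EquipmentRemoved`); none of these classes come from a real library, and a production system would use a durable store. It shows the log acting as the audit trail, a projection updated as events are published, and the same projection rebuilt as of an earlier point in the history, which is what "run the calculations against a dataset from any point in its past" seems to call for.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Event:
    sequence: int            # position in the log, assigned on append
    kind: str                # hypothetical event name, e.g. "EquipmentRecorded"
    data: dict[str, Any]


class EventLog:
    """Append-only history of events; this is the audit trail."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def append(self, kind: str, data: dict[str, Any]) -> Event:
        event = Event(len(self._events) + 1, kind, dict(data))
        self._events.append(event)
        for handler in self._subscribers:   # publish to live projections
            handler(event)
        return event

    def read(self, up_to: Optional[int] = None) -> list[Event]:
        """Return events in order, optionally only up to a sequence number."""
        return [e for e in self._events if up_to is None or e.sequence <= up_to]


class EquipmentValueProjection:
    """Read model: current value per piece of equipment."""

    def __init__(self) -> None:
        self.values: dict[str, float] = {}

    def apply(self, event: Event) -> None:
        if event.kind == "EquipmentRecorded":
            self.values[event.data["equipment_id"]] = event.data["value"]
        elif event.kind == "EquipmentRemoved":
            self.values.pop(event.data["equipment_id"], None)

    def total(self) -> float:
        return sum(self.values.values())


def project_as_of(log: EventLog, sequence: int) -> EquipmentValueProjection:
    """Rebuild the read model from history as it stood at `sequence`."""
    projection = EquipmentValueProjection()
    for event in log.read(up_to=sequence):
        projection.apply(event)
    return projection
```

Subscribing `EquipmentValueProjection.apply` to the log keeps a live view current, while `project_as_of(log, n)` replays only the first `n` events to answer a historical question. Replaying from the start is the simplest option; whether it is fast enough for 100k+ items is something to measure rather than assume.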

You'll need to consider how many event streams are appropriate for your situation. In CQRS solutions where the domain model is the book of record, each aggregate normally writes to an exclusive event history (reducing contention); models that need data from more than one stream join them together. So you might want to do something analogous for your different external event sources. Alternatively, you might have them all publish into a single event stream, and then have the read models filter out the events that they don't need.
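
The two options can be sketched side by side. This is only an illustration: the `Event` shape, the stream names, and the assumption that every source stamps events with a comparable `timestamp` are mine, not something the question specifies.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: int   # ordering key; assumes all sources share a comparable clock
    stream: str      # which source the event came from (hypothetical names)
    kind: str
    payload: dict


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Option 1: one stream per source; the read model joins them in time order.

    Each input stream must already be sorted by timestamp.
    """
    return heapq.merge(*streams, key=lambda e: e.timestamp)


def filter_stream(stream: Iterable[Event], wanted_kinds: set[str]) -> Iterator[Event]:
    """Option 2: one shared stream; the read model skips events it doesn't need."""
    return (e for e in stream if e.kind in wanted_kinds)
```

Separate streams keep writers from contending with each other but push the join onto every read model that needs more than one source; a single stream makes ordering trivial at the cost of every reader scanning events it will discard.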

2
votes

Ever since I familiarized myself with event-sourcing ideas, I've been using an event store to record the things that happen in the systems I work on. I call it 'event sourcing lite': you don't really build aggregates but follow the anemic model route, putting all the logic in the Application Services layer (as in the Onion architecture).
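
Roughly what I mean, as a sketch only: the service name, the event name and the plain list standing in for an event store are all invented for the example.

```python
from functools import reduce


class EquipmentService:
    """Application service: validates input and appends facts to the store.

    There is no aggregate; `store` is a plain list standing in for a real
    event store (an assumption made for this sketch).
    """

    def __init__(self, store: list) -> None:
        self.store = store

    def record_value(self, equipment_id: str, value: float) -> None:
        if value < 0:
            raise ValueError("value must be non-negative")
        self.store.append(
            {"kind": "EquipmentValueRecorded",   # hypothetical event name
             "equipment_id": equipment_id,
             "value": value}
        )


def current_values(history: list) -> dict:
    """Fold the event history into the latest value per piece of equipment."""
    def step(state: dict, event: dict) -> dict:
        if event["kind"] == "EquipmentValueRecorded":
            return {**state, event["equipment_id"]: event["value"]}
        return state

    return reduce(step, history, {})
```

Every change stays in the store, so the history is there when you need it, and the current state is just a fold over that history.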

I rarely see reasons not to follow event sourcing in its 'lite' version. It's like audit plus logging, but with a much wider range of applications as your code grows. Only if your domain is rich should you consider building aggregates and snapshots, caching them in memory, etc. For shallow domains you can also use aggregates if you need maximum performance under heavy load. Building ES aggregates correctly requires skill and time for analysis and experimentation. Make sure you have both before starting this venture.