I am working on a project that manages production of a large number of documents in batches. The workflow is:
- The user creates a new "Batch" using the application, based on a template that defines its requirements (requirements are usually files that the user will have to upload and that the system will process).
- Once all requirements are met, the system processes all inputs and generates a large number of documents (thousands).
- Those documents need to be post-processed just-in-time.
- Some operations apply to the batch as a whole, for example publishing all documents, in which case all of those documents need to be post-processed first.
- There are constraints on which operations can run simultaneously, each document can be post-processed at most once, and so on (a rough sketch of the document lifecycle follows this list).
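
To make those constraints concrete, here is an illustrative sketch of the lifecycle each document goes through. The names are made up for this question, not my actual code:

```csharp
using System;

// Illustrative only: the lifecycle a produced document moves through,
// and the invariants I need to enforce across thousands of documents.
public enum DocumentState
{
    Produced,      // generated from the batch inputs
    PostProcessed, // may happen at most once per document
    Published      // requires prior post-processing
}

public class Document
{
    public Guid Id { get; private set; }
    public DocumentState State { get; private set; }

    public void PostProcess()
    {
        // Enforces the "post-processed at most once" constraint.
        if (State != DocumentState.Produced)
            throw new InvalidOperationException("Document was already post-processed.");
        State = DocumentState.PostProcessed;
    }

    public void Publish()
    {
        // Publishing the batch requires every document to reach this state first.
        if (State != DocumentState.PostProcessed)
            throw new InvalidOperationException("Document must be post-processed before publishing.");
        State = DocumentState.Published;
    }
}
```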
I have currently modeled the "Batch" itself as an aggregate root, but I don't store the list of produced documents in the "Batch" object itself; instead, I retrieve those documents from my data store using a collection id that is persisted in the "Batch" object. The only reason I chose this design was to keep the aggregate root from containing a large collection and becoming bloated, but it is now getting in the way of developing the business logic, because I have to deal with consistency issues across the documents in the "batch" myself.
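
For reference, this is roughly the current shape of the model, heavily simplified (the repository interface here is illustrative, not my real generic repository):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// The aggregate root keeps only a reference (collection id) to its
// documents instead of holding the documents themselves.
public class Batch
{
    public Guid Id { get; private set; }
    public Guid DocumentCollectionId { get; private set; }
    // ... template/requirement tracking, batch status, etc.
}

// The documents live outside the aggregate boundary and are fetched
// separately, which is exactly where the cross-document consistency
// problems show up.
public interface IDocumentRepository
{
    Task<IReadOnlyList<Document>> GetByCollectionIdAsync(Guid collectionId);
}
```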
My question is: in DDD/CQRS, when using a document database for persistence and/or when using event sourcing, how should one deal with aggregates that contain large collections?
I have seen this post and this post, but neither addresses my concern. One uses NHibernate collection filters, which is not an option for me, and which I don't think is the right way to deal with this issue anyway, since it leaks storage logic into the domain model; the other is more about accessing objects in nested aggregates and doesn't address storage/retrieval issues.
FYI: I'm using .NET/C# with a service bus and an oversimplified generic repository backed by SQL, and I'm planning to switch to MongoDB in the very near future.