For clarification: BackupableThing is some hardware device with a program written into it (which is what gets backed up).
Updated clarification: This question is more about CQRS/ES implementation than about DDD modelling.
Say I have 3 aggregate roots:
```csharp
class BackupableThing
{
    Guid Id { get; }
}

class Project
{
    Guid Id { get; }
    string Description { get; }
    byte[] Data { get; }
}

class Backup
{
    Guid Id { get; }
    Guid ThingId { get; }
    Guid ProjectId { get; }
    DateTime PerformedAt { get; }
}
```
Whenever I need to back up a BackupableThing, I first need to create a new Project and then create a new Backup with ProjectId set to this new Project's Id. Everything works as long as each new Backup gets a new Project.
But really I need to create a Project only if it doesn't already exist, where the unique identity of an existing project should be its Data property (some kind of hash of the byte[] array). So when any other BackupableThing gets backed up and the system sees that another BackupableThing has already been backed up with the same result (Data), it should show the already created and working Project with all descriptions and everything set.
At first I thought of approaching this problem by encoding the hash in the Guid somehow, but this seems hacky and not straightforward, and it also increases the chance of collision with randomly generated Guids.
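For reference, the "encode the hash in the Guid" idea could look like the sketch below (a hypothetical helper, not code from my system): derive a deterministic Guid from the first 16 bytes of a SHA-256 of the data. It also illustrates why the approach feels hacky: the resulting ids are indistinguishable from random ones, so nothing structurally separates them from `Guid.NewGuid()` values.

```csharp
using System;
using System.Security.Cryptography;

static class DeterministicId
{
    // Hypothetical helper: derive a Guid from the content itself.
    // Any two identical byte arrays map to the same Guid.
    public static Guid FromData(byte[] data)
    {
        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(data);
        byte[] guidBytes = new byte[16];
        Array.Copy(hash, guidBytes, 16); // take the first 16 of 32 hash bytes
        return new Guid(guidBytes);
    }
}
```

Calling `DeterministicId.FromData` twice with equal arrays yields the same Guid, which is exactly the content-addressing behavior in question.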
Then I came up with the idea of a separate table (with a separate repository) which holds two columns: a hash of the data (some int/long) and the PlcProjectId (Guid). But this looks very much like a projection, and in fact it would be a kind of projection, so in theory I could rebuild it from my domain events in the event store. I have read that it's bad to query the read side from domain services / aggregates / repositories (from the write side), and I couldn't come up with anything else.
Update
So basically I create a read side inside the domain to which only the domain has access, and I query it before adding a new Project, so that if one already exists I just use the existing one? Yes, I thought about it overnight, and it seems that not only do I have to create such domain storage and query it before creating a new aggregate, I also have to introduce some compensating action. For example, if multiple requests to create the same Project are sent simultaneously, two identical projects would be created. So I need my domain storage to be an event handler, and if a user created the same project, I need to fire a compensating command to remove/move/recreate this project using the existing one...
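The "domain read side plus compensation" idea could be sketched like this (all names are hypothetical, and the dictionary stands in for the real storage): an event handler maintains the hash index, and when a second ProjectCreated arrives whose data hash is already indexed, it issues a compensating command instead of adding a second entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event and compensating command for this sketch.
record ProjectCreated(Guid ProjectId, int DataHash);
record MergeDuplicateProject(Guid DuplicateId, Guid SurvivorId);

class ProjectUniquenessHandler
{
    private readonly Dictionary<int, Guid> _byHash = new();

    // Collected here for illustration; a real handler would dispatch them.
    public List<MergeDuplicateProject> IssuedCommands { get; } = new();

    // Called for every ProjectCreated event coming from the store.
    public void Handle(ProjectCreated e)
    {
        if (_byHash.TryGetValue(e.DataHash, out var survivor))
        {
            // Race detected: the same data was registered twice.
            // Compensate by merging the duplicate into the survivor.
            IssuedCommands.Add(new MergeDuplicateProject(e.ProjectId, survivor));
        }
        else
        {
            _byHash[e.DataHash] = e.ProjectId;
        }
    }
}
```

The first-registered project "wins" and later duplicates are redirected to it, which matches the compensating-action plan above.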
Update 2
I'm also thinking of creating another aggregate for this purpose: an aggregate for the scope of uniqueness of my Project (in this specific scenario a GlobalScopeAggregate or DomainAggregate) which will hold {name, Guid} key-value references. A separate GlobalScopeHandler will be responsible for the ProjectCreated, ProjectArchived and ProjectRenamed events and will ultimately fire compensating actions if a ProjectCreated event occurs with a name that has already been used. But I am confused about compensating actions. How should I react if a user has already made a backup and has a view in his interface related to the project? He could change the description, name etc. of the wrong project, which has already been removed by the compensating action. Also, my compensating action will remove the Project and Backup aggregates and create a new Backup aggregate with the existing ProjectId, because my Backup aggregate doesn't have a setter on the ProjectId field (it is an immutable record of a performed backup). Is this normal?
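A bare-bones version of that uniqueness aggregate might look like the following (the names and shape are my assumption, not a prescribed design): it simply guards a name-to-id map and rejects a registration when the name is already taken, so the caller can fall back to the existing project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-instance aggregate guarding project-name uniqueness.
class GlobalScopeAggregate
{
    private readonly Dictionary<string, Guid> _byName = new();

    // Returns false instead of registering when the name is taken,
    // so the caller can reuse the existing project id.
    public bool TryRegister(string name, Guid projectId)
    {
        if (_byName.ContainsKey(name))
            return false;
        _byName[name] = projectId;
        return true;
    }

    public Guid GetExisting(string name) => _byName[name];
}
```

Note the drawback raised in Update 3 below: with many backups this map must fit in memory and becomes a single contention point, since every create command funnels through one aggregate.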
Update 3 - DOMAIN clarification
There is a number of industrial devices (BackupableThings, programmable controllers) on a wide network which have some firmware programmed into them. Customers update the firmware and upload it into the controllers (backupable things). It is this program that gets backed up. There are a lot of controllers of the same type, and it's very likely that customers will upload the same program over and over again to multiple controllers, as well as to the same controller (as a means to revert some changes). Users need to repeatedly back up all those controllers. A Backup is some binary data (the program stored in the controller) plus the date the backup occurred. A Project is an entity that encapsulates the binary data as well as all information related to the backup. Given that I can't restore the program to the readable state in which it was originally uploaded (I can only get unreadable raw binary data, which I can also upload back into a controller), I require a separate aggregate Project which holds the Data property as well as a number of attached files (for example, firmware project files), a description, a name and other fields. Now, whenever some controller is backed up, I don't want to show "just binary data without any description" and force the user to fill in all the descriptive fields again. I want to look up whether a backup with the same binary data has already been made, and then just link that project to this backup, so that the user who backed up another controller would instantly see lots of information about what lives in this controller right now :)
So I guess this is a case of set-based validation which occurs very often (as opposed to regular unique constraints), and I would also have lots of backups, so a separate aggregate which holds it all in memory would be unwise.
Also, I just realized there's another problem. I can't compute a hash of the binary data and tolerate even a small risk of two different backups being considered the same project. This is an industrial domain which needs a precise and robust solution. At the same time, I can't enforce a unique constraint on the binary data column (varbinary in SQL), because my binary data could be relatively big. So I guess I need to create a separate table of [int (hash of binary data), Guid (id of the project)] relations, and if the hash of the binary data of a new backup is found, I need to load the related aggregate and make sure the binary data really is the same. And if it's not, I also need some mechanism to store more than one relation with the same hash.
Current implementation
I ended up creating a separate table with two columns: DataHash (int) and AggregateId (Guid). Then I created a domain service with a factory method GetOrCreateProject(Guid id, byte[] data). This method gets the aggregate id(s) by the calculated data hash (it gets multiple values if there are multiple rows with the same hash), loads each aggregate and compares the data parameter with the aggregate's Data property. If they are equal, the existing, loaded aggregate is returned. If they are not equal, a new hash entry is added to the hash table and a new aggregate is created.
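Reduced to a self-contained sketch, the factory method described above might look like this. The in-memory dictionaries stand in for the real repository and the DataHash/AggregateId table, the Project class is trimmed to the fields the method touches, and the hash function is an arbitrary stable choice; only the lookup-compare-or-create flow is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Project
{
    public Guid Id { get; }
    public byte[] Data { get; }
    public Project(Guid id, byte[] data) { Id = id; Data = data; }
}

class ProjectService
{
    // Stand-ins for the real repository and the DataHash/AggregateId table.
    private readonly Dictionary<Guid, Project> _repository = new();
    private readonly Dictionary<int, List<Guid>> _hashIndex = new();

    // Hash collisions are tolerated: every candidate sharing the hash
    // is loaded and compared byte-by-byte before being reused.
    public Project GetOrCreateProject(Guid id, byte[] data)
    {
        int hash = ComputeHash(data);
        if (_hashIndex.TryGetValue(hash, out var candidateIds))
        {
            foreach (var candidateId in candidateIds)
            {
                var candidate = _repository[candidateId];
                if (candidate.Data.SequenceEqual(data))
                    return candidate; // same binary data: reuse the project
            }
        }

        // No exact match: register a new project under this hash.
        var project = new Project(id, data);
        _repository[id] = project;
        if (!_hashIndex.TryGetValue(hash, out var ids))
            _hashIndex[hash] = ids = new List<Guid>();
        ids.Add(id);
        return project;
    }

    private static int ComputeHash(byte[] data)
    {
        // Simple FNV-1a; any stable hash works here.
        unchecked
        {
            int h = (int)2166136261;
            foreach (byte b in data) h = (h ^ b) * 16777619;
            return h;
        }
    }
}
```

Because the index maps one hash to a list of ids, two different byte arrays that happen to share a hash each keep their own project, which is the robustness requirement from Update 3.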
This hash table is part of the domain now, and so part of the domain is not event sourced. Every future uniqueness-validation need (the name of a BackupableThing, for example) would imply creating such tables, which adds state-based storage to the domain side. This increases overall complexity and couples the domain tightly. This is the point where I'm starting to wonder whether event sourcing even applies here, and if not, where it applies at all. I tried to apply it to a simple system as a means to increase my knowledge and fully understand the CQRS/ES patterns, but now I'm fighting the complexities of set-based validation and can see that simple state-based relational tables with some kind of ORM would serve this case much better (since I don't even need an event log).
Isn't the only way to know that another BackupableThing has already been backed-up with the same result... to compute the result anyway? Since it does not improve performance, what do you gain from that uniqueness rule then? I mean, it's not like a backup was a business object needing unique identification... – guillaume31