3
votes

For clarification: BackupableThing is a hardware device with a program written into it (which is what gets backed up).

Updated clarification: This question is more about CQRS/ES implementation than about DDD modelling.

Say I have 3 aggregate roots:

class BackupableThing
{
    Guid Id { get; }
}

class Project
{
    Guid Id { get; }

    string Description { get; }
    byte[] Data { get; }
}

class Backup
{
    Guid Id { get; }

    Guid ThingId { get; }
    Guid ProjectId { get; }
    DateTime PerformedAt { get; }
}

Whenever I need to back up a BackupableThing, I first create a new Project and then create a new Backup with ProjectId set to the new Project's Id. Everything works as long as each new Backup gets its own new Project.

But really I should create a Project only if one doesn't already exist, where the unique identity of an existing Project is its Data property (some kind of hash of the byte[] array). So when any other BackupableThing gets backed up and the system sees that another BackupableThing has already been backed up with the same result (Data), it should show the already created and working project with all its descriptions and everything set.

First I thought of approaching this problem by somehow encoding the hash into the Guid, but this seems hacky and not straightforward, and it also increases the chance of collision with randomly generated Guids.

Then I came up with the idea of a separate table (with its own repository) that holds two columns: the hash of the data (some int/long) and PlcProjectId (Guid). But this looks very much like a projection, and it is in fact going to be a kind of projection, so in theory I could rebuild it from my domain events in the Event Store. I have read that it's bad to query the read side from domain services / aggregates / repositories (i.e. from the write side), and I haven't been able to come up with anything else so far.
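
To make the idea more concrete, this is roughly what I have in mind (the event shape, table row and data access types below are made up just for illustration):

using System;
using System.Security.Cryptography;

// Row of the separate lookup table: hash of Project.Data -> PlcProjectId.
public class ProjectHashEntry
{
    public long DataHash { get; set; }
    public Guid PlcProjectId { get; set; }
}

// Projection-like handler that keeps the lookup table up to date from domain events.
public class ProjectHashLookupBuilder
{
    private readonly IProjectHashLookup _lookup; // hypothetical data access abstraction

    public ProjectHashLookupBuilder(IProjectHashLookup lookup) => _lookup = lookup;

    public void Handle(ProjectCreated e)
    {
        _lookup.Add(new ProjectHashEntry
        {
            DataHash = ComputeDataHash(e.Data),
            PlcProjectId = e.ProjectId
        });
    }

    // Any stable hash works; here the first 8 bytes of SHA-256 over the raw data.
    public static long ComputeDataHash(byte[] data) =>
        BitConverter.ToInt64(SHA256.HashData(data), 0);
}

public interface IProjectHashLookup
{
    void Add(ProjectHashEntry entry);
}

// Hypothetical domain event, only for illustration.
public class ProjectCreated
{
    public Guid ProjectId { get; set; }
    public byte[] Data { get; set; }
}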

Update

So basically I create a read side inside the domain, to which only the domain has access, and query it before adding a new Project, so that if the Project already exists I just use the existing one? Yes, I already thought of that overnight, and it seems that not only do I have to build such domain storage and query it before creating a new aggregate, I also have to introduce some compensating action. For example, if multiple requests to create the same Project are sent simultaneously, two identical Projects would be created. So I need my domain storage to be an event handler, and if a user created the same Project, I need to fire a compensating command to remove/move/recreate this Project using the existing one...

Update 2

I'm also thinking of creating another aggregate for this purpose - an aggregate for the scope in which my Project must be unique (in this specific scenario a GlobalScopeAggregate or DomainAggregate) which will hold a {name, Guid} key-value reference. A separate GlobalScopeHandler will be responsible for the ProjectCreated, ProjectArchived and ProjectRenamed events and will ultimately fire compensating actions if a ProjectCreated event occurs with a name that has already been used. But I am confused about the compensating actions. How should I react if the user has already made a backup and has a view related to the project in his interface? He could change the description, name, etc. of the wrong project, which has already been removed by a compensating action. Also, my compensating action would remove the Project and Backup aggregates and create a new Backup aggregate with the existing ProjectId, because my Backup aggregate doesn't have a setter on its ProjectId field (it is an immutable record of a performed backup). Is this normal?
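
Roughly, the aggregate I have in mind would look something like this (a heavily simplified sketch just to illustrate the idea; the real one would be event sourced):

using System;
using System.Collections.Generic;

public class GlobalScopeAggregate
{
    // name (or data hash) -> id of the Project that first claimed it
    private readonly Dictionary<string, Guid> _claims = new Dictionary<string, Guid>();

    // Returns false when the key is already taken by another Project, so the
    // caller (GlobalScopeHandler) can fire a compensating command.
    public bool TryClaim(string key, Guid projectId)
    {
        if (_claims.TryGetValue(key, out var existingProjectId))
            return existingProjectId == projectId;

        _claims.Add(key, projectId);
        return true;
    }

    // Called when a ProjectArchived/ProjectRenamed event frees up a key.
    public void Release(string key) => _claims.Remove(key);
}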

Update 3 - DOMAIN clarification

There are a number of industrial devices (BackupableThings - programmable controllers) on a wide network, each with some firmware programmed into it. Customers update the firmware and upload it into the controllers (the backupable things). It is this program that gets backed up. There are a lot of controllers of the same type, so it's very likely that customers will upload the same program over and over again to multiple controllers, as well as to the same controller (as a means to revert some changes). The user needs to repeatedly back up all those controllers.

A Backup is some binary data (the program stored in the controller) plus the date the backup was performed. A Project is an entity that encapsulates the binary data as well as all information related to the backup. Given that I can't back up the program in the state in which it was originally uploaded (I can only get unreadable raw binary data, which I can also upload back into the controller again), I need a separate Project aggregate which holds the Data property as well as a number of attached files (for example, firmware project files), a description, a name and other fields.

Now, whenever some controller is backed up, I don't want to show "just binary data without any description" and force the user to fill in all the descriptive fields again. I want to look up whether a backup with the same binary data has already been made, and then just link that project to this new backup, so that the user who backed up another controller instantly sees lots of information about what currently lives in that controller :)

So I guess this is a case of set-based validation that occurs very often (as opposed to regular unique constraints), and I would also have lots of backups, so a separate aggregate holding it all in memory would be unwise.

Also, I just realized there's another problem. I can't just compute a hash of the binary data and tolerate even a small risk of two different backups being considered the same project. This is an industrial domain which needs a precise and robust solution. At the same time, I can't put a unique constraint on the binary data column (varbinary in SQL), because my binary data could be relatively big. So I guess I need to create a separate table of [int (hash of binary data), Guid (id of the project)] relations, and if the hash of a new backup's binary data is found there, I need to load the related aggregate and make sure the binary data really is the same. And if it's not - I also need some mechanism to store more than one relation with the same hash.

Current implementation

I ended up creating a separate table with two columns: DataHash (int) and AggregateId (Guid). Then I created a domain service with a factory method GetOrCreateProject(Guid id, byte[] data). This method looks up aggregate ids by the calculated data hash (it gets multiple values if there are multiple rows with the same hash), loads each of those aggregates and compares the data parameter with the aggregate's Data property. If they are equal, the existing loaded aggregate is returned. If they are not equal, a new hash entry is added to the hash table and a new aggregate is created.
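
In code, the service looks roughly like this (the repository and hash table interfaces, the hash function and the simplified Project are placeholders for the sake of the example):

using System;
using System.Collections.Generic;
using System.Linq;

public class ProjectFactoryService
{
    private readonly IProjectRepository _projects;   // loads/saves Project aggregates
    private readonly IProjectHashTable _hashTable;   // DataHash (int) -> AggregateId (Guid)

    public ProjectFactoryService(IProjectRepository projects, IProjectHashTable hashTable)
    {
        _projects = projects;
        _hashTable = hashTable;
    }

    public Project GetOrCreateProject(Guid id, byte[] data)
    {
        int hash = ComputeDataHash(data);

        // Several rows may share the same hash, so every candidate is loaded
        // and its actual Data compared byte by byte.
        foreach (Guid candidateId in _hashTable.GetAggregateIds(hash))
        {
            Project candidate = _projects.Load(candidateId);
            if (candidate.Data.SequenceEqual(data))
                return candidate; // same binary data -> reuse the existing project
        }

        // No project with identical data: register the hash and create a new aggregate.
        var project = new Project(id, data);
        _hashTable.Add(hash, id);
        _projects.Save(project);
        return project;
    }

    private static int ComputeDataHash(byte[] data)
    {
        // FNV-1a (32-bit), just an example of a stable hash over the raw bytes.
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in data)
            {
                hash = (hash ^ b) * 16777619;
            }
            return (int)hash;
        }
    }
}

public interface IProjectRepository
{
    Project Load(Guid id);
    void Save(Project project);
}

public interface IProjectHashTable
{
    IEnumerable<Guid> GetAggregateIds(int hash);
    void Add(int hash, Guid aggregateId);
}

// Simplified Project for the sake of the example; the real aggregate has more fields.
public class Project
{
    public Project(Guid id, byte[] data) { Id = id; Data = data; }
    public Guid Id { get; }
    public byte[] Data { get; }
}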

This hash table is now part of the domain, which means part of the domain is no longer event sourced. Every future need for uniqueness validation (the name of a BackupableThing, for example) would imply creating such tables, which adds state-based storage to the domain side. This increases overall complexity and couples the domain tightly. This is the point where I'm starting to wonder whether event sourcing even applies here, and if not, where does it apply at all? I tried to apply it to a simple system as a way to increase my knowledge and fully understand the CQRS/ES patterns, but now I'm fighting the complexities of set-based validation and starting to think that simple state-based relational tables with some kind of ORM would be a much better fit (since I don't even need an event log).

So the only thing that can tell you if a BackupableThing has already been backed up with the same result... is to compute the result anyway? Since it does not improve performance, what do you gain from that uniqueness rule then? I mean, it's not like a backup was a business object needing unique identification... – guillaume31
"I can't just compute a hash of the binary data and tolerate even a small risk of two different backups being considered the same project" - why? What would be the consequences? – guillaume31
@guillaume31 If a user backs up some controller, he expects to be able to restore that backup at some point and have the controller work as expected. Let's say, for example, that after performing a successful backup an already created project with the same data hash (but different actual data) is found. Not only will the user see the wrong description, but if he chooses to upload "the same" program again to restore the backup, this action will restore the wrong program and the controller will start to malfunction. These controllers control water pump stations, so this can lead to catastrophic results. – EwanCoder
How does that relate to DDD? If your concerns are correct, you have a hash function entropy problem, not a DDD problem, right? I mean, if you find another project that has the same hash as the backup you're currently doing, how will you know if it's the same data or not? – guillaume31
I would load up the related aggregate and check. It's better than having a 2-MByte data field as a unique key; that's an implementation detail. If I went pure DDD and forgot about performance problems, I would make my byte[] field the unique key and create a new Project only if there's none with this byte[] value. – EwanCoder

3 Answers

2
votes

You are prematurely shoehorning your problem into DDD patterns when major aspects of the domain haven't been fully analyzed or expressed. This is a dangerous mix.

  • What is a Project, if you ask an expert of your domain? (hint: probably not "Project is some entity to encapsulate binary data")
  • What is a Backup, if you ask an expert of your domain?
  • What constraints about them should be satisfied in the real world?
  • What is a typical use case around Backupping?

We're progressively finding out more about some of these as you add updates and comments to your question, but it's the wrong way around.

Don't take Aggregates and Repositories and projections and unique keys as a starting point. Instead, first write clear definitions of your domain terms. What business processes are users carrying out? Since you say you want to use Event Sourcing, what events are happening? Figure out if your domain is rich enough for DDD to be a relevant modelling approach. When all of this is clearly stated, you will have the words to describe your backup uniqueness problem and approach it from a more relevant angle. I don't think you have them now.

1
votes

No need to "query the read side" - that is indeed a bad idea. What you do is create a domain storage model just for the domain.

So you'll have the domain objects saved to EventStore and some special things saved somewhere else (SQL, key-value, etc.), plus a read consumer building your read models in SQL.

For instance, in my app my domain instances listen to events to build domain query models, which I save to Riak KV.

Here's a simple example which should illustrate my meaning. Queries are handled via a query processor, a popular pattern:

class Handler :
    IHandleMessages<Events.Added>,
    IHandleMessages<Events.Removed>,
    IHandleQueries<Queries.ObjectsByName>
{
    // Keep the domain query model up to date from domain events.
    public void Handle(Events.Added e) {
        _orm.Add(new { ObjectId = e.ObjectId, Name = e.Name });
    }
    public void Handle(Events.Removed e) {
        _orm.Remove(x => x.ObjectId == e.ObjectId && x.Name == e.Name);
    }
    // Answer queries from that same model.
    public IEnumerable<object> Handle(Queries.ObjectsByName q) {
        return _orm.Query(x => x.Name == q.Name);
    }
}
0
votes

My answer is quite generic as I'm not sure I fully understand your problem domain, but there are only two main ways to tackle set validation problems.

1. Enforce strong consistency

Enforcing strong consistency means that the invariant will be protected transactionally and can therefore never be violated.

Enforcing strong consistency will most likely limit the scalability of your system, but if you can afford it then it may be the simplest way to go: preventing the conflict from occurring rather than dealing with the conflict after the fact is usually easier.

There are numerous ways strong consistency can be enforced, but here are two common ones:

  1. Rely on a database unique constraint: If you have a datastore that supports them, and both your event store and this datastore can participate in the same transaction, then you can use this approach (a more concrete sketch follows right after this list).

    E.g. (pseudo-code)

    transaction {
        uniquenessService.reserve(uniquenessKey); //writes to a DB unique index
    
        //save aggregate that holds uniquenessKey
    }
    
  2. Use an aggregate root: This approach is very similar to the one described above, but one difference is that the rule lives explicitly in the domain rather than in the DB. The aggregate will be responsible for maintaining an in-memory set of uniqueness keys.

    Given that the entire set of keys will have to be brought into memory every time you need to record a new one, you should probably cache these kinds of aggregates in memory at all times.

    I usually use this approach only when there's a very small set of potential unique keys. It could also be useful in scenarios where the uniqueness rule is very complex in itself and not a simple key lookup.

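To make the first option more concrete, here is a rough C# sketch of the pseudo-code above. All of the types (the uniqueness service, the event store repository, the aggregate) are made-up placeholders, and it assumes your event store can enlist in the same transaction as the relational table behind the unique index:

using System;
using System.Transactions;

public class ProjectCreationService
{
    private readonly IUniquenessService _uniqueness;     // backed by a table with a unique index
    private readonly IEventStoreRepository _eventStore;  // must enlist in the same transaction

    public ProjectCreationService(IUniquenessService uniqueness, IEventStoreRepository eventStore)
    {
        _uniqueness = uniqueness;
        _eventStore = eventStore;
    }

    public void CreateProject(Guid projectId, string uniquenessKey, byte[] data)
    {
        // Both writes commit or roll back together: if another transaction already
        // reserved the key, the unique index throws and nothing is persisted.
        using var scope = new TransactionScope();

        _uniqueness.Reserve(uniquenessKey);  // INSERT into the unique index
        _eventStore.Save(new ProjectAggregate(projectId, uniquenessKey, data));

        scope.Complete();
    }
}

public interface IUniquenessService
{
    void Reserve(string key);
}

public interface IEventStoreRepository
{
    void Save(ProjectAggregate aggregate);
}

// Minimal stand-in for the aggregate that holds the uniqueness key.
public class ProjectAggregate
{
    public ProjectAggregate(Guid id, string uniquenessKey, byte[] data)
    {
        Id = id;
        UniquenessKey = uniquenessKey;
        Data = data;
    }

    public Guid Id { get; }
    public string UniquenessKey { get; }
    public byte[] Data { get; }
}

The duplicate aggregate is never persisted because the unique index violation rolls the whole transaction back.
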
Please note that even when enforcing strong consistency the UI should probably prevent invalid commands from being sent. Therefore, you could also have the uniqueness information available through a read model which would be consumed by the UI to detect conflicts early.

2. Eventual consistency

Here you would allow the rule to get violated, but then perform some compensating actions (either automated or manual) to resolve the problem.

Sometimes it's just overly limiting or challenging to enforce strong consistency. In these scenarios, you can ask the business whether they would accept resolving the broken rule after the fact. Duplicates are usually extremely rare, especially if the UI validates the command before sending it like it should (hackers could bypass the client-side check, but that is another story).

Events are great hooks when it comes to resolving consistency problems. You could listen to events such as SomeThingThatShouldBeUniqueCreated and then issue a query to check if there are duplicates.

Duplicates would be handled in the way the business wants them to be. For instance, you could send a message to an administrator so that he can manually resolve the problem.
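
As a rough illustration of such a hook (all of the names here are made up; adapt them to your own messaging and read model infrastructure):

using System;
using System.Threading.Tasks;

// Illustrative event published when something that should be unique is created.
public class SomeThingThatShouldBeUniqueCreated
{
    public Guid AggregateId { get; set; }
    public string UniquenessKey { get; set; }
}

public class UniquenessViolationDetector
{
    private readonly IDuplicateQuery _duplicates;  // query against a read model
    private readonly IAdminNotifier _notifier;     // or a bus firing a compensating command

    public UniquenessViolationDetector(IDuplicateQuery duplicates, IAdminNotifier notifier)
    {
        _duplicates = duplicates;
        _notifier = notifier;
    }

    public async Task Handle(SomeThingThatShouldBeUniqueCreated e)
    {
        // The rule may already be violated at this point; we only detect it
        // after the fact and trigger whatever resolution the business wants.
        if (await _duplicates.ExistsOtherWithKeyAsync(e.UniquenessKey, e.AggregateId))
        {
            await _notifier.ReportDuplicateAsync(e.UniquenessKey, e.AggregateId);
        }
    }
}

public interface IDuplicateQuery
{
    Task<bool> ExistsOtherWithKeyAsync(string key, Guid exceptAggregateId);
}

public interface IAdminNotifier
{
    Task ReportDuplicateAsync(string key, Guid duplicateAggregateId);
}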


Even though we may think that strong consistency is always needed, in many scenarios it is not. You have to explore, together with business experts, the risks of allowing a rule to be violated for a period of time and determine how often that would occur. Sometimes you may realize that there is no real risk for the business and that strong consistency was artificially imposed by the developer.