11
votes

I have looked at a lot of event sourcing tutorials, and all of them use simple demos to focus on the tutorial's topic (event sourcing).

That's fine until you hit something in a real-world application that is not covered in one of these tutorials :)

I hit something like this. I have two databases, one event store and one projection store (read models). All aggregates have a GUID Id, which was 100% fine until now.

Now I have created a new JobAggregate and a Job Projection, and my company requires every job to have a unique, incremental int64 Job Id.

Now I'm looking stupid :) An additional issue is that jobs are created multiple times per second! That means the method to get the next number has to be really safe.

In the past (without ES) I had a table, defined the PK as auto-increment int64, saved the Job, and the DB did the job of giving me the next number. Done.

But how can I do this within my aggregate or command handler? Normally the Job projection is created by the event handler, but that's too late in the process, because the aggregate should already have the int64 (so that replaying the aggregate onto an empty DB gives the same Aggregate Id -> Job Id relation).

How should I solve this issue?

Kind regards

Here's a suggestion: use the GUID as the true Id and treat the integer Id as just another datum on the aggregate. Then create a chaser that picks up aggregate-creation events and generates a new "assign ID" event for each of them. The chaser can own the sequence generator, and there is no bottleneck for anyone who doesn't care about the numeric ID. – Fyodor Soikin
And how do I get the next one (incremented by 1)? The number only exists inside the events. I would have to replay all aggregates and then take the biggest one, which is obviously not a viable solution. And like I said, there are multiple creations per second. – SharpNoiZy
You really need a single source for your ids. The aggregate itself could do it, but then it wouldn't be able to distribute them to other servers. Of course, a database could be that source, but I think that kind of defeats the purpose of using CQRS. You could write your own service, but it is very complicated because you're trying to write something that won't slow down your aggregate(s). I have found it's much better to address the need for an incremental ID itself -- that need is simply at odds with distributed systems. – Peter Ritchie
@SharpNoiZy you have a service that provides a sequence, of course. The chaser owns exclusive access to that sequence, takes IDs from it, and publishes them as events. – Fyodor Soikin
What is the business requirement that supposedly needs this? Most of the time it comes from people with a little understanding of databases but no architectural vision, who are somewhat "dangerous" people leading IT in the wrong direction by mixing their business requirements with their wish to manage the project. Seriously, take the time to discuss this with your colleague/manager/customer/whatever. – Boris Guéry

3 Answers

4
votes

In the past (without ES) I had a table, defined the PK as auto-increment int64, saved the Job, and the DB did the job of giving me the next number. Done.

There's one important thing to notice in this sequence, which is that the generation of the unique identifier and the persistence of the data into the book of record both share a single transaction.

When you separate those ideas, you are fundamentally looking at two transactions -- one that consumes the id, so that no other aggregate tries to share it, and another to write that id into the store.

The best answer is to arrange that both parts are part of the same transaction -- for example, if you were using a relational database as your event store, then you could create an entry in your "aggregate_id to long" table in the same transaction as the events are saved.
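
A rough sketch of that arrangement, in PHP-like pseudocode with PDO; the "aggregate_id_to_long" table, its auto-increment key, and the surrounding variables are assumptions for illustration, and a real event store append would also check the expected version:

// Sketch only: one transaction covers both the event append and the claim of
// the next long, so either both happen or neither does.
// $pdo, $aggregateId and $serializedEvent come from the surrounding code.
$pdo->beginTransaction();
try {
    // Append the JobCreated event to the event store table.
    $stmt = $pdo->prepare( 'INSERT INTO events ( aggregate_id, payload ) VALUES ( :id, :payload )' );
    $stmt->execute( [ 'id' => $aggregateId, 'payload' => $serializedEvent ] );

    // Claim the next long in the same transaction; the auto-increment key of
    // the "aggregate_id_to_long" table hands out the number.
    $stmt = $pdo->prepare( 'INSERT INTO aggregate_id_to_long ( aggregate_id ) VALUES ( :id )' );
    $stmt->execute( [ 'id' => $aggregateId ] );
    $jobNumber = (int) $pdo->lastInsertId();

    $pdo->commit();
} catch ( \Throwable $e ) {
    $pdo->rollBack();
    throw $e;
}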

Another possibility is to treat the "create" of the aggregate as a Prepare followed by a Created; with an event handler that responds to the prepare event by reserving the long identifier post facto, and then sends a new command to the aggregate to assign the long identifier to it. So all of the consumers of Created see the aggregate with the long assigned to it.
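
A sketch of how that handler could look, again in PHP-like pseudocode; the names JobPrepared, AssignJobNumber, and JobNumberSequenceService are invented for illustration:

// Sketch only: reacts to the "prepare" event, reserves the long, and sends a
// command back to the aggregate; only then does the aggregate emit Created.
class JobNumberAssignmentHandler
{
    private $sequence;      // single owner of the number source
    private $commandBus;

    public function __construct( JobNumberSequenceService $sequence, CommandBus $commandBus )
    {
        $this->sequence = $sequence;
        $this->commandBus = $commandBus;
    }

    public function onJobPrepared( JobPrepared $event )
    {
        $number = $this->sequence->next();
        $this->commandBus->dispatch( new AssignJobNumber( $event->aggregateId(), $number ) );
    }
}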

It's worth noting that you are assigning what is effectively a random long to each aggregate you are creating, so you better dig in to understand what benefit the company thinks it is getting from this -- if they have expectations that the identifiers are going to provide ordering guarantees, or completeness guarantees, then you had best understand that going in.

There's nothing particularly wrong with reserving the long first; depending on how frequently the save of the aggregate fails, you may end up with gaps. For the most part, you should expect to be able to maintain a small failure rate (ie - you check to ensure that you expect the command to succeed before you actually run it).

In a real sense, the generation of unique identifiers falls under the umbrella of set validation; we usually "cheat" with UUIDs by abandoning any pretense of ordering and pretending that the risk of collision is zero. Relational databases are great for set validation; event stores maybe not so much. If you need unique sequential identifiers controlled by the model, then your "set of assigned identifiers" needs to be within an aggregate.
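
If you do put the sequence inside the model, a minimal sketch of such an aggregate (names invented) could look like the following; the point is that the counter state lives in one aggregate, so the event store's usual optimistic-concurrency check guards the sequence:

// Sketch only: a single aggregate owning the set of assigned job numbers.
// Two concurrent assignments collide on the aggregate's version; one retries.
class JobNumberSequence
{
    private $lastAssigned = 0;

    public function assignNextTo( string $jobAggregateId ) : JobNumberAssigned
    {
        return new JobNumberAssigned( $jobAggregateId, $this->lastAssigned + 1 );
    }

    public function apply( JobNumberAssigned $event )
    {
        $this->lastAssigned = $event->number();
    }
}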

The key phrase to follow is "cost to the business" -- make sure you understand why the long identifiers are valuable.

1
votes

Here's how I'd approach it.

I agree with the idea of an Id generator that provides the "business Id" but not the "technical Id".

The core here is to have an application-level JobService that deals with all the infrastructure services and orchestrates what is to be done.

Controllers (like web controllers or command-line commands) directly consume the application-level JobService to control/command the state change.

It's PHP-like pseudocode, but here we are talking about the architecture and the process, not the syntax. Adapt it to C# syntax and the idea is the same.

Application level

class MyNiceWebController
{
    public function createNewJob( string $jobDescription, xxxx $otherData, ApplicationJobService $jobService )
    {
        $projectedJob = $jobService->createNewJobAndProject( $jobDescription, $otherData );

        $this->doWhateverYouWantWithYourAlreadyExistingJobLikeForExample301RedirectToDisplayIt( $projectedJob );
    }
}

class MyNiceCommandLineCommand
{
    private $jobService;

    public function __construct( ApplicationJobService $jobService )
    {
        $this->jobService = $jobService;
    }

    public function createNewJob()
    {
        $jobDescription = // Get it from the command line parameters
        $otherData = // Get it from the command line parameters

        $projectedJob = $this->jobService->createNewJobAndProject( $jobDescription, $otherData );

        // print, echo, console->output... confirmation with Id or print the full object.... whatever with ( $projectedJob );
    }
}

class ApplicationJobService
{
    // In the application level because it just serves the first-level requests
    // from controllers, commands, etc. but does not add "domain" logic.

    private $application;
    private $jobIdGenerator;
    private $jobEventFactory;
    private $jobEventStore;
    private $jobProjector;

    public function __construct( Application $application, JobBusinessIdGeneratorService $jobIdGenerator, JobEventFactory $jobEventFactory, JobEventStoreService $jobEventStore, JobProjectorService $jobProjector )
    {
        $this->application = $application;  // I like to log which "application execution run" is responsible for each domain effect, so I can later trace IPs, cookies, etc. by crossing data with another data lake.
        $this->jobIdGenerator = $jobIdGenerator;
        $this->jobEventFactory = $jobEventFactory;
        $this->jobEventStore = $jobEventStore;
        $this->jobProjector = $jobProjector;
    }

    public function createNewJobAndProject( string $jobDescription, xxxx $otherData ) : Job
    {
        $applicationExecutionId = $this->application->getExecutionId();

        $businessId = $this->jobIdGenerator->getNextJobId();

        $jobCreatedEvent = $this->jobEventFactory->createNewJobCreatedEvent( $applicationExecutionId, $businessId, $jobDescription, $otherData );

        $this->jobEventStore->storeEvent( $jobCreatedEvent );       // Throws an exception if it fails, so no projector will be invoked if the event was not stored.

        $entityId = $jobCreatedEvent->getEntityId();
        $projectedJob = $this->jobProjector->project( $entityId );

        return $projectedJob;
    }
}

Note: if projecting synchronously is too expensive, just enqueue the projection and return the Id:

        // ...
        $entityId = $jobCreatedEvent->getEntityId();
        $this->jobProjector->enqueueProjection( $entityId );

        return $entityId;
    }
}

Infrastructure level (common to various applications)

class JobBusinessIdGeneratorService implements DomainLevelJobBusinessIdGeneratorInterface
{
    // In infrastructure because it accesses persistence layers.

    // In the constructor, get the persistence objects and so on... database, files, whatever.

    public function getNextJobId() : int
    {
        $this->lockGlobalCounterMaybeAtDatabaseLevel();

        $current = $this->persistence->getCurrentJobCounter();
        $next = $current + 1;
        $this->persistence->setCurrentJobCounter( $next );

        $this->unlockGlobalCounterMaybeAtDatabaseLevel();

        return $next;
    }
}
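
As a concrete (hypothetical) example of the lock / increment / persist steps above, on a relational database the generator could use a one-row counter table and a row lock; the job_counter table and its column are made-up names:

// Sketch only: a database-backed implementation of the generator above.
class PdoJobBusinessIdGenerator implements DomainLevelJobBusinessIdGeneratorInterface
{
    private $pdo;

    public function __construct( \PDO $pdo )
    {
        $this->pdo = $pdo;
    }

    public function getNextJobId() : int
    {
        // The FOR UPDATE row lock plays the role of lockGlobalCounterMaybeAtDatabaseLevel().
        $this->pdo->beginTransaction();
        try {
            $current = (int) $this->pdo
                ->query( 'SELECT current_value FROM job_counter FOR UPDATE' )
                ->fetchColumn();

            $next = $current + 1;
            $this->pdo->exec( "UPDATE job_counter SET current_value = {$next}" );

            // Committing persists the counter before the value is returned.
            $this->pdo->commit();
            return $next;
        } catch ( \Throwable $e ) {
            $this->pdo->rollBack();
            throw $e;
        }
    }
}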

Domain Level

class JobEventFactory
{
    // It's in this factory that we create the entity Id.

    private $idGenerator;

    public function __construct( EntityIdGenerator $idGenerator )
    {
        $this->idGenerator = $idGenerator;
    }
    
    public function createNewJobCreatedEvent( Id $applicationExecutionId, int $businessId, string $jobDescription, xxxx $otherData ) : JobCreatedEvent
    {
        $eventId = $this->idGenerator->createNewId();
        $entityId = $this->idGenerator->createNewId();
        
        // The only place where we allow "new" is in the factories. No other places should do a "new" ever.
        $event = new JobCreatedEvent( $eventId, $entityId, $applicationExecutionId, $businessId, $jobDescription, $otherData );

        return $event; 
    }
}

If you do not like the factory creating the entity Id (it can look ugly to some eyes), just pass it in as a parameter with a specific type, and push the responsibility of creating a fresh new one (never reusing an existing one) to some other intermediate service (never the application service).

Nevertheless, if you do so, take care: what if a "silly" service creates two JobCreatedEvents with the same entity Id? That would be really ugly. In the end, creation only occurs once, and the Id is created at the very core of the creation of the JobCreatedEvent. Your choice anyway.

Other classes...

class JobCreatedEvent;
class JobEventStoreService;
class JobProjectorService;

Things that do not matter in this post

We could discuss at length whether the projectors should be at the infrastructure level, global to the multiple applications calling them... or even in the domain (as I need "at least" one way to read the model), or whether they belong more to the application (maybe the same model is read in 4 different ways by 4 different applications, each with its own projectors)...

We could discuss at length where the side-effects are triggered, whether implicitly in the event store or at the application level (I have not invoked any side-effects processor, i.e. event listener, here). I think of side-effects as living in the application layer, as they depend on infrastructure...

But all this... is not the topic of this question.

I don't care about all those things for this "post". Of course they are not negligible topics and you will have your own strategy for them, and you have to design all this very carefully. But here the question was where to create the auto-incremental Id coming from a business requirement, and doing all those projectors (sometimes called calculators) and side-effects (sometimes called reactors) in a "clean-code" way would blur the focus of this answer. You get the idea.

Things I care about in this post

What I care about is that:

  • If the experts want an "autonumeric" then it's a domain requirement, and therefore it's a property at the same level of definition as "description" or "other data".
  • The fact that they want this property does not conflict with the fact that all entities have an "internal Id" in whatever format the coder chooses, be it a UUID, a SHA-1 or whatever.
  • If you need sequential Ids for that property, you need a "supplier of values", AKA the JobBusinessIdGeneratorService, which has nothing to do with the "entity Id" itself.
  • That Id generator is responsible for ensuring that once the number has been incremented, it is synchronously persisted before being returned to the caller, so it is impossible to return the same Id twice after a failure.

Drawbacks

There's a sequence-leak you'll have to deal with:

If the Id generator points to 4007, the next call to getNextJobId() will increment it to 4008, persist the pointer as "current = 4008" and then return 4008.

If for some reason the job creation or persistence then fails, the next call will give 4009. We will end up with a sequence of [ 4006, 4007, 4009, 4010 ], with 4008 missing.

That is because, from the generator's point of view, 4008 was "actually consumed"; the generator does not know what you did with it, just as it wouldn't if you ran a dummy loop that extracted 100 numbers.

Never compensate with a ->rollback() in the catch of a try / catch block, because that can create concurrency problems: if you get 4008, another process gets 4009, and then the first process fails, the rollback would corrupt the sequence. Just assume that on failure the Id was simply "consumed", and do not blame the generator. Blame whatever failed.

I hope it helps!

0
votes

@SharpNoizy, very simple.

Create your own Id generator. Say an alphanumeric string, for example "DB3U8DD12X", that gives you billions of possibilities. Now, what you want to do is generate these Ids in sequential order by giving each character an ordered value...

0 - 0
1 - 1
2 - 2
.....
10 - A
11 - B

Get the idea? So, what you do next is create a function that will increment each character of your "D74ERT3E4" string using that mapping.

So, "R43E4D", "R43E4E", "R43E4F", "R43E4G"... get the idea?

Then, when your application loads, you look at the database and find the latest Id generated. Then you load into memory the next 50,000 combinations (in case you want super speed) and create a static class/method that will give you that value back.

Aggregate.Id = IdentityGenerator.Next();

This way you have control over the generation of your Ids, because that's the only class that has that power.
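
A rough sketch of such a generator, kept in the same PHP-like pseudocode as the previous answer; the id_block table, the block size, and the method names are assumptions:

// Sketch only: reserves a block of sequential values up front and hands them
// out from memory, converting each to the 0-9 / A-Z alphabet described above.
class IdentityGenerator
{
    private static $nextValue;
    private static $lastPreloaded;

    public static function initialize( \PDO $pdo, int $blockSize = 50000 )
    {
        // Find the latest value already handed out and reserve the next block.
        $pdo->beginTransaction();
        $latest = (int) $pdo->query( 'SELECT last_value FROM id_block FOR UPDATE' )->fetchColumn();
        $pdo->exec( 'UPDATE id_block SET last_value = ' . ( $latest + $blockSize ) );
        $pdo->commit();

        self::$nextValue = $latest + 1;
        self::$lastPreloaded = $latest + $blockSize;
    }

    public static function next() : string
    {
        if ( self::$nextValue > self::$lastPreloaded ) {
            throw new \RuntimeException( 'Block exhausted; reserve a new one.' );
        }

        // base_convert() maps 10 => A, 11 => B, ... just like the table above.
        return strtoupper( base_convert( (string) self::$nextValue++, 10, 36 ) );
    }
}

A crashed process loses whatever is left of its reserved block, so this trades gaps in the sequence for speed.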

I like this approach because it is more "readable" when you use it in your web API, for example. GUIDs are hard (and tedious) to read, remember, etc.

GET api/job/DF73 is way better to remember than api/job/XXXX-XXXX-XXXXX-XXXX-XXXX

Does that make sense?