5
votes

I started reading about the Event Sourcing pattern combined with CQRS. As far as I understand, CQRS is a pattern in which we separate the write and the read actions. Event Sourcing is a pattern where everything in the system is initiated by a command that triggers an event. The Event Sourcing pattern requires an event bus. There are a couple of things that I didn't manage to understand.

The Event store contains all the events that happened to a certain entity. If I want to query the current state of this entity, I need to query all the events that happened to this entity, and recreate its current state.
The entire event history is present in the event store.
Why can't I have a microservice that is responsible for saving each event to an event database (if I want to log those events for further processing, using something like Kafka) and a separate microservice that applies the change to the entity in a regular database (a simple update to the entity's document in MongoDB, for example)? When those microservices finish their work, the event would be removed from the event store (let's say I implement this event store using a queue). In this way, whenever I need to query the current state of an entity, I simply query a database instead of querying the event store and rebuilding the current state (or recalculating the state from the event store and caching the result periodically). I don't understand why it is mandatory to store all the events forever; why isn't it optional?
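To make it concrete, this is roughly what I understand "replaying events to recreate current state" to mean (a toy sketch in Python with made-up event names, not any specific framework):

    # Hypothetical events for one entity; the names and shapes are made up.
    events = [
        {"type": "AccountOpened", "owner": "alice"},
        {"type": "MoneyDeposited", "amount": 100},
        {"type": "MoneyWithdrawn", "amount": 30},
    ]

    def apply(state, event):
        """Fold a single event into the current state."""
        if event["type"] == "AccountOpened":
            return {"owner": event["owner"], "balance": 0}
        if event["type"] == "MoneyDeposited":
            return {**state, "balance": state["balance"] + event["amount"]}
        if event["type"] == "MoneyWithdrawn":
            return {**state, "balance": state["balance"] - event["amount"]}
        return state

    # The current state is derived by replaying the full event history.
    state = {}
    for e in events:
        state = apply(state, e)
    # state == {"owner": "alice", "balance": 70}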

For example, a Lambda function receives an event, generates new events, and stores each of them in a separate SQS queue per event type. Each queue has its own Lambda function responsible for handling the corresponding event type. An event is removed from its queue once it has been processed.
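A rough sketch of what I have in mind for that routing (the queue URLs, event shape, and handler name are placeholders, not an existing setup):

    import json
    import boto3

    sqs = boto3.client("sqs")

    # Hypothetical mapping of event type -> dedicated SQS queue.
    QUEUE_URLS = {
        "OrderPlaced": "https://sqs.us-east-1.amazonaws.com/123456789012/order-placed",
        "OrderShipped": "https://sqs.us-east-1.amazonaws.com/123456789012/order-shipped",
    }

    def handler(event, context):
        """Lambda entry point: fan out each incoming event to the queue for its type."""
        for record in event.get("events", []):
            sqs.send_message(
                QueueUrl=QUEUE_URLS[record["type"]],
                MessageBody=json.dumps(record),
            )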


4 Answers

7
votes

The Event Sourcing pattern requires an event bus.

A bus is not required for Event Sourcing, unless you need to notify other systems/domains of the change (event).

If I want to query the current state of this entity, I need to query all the events that happened to this entity, and recreate its current state.

Well, sort of. You only need to do this when you are handling a new command and need to validate that applying it won't make the "entity" (as you call it) inconsistent. Note that this happens on the command side of CQRS, not the query side.
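As a rough sketch of that command-side flow (the event names and the in-memory store are made up for illustration, not a specific product):

    from collections import defaultdict

    class InMemoryEventStore:
        """Minimal in-memory stand-in for a real event store (illustration only)."""
        def __init__(self):
            self.streams = defaultdict(list)
        def read_stream(self, stream):
            return list(self.streams[stream])
        def append(self, stream, event):
            self.streams[stream].append(event)

    def load_account(store, account_id):
        """Rebuild the aggregate's state by folding its event stream."""
        state = {"balance": 0}
        for event in store.read_stream(f"Account-{account_id}"):
            if event["type"] == "MoneyDeposited":
                state["balance"] += event["amount"]
            elif event["type"] == "MoneyWithdrawn":
                state["balance"] -= event["amount"]
        return state

    def handle_withdraw(store, account_id, amount):
        """Command handler: validate against the rebuilt state, then append a new event."""
        state = load_account(store, account_id)
        if state["balance"] < amount:
            raise ValueError("insufficient funds")  # reject the command, write nothing
        store.append(f"Account-{account_id}", {"type": "MoneyWithdrawn", "amount": amount})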

For the query/read model side, you have a lot of different options. It's common when using Event Sourcing to have a separate data store that maintains a denormalized version of the event and related data that gets updated as the events happen. This separate store is often Eventually Consistent, which is too much to go into for the purpose of this answer. Your read model could also be a relational database, a flat file, or literally any other way of storing data you can think of. Its data is kept consistent with the write model by receiving events as they happen, via a bus, polling the database, or other means.
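For example, a read-model projector might look roughly like this (the event shapes and the dictionary standing in for the read store are assumptions):

    # Hypothetical projector: keeps a denormalized read model up to date as events arrive.
    read_model = {}  # could just as well be MongoDB, a relational table, or a flat file

    def project(event):
        """Apply one event to the denormalized read model."""
        if event["type"] == "AccountOpened":
            read_model[event["account_id"]] = {"owner": event["owner"], "balance": 0}
        elif event["type"] == "MoneyDeposited":
            read_model[event["account_id"]]["balance"] += event["amount"]
        elif event["type"] == "MoneyWithdrawn":
            read_model[event["account_id"]]["balance"] -= event["amount"]

    # Queries read the projected state directly instead of replaying events.
    def get_balance(account_id):
        return read_model[account_id]["balance"]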

It is also absolutely valid to query the stream of events and process (or partially process) them in real time to answer a query, but the cases where this is necessary are relatively uncommon.

The entire event history is present in the event store. Why can't I have a microservice that is responsible for saving each event to an event database (if I want to log those events for further processing, using something like Kafka) and a separate microservice that applies the change to the entity in a regular database (a simple update to the entity's document in MongoDB, for example)?

You can!

When those microservices finish their work, the event would be removed from the event store (let's say I implement this event store using a queue).

You can do this too, but then you're not doing Event Sourcing. This is more like "Event Driven Architecture" which is possible and completely valid without using Event Sourcing, but does not provide all the same benefits. In an event sourced system, the event store is the source of truth for the data, and a queue is not a valid place to store the truth, as it's not really meant to store data long-term.

When you do CQRS, and especially when you do Event Sourcing, you need to change your mental model of what "current state" means. The actual truth is stored somewhere (event store, relational database, etc.), and when you query you project that truth into whatever format you need it in.

For example, I have a database of users that stores FirstName in one column and LastName in another. The row that represents me has "Phil" in the FirstName column and "Sandler" in the LastName column. When I show the data in a UI, I display it as "Sandler, Phil". Why not just store it in a document database as "Sandler, Phil" and be done with it? Because by normalizing the data, I have accurately recorded the truth and have the option of projecting the data differently in the future should the need arise.

So is current state in the above example the data stored in the two columns, or is it "Sandler, Phil"? In CQRS you should not be thinking about it in terms of current state, but in terms of your two separate models, the truth (write side) and how it gets projected (the read side).
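In code, that distinction is just a projection over the stored truth (a toy sketch):

    # The stored truth: normalized columns.
    row = {"FirstName": "Phil", "LastName": "Sandler"}

    # One possible projection of that truth for the UI; others can be added later.
    def display_name(r):
        return f"{r['LastName']}, {r['FirstName']}"

    assert display_name(row) == "Sandler, Phil"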

5
votes

As mentioned, Event Sourcing doesn't require a bus, it requires an event store.

The pattern you referred to (reading all events to reconstitute the entity state) is what I call "the Domain-Driven Design flavour" of Event Sourcing.

The thoughts you have are more related to an "event+state"-oriented approach.

Let's take a closer look at both of those methods.

DDD and aggregate streams

One of the DDD tactical patterns is the Aggregate pattern. It is basically a consistency boundary: a command can only be applied to a single aggregate instance and therefore forms a transaction. When a command is handled, the aggregate state changes and a new domain event (or multiple events) is produced. We then store the event(s) in the event store as one transaction. All events for a single entity are stored in one stream, which we usually call the "aggregate stream"; the stream name is usually formed from the aggregate type and its id (like Order-123).

The aim here is what aggregates are meant for - consistency. The only way to be absolutely sure that you execute a command on the latest state of an aggregate is to read all the events (or a snapshot and all events after the snapshot).
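As a toy sketch of that loading strategy (the snapshot and event shapes are made up; a real event store would supply them):

    # Rebuild the latest aggregate state from an optional snapshot
    # plus the events recorded after it.
    def load_order(snapshot, events):
        state = dict(snapshot["state"]) if snapshot else {}
        start = snapshot["version"] + 1 if snapshot else 0
        for event in events[start:]:
            state.update(event["data"])  # fold one event into the state
        return state

    events = [
        {"data": {"status": "placed", "total": 40}},   # version 0
        {"data": {"total": 55}},                       # version 1
        {"data": {"status": "shipped"}},               # version 2
    ]
    snapshot = {"version": 1, "state": {"status": "placed", "total": 55}}

    assert load_order(snapshot, events) == {"status": "shipped", "total": 55}
    assert load_order(None, events) == {"status": "shipped", "total": 55}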

I am not sure what you meant by "querying the entity state". If you mean "fetching the entity state by id", that seems correct, but for queries you don't do that. That's where CQRS plays its role: you project the necessary events to another place, a database that allows running queries, and in that database you have a projected state of your entity. There's no rule that a projection may only use events from one entity type; in fact, restricting projections that way is closer to an anti-pattern (see the sketch below). Read models (the projected state) are built for specific purposes, often driven by demand from users (a UI of sorts).
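For example, a projection can fold events from two different aggregate types into one read model built for a specific screen (all names below are made up):

    # Read model designed for an "order summaries" screen; lives in the query-side database.
    order_summaries = {}

    def on_order_placed(e):
        # Events from the Order aggregate create the row.
        order_summaries[e["order_id"]] = {
            "customer_id": e["customer_id"],
            "customer_name": None,
            "total": e["total"],
        }

    def on_customer_renamed(e):
        # Events from the Customer aggregate update the same read model.
        for summary in order_summaries.values():
            if summary["customer_id"] == e["customer_id"]:
                summary["customer_name"] = e["new_name"]

    on_order_placed({"order_id": "123", "customer_id": "c1", "total": 40})
    on_customer_renamed({"customer_id": "c1", "new_name": "Alice Smith"})
    # order_summaries["123"] == {"customer_id": "c1", "customer_name": "Alice Smith", "total": 40}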

Events+state

There are quite a lot of event-sourced systems out there that do exactly what you described - project the entity state to another store, so you have a ready-made, easily accessible entity state at all times, without reading events over and over again.

It sounds attractive, but you must ensure that writing the events and updating this snapshot happen transactionally. In the architecture you described, where a function projects events to a document database, that won't be the case: the entity state snapshot will always be only eventually consistent. You can therefore easily end up in a situation where a command operates on a stale entity snapshot and introduces some weird behaviour into the system. The worst part is that all your tests will be green, and the problem will only surface in production when the system is under load. Such errors are nasty and hard to catch.
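A common guard against exactly that problem is an optimistic concurrency check on append; roughly (the toy stream below is only an illustration of the idea):

    class ConcurrencyError(Exception):
        pass

    class InMemoryStream:
        """Toy event stream that rejects appends made against a stale version."""
        def __init__(self):
            self.events = []

        def append(self, event, expected_version):
            # The command handler read the stream at `expected_version`; if someone
            # else appended in the meantime, fail instead of acting on stale state.
            if expected_version != len(self.events):
                raise ConcurrencyError("stream moved on, re-read and retry the command")
            self.events.append(event)

    stream = InMemoryStream()
    stream.append({"type": "MoneyDeposited", "amount": 100}, expected_version=0)
    stream.append({"type": "MoneyWithdrawn", "amount": 30}, expected_version=1)
    # stream.append({"type": "MoneyWithdrawn", "amount": 30}, expected_version=1)  # -> ConcurrencyError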

As for the other points, I believe the other answers already cover them.

3
votes

Event sourcing (with or without CQRS) specifically means storing the state of an entity as a sequence of events, commonly domain-specific events. When you need to run business logic that requires data from that entity, you project the events in sequence onto a state and use that.

It is an absolutely valid practice to store the domain events in something like Kafka but store the entity itself in a document or normalized database (whether by projecting the events onto it and storing the result, or by any other means); it's just not event sourcing.

I'll assume you know the benefits of event sourcing, so I'll not go over them here, but feel free to add a comment and I'll expand on those.

Why isn't storing the events in something like Kafka, and NOT using them during loading, really event sourcing? If you're not storing snapshots in the same database as the events, then you run the very real risk of concurrency conflicts: for example, double entries, conflicting events being raised, or missing events if you decide to use at-most-once semantics when raising events. These directly mean that you can't really rely on the events you're emitting to ever be the source of truth.

2
votes

I don't understand why it is mandatory to store all the events forever; why isn't it optional?

Per Martin Fowler (emphasis mine):

We can query an application's state to find out the current state of the world, and this answers many questions. However there are times when we don't just want to see where we are, we also want to know how we got there.

This leads to a number of facilities that can be built on top of the event log:

  • Complete Rebuild: We can discard the application state completely and rebuild it...
  • Temporal Query: We can determine the application state at any point in time...
  • Event Replay: If a past event was incorrect, we can compute the consequences by reversing it...
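For example, the "Temporal Query" facility above is essentially a replay with a cutoff; a toy sketch (the events and timestamps are made up):

    # Rebuild the state as it was at some point in time by ignoring later events.
    events = [
        {"ts": "2021-01-01", "type": "MoneyDeposited", "amount": 100},
        {"ts": "2021-02-01", "type": "MoneyWithdrawn", "amount": 30},
        {"ts": "2021-03-01", "type": "MoneyDeposited", "amount": 50},
    ]

    def balance_as_of(cutoff):
        """Replay only the events recorded up to `cutoff`."""
        balance = 0
        for e in events:
            if e["ts"] > cutoff:
                break
            balance += e["amount"] if e["type"] == "MoneyDeposited" else -e["amount"]
        return balance

    assert balance_as_of("2021-02-15") == 70   # the state as of mid-February
    assert balance_as_of("2021-12-31") == 120  # the current state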

The reason events cannot be discarded is that the events themselves have value. If that is not the case, then Event Sourcing is a poor choice, as it comes with many tradeoffs. The Event Sourcing pattern requires all events to be saved so they can be reused.