is Event sourcing using Database CDC considered good architecture?

Question

When we talk about sourcing events, we have a simple dual write architecture where we can write to database and then write the events to a queue like Kafka. Other downstream systems can read those events and act on/use them accordingly.

But the problem occurs when trying to make both DB and Events in sync as the ordering of these events are required to make sense out of it.

To solve this problem people encourage to use database commit logs as a source of events, and there are tools build around it like Airbnb's Spinal Tap, Redhat's Debezium, Oracle's Golden gate, etc... It solves the problem of consistency, ordering guaranty and all these.

But the problem with using the Database commit log as event source is we are tightly coupling with DB schema. DB schema for a micro-service is exposed, and any breaking changes in DB schema like datatype change or column name change can actually break the downstream systems.

So is using the DB CDC as an event source a good idea?

A talk on this problem and using Debezium for event sourcing

We can rather decouple both the source (say DB) and target systems (downstream) from the actual implementation. Adapters might just come in handy. In particular to Kafka landscape, we might choose to use Avro to handle version compatibility of different message schema. — Divs
Note that in general, questions about architecture are more topical at Software Engineering than here; Stack Overflow is more focused on specific, narrow, tactical issues. — Charles Duffy

David Szalai David Szalai · Accepted Answer · 2019-01-26T22:47:54

Extending Constantin's answer:

TLDR;

Transaction log tailing/mining should be hidden from others.

It is not strictly an event-stream, as you should not access it directly from other services. It is generally used when transitioning a legacy system gradually to a microservices based. The flow could look like this:

Service A commits a transaction to the DB
A framework or service polls the commit log and maps new commits to Kafka as events
Service B is subscribed to a Kafka stream and consumes events from there, not from the DB

Longer story:

Service B doesn't see that your event is originated from the DB nor it accesses the DB directly. The commit data should be projected into an event. If you change the DB, you should only modify your projection rule to map commits in the new schema to the "old" event format, so consumers must not be changed. (I am not familiar with Debezium, or if it can do this projection).

Your events should be idempotent as publishing an event and committing a transaction atomically is a problem in a distributed scenario, and tools will guarantee at-least-once-delivery with exactly-once-processing semantics at best, and the exactly-once part is rarer. This is due to an event origin (the transaction log) is not the same as the stream that will be accessed by other services, i.e. it is distributed. And this is still the producer part, the same problem exists with Kafka->consumer channel, but for a different reason. Also, Kafka will not behave like an event store, so what you achieved is a message queue.

I recommend using a dedicated event-store instead if possible, like Greg Young's: https://eventstore.org/. This solves the problem by integrating an event-store and message-broker into a single solution. By storing an event (in JSON) to a stream, you also "publish" it, as consumers are subscribed to this stream. If you want to further decouple the services, you can write projections that map events from one stream to another stream. Your event consuming should be idempotent with this too, but you get an event store that is partitioned by aggregates and is pretty fast to read.

If you want to store the data in the SQL DB too, then listen to these events and insert/update the tables based on them, just do not use your SQL DB as your event store cuz it will be hard to implement it right (failure-proof).

For the ordering part: reading events from one stream will be ordered. Projections that aggregates multiple event streams can only guarantee ordering between events originating from the same stream. It is usually more than enough. (btw you could reorder the messages based on some field on the consumer side if necessary.)

is Event sourcing using Database CDC considered good architecture?

3 Answers