I think Apache Kafka is the best solution to store events for Event Sourcing. Event Sourcing is closely related to, and usually used together with, a concept/practice named CQRS by Greg Young, which I suggest you research.
The term repository I am using in this answer is a repository in terms of Domain Driven Design, as in Eric Evans' book.
I think I know the reason for your confusion.
> But for event sourcing, you most likely want to read the events for a given entity/aggregate ID.
The quote above from your question is, in my opinion, true. But I think you wanted to express something different, something like this:

> In event sourcing, when a repository is asked to retrieve an object from its data source, it must retrieve all the events that constitute that particular entity on every request, and then replay these events to build up the object.

Is that what you really wanted to express? Because that sentence is, in my opinion, false.
You do not need to rebuild an object every time you retrieve it.
In other words, you don't need to replay all the events that constitute the object every time you retrieve it from the repository. You can apply events to the object as they happen and store the current version of the object somewhere else, e.g. in a cache, or even better, both in a cache and in Kafka.
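Here is a minimal sketch of that replay idea in Java; the `Truck` and `LoadEvent` classes are names I invented for illustration, not anything from a real library:

```java
import java.util.List;

// A minimal sketch of rebuilding an aggregate by replaying events.
class LoadEvent {
    final String truckId;
    final String cargo;      // null means the truck was unloaded
    LoadEvent(String truckId, String cargo) {
        this.truckId = truckId;
        this.cargo = cargo;
    }
}

class Truck {
    private final String id;
    private String cargo;    // current state, built up from events

    Truck(String id) { this.id = id; }

    // Apply a single event to mutate the current state.
    void apply(LoadEvent event) {
        this.cargo = event.cargo;
    }

    // Rebuild the aggregate from scratch by replaying all its events.
    // Usage: Truck t = Truck.replay("truck-1", events);
    static Truck replay(String id, List<LoadEvent> events) {
        Truck truck = new Truck(id);
        events.forEach(truck::apply);
        return truck;
    }
}
```

The point of the cache is exactly to avoid calling something like `replay` on every read.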
So let's walk through an example. Let's say we have a truck/lorry that is loaded and unloaded.
The main stream of events will be operations - this will be the 1st Kafka topic in our app. This will be our source of truth, as Jay Kreps often puts it in his writing.
These are the events:
- truck 1 is loaded with pigs
- truck 2 is loaded with pigs
- truck 2 is unloaded (pigs removed)
- truck 2 is loaded with sand
- truck 1 is unloaded (pigs removed)
- truck 1 is loaded with flowers
The final result is that truck 1 is filled with flowers and truck 2 is filled with sand.
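As a minimal sketch, producing those operation events to the operations topic could look like this; the broker address `localhost:9092`, the `truck-N` key format, and plain string values are all assumptions I'm making to keep the example short:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OperationsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key every message by truck ID; the value is the operation.
            producer.send(new ProducerRecord<>("operations", "truck-1", "loaded with pigs"));
            producer.send(new ProducerRecord<>("operations", "truck-2", "loaded with pigs"));
            producer.send(new ProducerRecord<>("operations", "truck-2", "unloaded"));
            producer.send(new ProducerRecord<>("operations", "truck-2", "loaded with sand"));
            producer.send(new ProducerRecord<>("operations", "truck-1", "unloaded"));
            producer.send(new ProducerRecord<>("operations", "truck-1", "loaded with flowers"));
        }
    }
}
```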
What you do is read the events from this topic and populate a 2nd topic: truckUpdated. The events you stream into the truckUpdated topic are the following:
- truck 1: pigs
- truck 2: pigs
- truck 2: nothing
- truck 2: sand
- truck 1: nothing
- truck 1: flowers
At the same time, with every message consumed, you update the current version of the truck in a cache, e.g. memcached. So the cache will be the direct source the repository uses to retrieve a truck object.
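Here is a rough sketch of such a projection loop, assuming the topic names from this example; the in-memory map stands in for memcached, and the string parsing of event values is simplified for illustration:

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TruckProjection {
    // Stand-in for memcached: in production you would write to a real cache.
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "truck-projection");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("operations"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Derive the current cargo from the operation (simplified).
                    String cargo = record.value().startsWith("loaded with ")
                            ? record.value().substring("loaded with ".length())
                            : "nothing";
                    // 1) Publish the current state, keyed by truck ID, to truckUpdated.
                    producer.send(new ProducerRecord<>("truckUpdated", record.key(), cargo));
                    // 2) Update the cache that the repository reads from.
                    cache.put(record.key(), cargo);
                }
            }
        }
    }
}
```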
What is more, you make the truckUpdated topic a compacted topic.
Read about compacted topics in the official Apache Kafka docs. There is a lot of interesting material about them on the Confluent blog and the LinkedIn Engineering blog (from before the Confluent company was started).
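For example, a compacted truckUpdated topic could be created with Kafka's AdminClient like this (the broker address and single partition/replica are assumptions for the sketch):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact tells Kafka to keep only the latest
            // message per key instead of deleting by retention time.
            NewTopic topic = new NewTopic("truckUpdated", 1, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```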
So because truckUpdated is compacted, Kafka will, after some time, make it look like this:
- truck 2: sand
- truck 1: flowers
Kafka will do this if you use the truck ID as the key for all messages - read in the docs what message "keys" are. So you end up with one message per truck. If you find a bug in your app, you can replay the operations topic to repopulate your cache and the truckUpdated topic. If your cache goes down, you can use the truckUpdated topic to repopulate it.
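A sketch of such a replay: a consumer with a fresh group ID and `auto.offset.reset=earliest` re-reads the whole operations topic from the beginning, so you can feed every historical event back through the same projection logic that populates the cache and the truckUpdated topic (broker address and topic name as assumed above):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayOperations {
    public static void main(String[] args) {
        Properties props = new Properties();
        // A fresh group ID means no stored offsets, so we start from scratch.
        props.put("group.id", "replay-" + System.currentTimeMillis());
        props.put("bootstrap.servers", "localhost:9092");
        props.put("auto.offset.reset", "earliest");  // start from the first event
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("operations"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // Feed every historical event back through the projection logic;
                // printing here stands in for the real cache/topic updates.
                records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
            }
        }
    }
}
```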
What do you think? Votes and comments are highly welcome.
**Update:**
(1) After some thought, I have changed my mind about the quote from your question being true. I now find it false. So I do **not** think that for event sourcing you most likely want to read the events for a given entity/aggregate ID.
When you find a bug in your code, you want to replay all the events for all objects. It does not matter whether there are 2 entities, as in my simple example, or 10M entities.
Event sourcing is not about retrieving all the events for a particular entity. Event sourcing means that you have an audit log of all events and are able to replay them to rebuild your entities. You don't need to be able to rebuild one particular entity on its own.
(2) I strongly advise getting familiar with some posts from the Confluent and LinkedIn engineering blogs. These were very interesting for me:
https://www.confluent.io/blog/making-sense-of-stream-processing/
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
The official Kafka docs are a must-read too.