Using Cassandra as an event store

Question

I want to experiment with using Cassandra as an event store in an event sourcing application. My requirements for an event store are quite simple. The event 'schema' would be something like this:

id: the id of an aggregate root entity
data: the serialized event data (e.g. JSON)
timestamp: when the event occurred
sequence_number: the unique version of the event

I am completely new to Cassandra, so forgive my ignorance in what follows. There are only two queries I would ever want to run on this data:

Give me all events for a given aggregate root id
Give me all events for a given aggregate root id where the sequence number is > x

My idea is to create a Cassandra table in CQL like this:

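CREATE TABLE events (
  id uuid,             -- aggregate root id (partition key)
  seq_num int,         -- per-aggregate sequence number (clustering column)
  data text,           -- serialized event payload, e.g. JSON
  timestamp timestamp, -- when the event occurred
  PRIMARY KEY (id, seq_num)
);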

Does this seem like a sensible way to model the problem? And, importantly, does using a compound primary key allow me to efficiently perform the queries I specified? Remember that, given the use case, there could be a large number of events (with a different seq_num) for the same aggregate root id.

My specific concern is that the second query is going to be inefficient in some way (I'm thinking about secondary indexes here...)

Now that it's a year later, I'm curious to know how your event sourcing project using Cassandra went. — Tim Jarvis
It seems logical that you would also want all events in chronological order to rebuild query models. For that, Cassandra would seem rather hard to handle. — Andrea Ratto
In the end I went with using Akka Persistence and the Cassandra journal plugin, thus delegating the schema decisions to the plugin rather than designing my own schema. Akka Persistence works incredibly well as a means to implement DDD using the actor model. By following a single aggregate root per persistent actor approach (single across a whole cluster), it ensures events are written chronologically. I recommend looking up Akka Cluster Sharding for details on ensuring a unique actor per aggregate root across an entire cluster. — DrewEaster
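To illustrate Andrea Ratto's point: with id as the partition key, events are ordered only within each aggregate's own partition, so a global chronological replay means reading every aggregate's partition and merging the streams by timestamp on the client. The sketch below does that merge with Python's heapq.merge. It assumes each stream is a list of hypothetical (timestamp, id, seq_num) tuples already sorted by timestamp, with timestamps simplified to integers; it is an illustration, not how any particular journal plugin works.

```python
import heapq


def replay_chronologically(streams):
    """Merge per-aggregate event streams into one timestamp-ordered list.

    Each stream is assumed to be sorted by timestamp already, which holds
    only if timestamps increase with seq_num inside an aggregate (an
    assumption; the table itself does not enforce it).
    """
    # heapq.merge lazily interleaves sorted inputs; key picks the timestamp
    return list(heapq.merge(*streams, key=lambda event: event[0]))
```

For example, merging [(1, "A", 1), (4, "A", 2)] with [(2, "B", 1), (3, "B", 2)] yields the four events in timestamp order 1, 2, 3, 4, interleaving the two aggregates.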

emgsilva · Accepted Answer · 2013-10-11T15:57:12

Your design seems well modeled in "Cassandra terms". The queries you need are indeed supported by "composite key" tables; you would have something like:

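query 1: SELECT * FROM events WHERE id = ?;
query 2: SELECT * FROM events WHERE id = ? AND seq_num > ?;
(Here ? is a bind marker in a prepared statement: the first is the aggregate root's uuid, the second is the sequence number x. A uuid literal is written unquoted in CQL, not as a quoted string.)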
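To see why query 2 needs no secondary index, here is a toy in-memory model of the table's layout in plain Python (it does not use Cassandra or its driver). Each id maps to one partition whose rows stay sorted by seq_num, so "seq_num > x" becomes a binary search followed by a contiguous slice. The class and method names are hypothetical, and ids are plain strings rather than uuids for brevity.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class ToyEventTable:
    """Hypothetical in-memory model of PRIMARY KEY (id, seq_num).

    id is the partition key: all rows of one aggregate are stored together.
    seq_num is the clustering column: rows in a partition stay sorted by it.
    """

    def __init__(self):
        self._keys = defaultdict(list)  # id -> sorted list of seq_nums
        self._data = defaultdict(dict)  # id -> {seq_num: data}

    def insert(self, agg_id, seq_num, data):
        # Cassandra writes are upserts: the same (id, seq_num) overwrites data
        if seq_num not in self._data[agg_id]:
            insort(self._keys[agg_id], seq_num)
        self._data[agg_id][seq_num] = data

    def query_all(self, agg_id):
        # Query 1: read a single partition, already in seq_num order
        keys = self._keys.get(agg_id, [])
        return [(s, self._data[agg_id][s]) for s in keys]

    def query_after(self, agg_id, x):
        # Query 2: binary-search for the first seq_num > x, then return the
        # contiguous tail of that partition (no secondary index involved)
        keys = self._keys.get(agg_id, [])
        start = bisect_right(keys, x)
        return [(s, self._data[agg_id][s]) for s in keys[start:]]
```

For example, after inserting seq_num 3, 1 and 2 for one id, query_all returns them in the order 1, 2, 3, and query_after(id, 1) returns only 2 and 3. Rows belonging to other ids are never touched by either query.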

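I do not think the second query will be inefficient. Because seq_num is a clustering column, the rows of a partition are stored sorted by it, so the range condition on seq_num is served as a contiguous slice of a single partition rather than through a secondary index. It may, however, return a lot of rows; if so, you can cap the number returned with the LIMIT keyword.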
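If a partition grows large, a common pattern is to read it in pages: repeat query 2 with a LIMIT, each time using the last seq_num seen as the new lower bound. The sketch below simulates that loop in plain Python over a sorted list of sequence numbers; fetch_page is a hypothetical stand-in for executing SELECT seq_num FROM events WHERE id = ? AND seq_num > ? LIMIT ? against Cassandra.

```python
from bisect import bisect_right


def fetch_page(sorted_seqs, after, limit):
    """Hypothetical stand-in for:
    SELECT seq_num FROM events WHERE id = ? AND seq_num > ? LIMIT ?
    sorted_seqs plays the role of one partition, ordered by seq_num.
    """
    start = bisect_right(sorted_seqs, after)
    return sorted_seqs[start:start + limit]


def read_all_after(sorted_seqs, x, page_size):
    """Yield every seq_num > x, fetching at most page_size rows per query."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    after = x
    while True:
        page = fetch_page(sorted_seqs, after, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the partition is exhausted
            return
        # Resume strictly after the last row already returned
        after = page[-1]
```

For example, list(read_all_after([1, 2, 3, 4, 5, 6, 7], 2, 2)) issues three simulated queries and returns [3, 4, 5, 6, 7]. The DataStax drivers can also page automatically through a configurable fetch size, which may make a manual loop like this unnecessary.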

Composite keys seem like a good match for your specific requirements. Secondary indexes do not seem to bring much to the table here, unless I am missing something in your design or requirements.

HTH.
