16
votes

I am going through the research stage for a new project and I am currently having a debate with a work colleague about the architecture of the project.

We have agreed that we will create a distributed messaging system using CQRS and Event sourcing with azure. It will be an SPA with the front end using angular js and the backend will be Web API.

We have now discussed how the database will be set up and this is where the differences has occurred.

We have agreed that we will split the database into two databases, one for the read and write. My work colleague wants to use sql server for both the read and write databases as he has spent his whole career in sql and doesn't want to hear anything else but sql. I on the other hand have been researching about NoSql and I feel it would be suitable for the read database as it is better for performance.

As CQRS is all about eventual consistency I have read that NoSql databases are based on this as well which is now making me think about using NoSql for the write database as well.

We also planned on creating an events table for each aggregate root instead of one generic events table that contains all the events. As these tables are not relational it made me think why we should use sql server.

My questions are more of a what is best practice or general approach to how people create their event stores.

  1. Do you use one table for all events or create an events table per aggregate root?
  2. What would be the advantages/disadvantages of using a NoSql database for both the write and read databases for a CQRS application?
  3. Last but not least, how would you convince a stubborn work colleague who is married to Sql to convert to a NoSql approach?
3

3 Answers

16
votes

Note: My answer below is from 2014. Times have changed, and several NoSQL databases support multi-document transactions now. I'm still advocating PostgreSQL for storing events in particular because of its awesome JSON support.


The last question is primarily opinion based, so let answer the first two questions from my perspective.

Before doing so, let me say that were are using an SQL server (Postgres) for our event store. With CQRS+ES it is trivial to have one primary master for the writes to the database, have the repository read from the slaves (not to be confused with the CQRS reads). Scalability should not be a problem in this setup (if Reddit can scale with Postgres in master/slave, so can you).

I have done an extensive evaluation of the various options for NoSQL stores. At the end of the day, for an CQRS+ES setup I could not find any advantages over the SQL approach in master/slave mode that, IMO, would outweigh the disadvantages you'll get with a NoSQL databases.

In particular, it is trivial to implement optimistic locking in an SQL store (unique constraint on streamId and version fields), but it is quite hard (impossible?) to achieve reliable optimistic locking in simple NoSQL key-value stores.

With some document stores it is possible (I think there is a reason why NEventStore supports only RavenDB and MongoDB.):

  1. RavenDB supports transaction with ACID, but you'll have to use/deploy .NET, which is not an option for some (including us).
  2. MongoDB has server-side atomic operations that'll you need for appending events to the event stream AND bumping the version number in an atomic operation, but there are some technical limitations (particularly the maximum document size) that let me keep away from MongoDB as an event store. Also, I have read this article on InfoQ regarding partition tolerance, and Postgres appeared to be much more reliable than MongoDB.

In Postgres, I'm using one single event table for all events.

As for the second question, I think you have to distinguish between the event store and the read model. We are using an SQL server for the event store (for the reasons mentioned above), but for the CQRS read model I think NoSQL "databases" are a great fit, as the eventual consistency approach fits very well with the NoSQL paradigm. Also, one usually queries the read side by key only anyways, so you can use any key-value store that you like and fits your needs. We are not even using a database at all, but an in-memory grid that is re-built on demand.

6
votes

Personally, I prefer one aggregate per table, but this depends case by case. Even when using one table for all aggregates, partition tech could be used to alleviate potential performance issue.

I think NoSQL is feasible for both read and write side, but SQL database does provide an practical feature: Transactionality. This is especially useful when you need to commit several events in one transaction. For NoSQL database, you may have to change your schema to achieve this. For example, using per transaction per document when adopting Mongo. Or you could use Cassandra and make every row as an aggregate (one event per column, Cassandra provides partly transactionality for row level).

But this is not a big advantage if your program publishes only one event per transaction or Consistency is not a strong requirement.

2
votes

Just because you're not using FK (relationships), that makes SQL an 'overkill' for Event Sourcing. FKs are not mandatory in SQL.

NoSQL is a cool concept but in my personal experience I have found working with SQL way more easier in all aspects.

Do not overthink it too much. Use SQL. Leave your co-worker alone :) (kidding) and get the job done. At the end of the day most of our "headaches" is our inability to block the temptations of new technologies/experiences out there.

Does it make sense?