1
votes

I was considering using cockroachdb for writing data in 3rd Normal Form with ACID guarantees. So all writes will get diverted to cockroachdb.

The reads could all be point lookup based on rowkey to Cassandra. I believe such a read setup will eliminate need for redis caching since Cassandra will give fast reads by itself. So Cassandra tables will be denormalized based on access paths.

There could be event based sync from insert/update/delete inside cockroachdb normalized schema to insert/update/delete to cassandra denormalzied schema.

Question 1 :

Does this read / write separation fit into a valid usecase for using cockroachdb ? The intention is to reduce joins and have fast reads as well as writes. Cockroachdb becomes single source of truth ingesting also Event sourcing kind data. And other databases like cassandra and elasticsearch become query projections that are kept in sync eventually.

Question 2 :

Does this setup fit with financial transactions where N statements need to be done atomically ? From my understanding, lets assume there are N SQL statements that are done transactionally inside cockroachdb 3NF schema. After this, the reads happen from Cassandra/ElasticSearch which will be not yet in sync due to sync latency. In this eventual consistency scenario, if user sends another command to achieve same result from other machine in parallel, this will go to command handler which will lookup in cockroachdb. I think since CockroachDb is ACID compliant, we will be assuringly invalidating command during command validation step after lookup to cockroachdb. I believe in this cockroachdb will throw optimistic lock exception since one transaction writing to same table is already in progress. So the question is - in such scenarios, should we read as well from CockroachDB instead of Cassandra / ElasticSearch ?

Question 3

Last usecase I had in mind was to have cockroachdb serve the role what a spark cluster will do with cassandra with respect to aggregations. We can do aggregation inside cockroachdb which has all data and store in pre-aggregated tables in cassandra. Though ElasticSearch is also capable of doing aggregation, here is question - does this usecase also sounds correct w.r.t using cockroachdb instead of elasticsearch for aggregation ?

1

1 Answers

2
votes

As a general guideline, I would recommend designing the system from scratch instead of starting with such a complex architecture. If you start with CockroachDB as your "single source of truth", how far can you go with CockroachDB alone? Do you have performance requirements that can only be met with a caching layer? Do you need a separate system for aggregation/reporting? If the answer is "yes", then you can start thinking about what form those components should take.

There could be event based sync from insert/update/delete inside cockroachdb normalized schema to insert/update/delete to cassandra denormalzied schema.

Note that CockroachDB does not yet have a good way to stream updates into an external system, so this won't be easy to do.

For your specific questions:

  1. A read cache can be a valuable addition to a system, but it also adds a lot of complexity so don't introduce one until you know you need it. You can denormalize things in a SQL database too, and features like CockroachDB's interleaved tables can reduce the need for denormalization.

  2. You only have transactional guarantees for reads that go to CockroachDB in a transaction. The exact behavior here is going to depend on how your transaction is written. For example, two "add a comment" transactions could both apply without conflicting with each other, depending on your schema. You may need to guard against this by giving things appropriate unique IDs or doing a SELECT at the beginning of your transaction to ensure that the state of the database is as you expect. (Also, don't make too many assumptions about "optimistic lock exceptions". CockroachDB's concurrency is a mix of optimistic and pessimistic models)

  3. Again, it depends. ElasticSearch can do a lot of things that CockroachDB can't, and CockroachDB doesn't (yet) do much pre-aggregation. But SQL is a very flexible language for aggregation and reporting, so you may be able to do what you need in CockroachDB.