2
votes

Cassandra's materialized views (MVs) are not production ready:

  1. Cassandra Materialized views impact
  2. Limitations: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/knownLimitationsMV.html
  3. https://techblog.fexcofts.com/2018/05/08/cassandra-materialized-views-ready-for-production/:

It turns out there have been issues with MVs. The biggest issue being the MV not keeping in sync with the base table. This seems to occur when creating a MV with a key that is not a key of the base table. Cassandra does not offer any mechanism for checking the integrity between the base table and any MVs. So unless you do this manually, you will be oblivious to any discrepancies. If you do find any discrepancies, the only way to fix them is to drop and recreate the MV.
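The manual check mentioned in that quote could look roughly like the sketch below. It assumes the rows of the base table and of the MV have already been read into Python dicts (for example with a driver, not shown here). The function `find_mv_discrepancies`, the table contents, and the column names are all made up for illustration; this is not a Cassandra or ScyllaDB API.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

Row = Dict[str, Optional[Hashable]]


def find_mv_discrepancies(
    base_rows: Iterable[Row],
    view_rows: Iterable[Row],
    view_key: List[str],
    view_columns: List[str],
) -> Tuple[Set[tuple], Set[tuple]]:
    """Return (missing_from_view, stale_in_view) as sets of row tuples."""
    # Rows the view is expected to hold: every base row whose view key
    # columns are all non-null, projected onto the view's columns.
    expected = {
        tuple(row[c] for c in view_columns)
        for row in base_rows
        if all(row[c] is not None for c in view_key)
    }
    # Rows the view actually holds, projected the same way.
    actual = {tuple(row[c] for c in view_columns) for row in view_rows}
    return expected - actual, actual - expected


# Hypothetical base table keyed by id, and a view keyed by (email, id).
base = [
    {"id": 1, "email": "a@x", "name": "Ann"},
    {"id": 2, "email": "b@x", "name": "Bob"},
    {"id": 3, "email": None, "name": "Cy"},  # null view key: not in the view
]
view = [
    {"email": "a@x", "id": 1, "name": "Ann"},
    {"email": "old@x", "id": 2, "name": "Bob"},  # stale entry
]

missing, stale = find_mv_discrepancies(
    base, view, view_key=["email", "id"], view_columns=["email", "id", "name"]
)
print("missing from view:", missing)
print("stale in view:", stale)
```

A full scan like this is only practical for small tables, and it only detects discrepancies; it does not repair them.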

Cassandra has had MVs since 2015, already 5.5 years: https://www.datastax.com/blog/new-cassandra-30-materialized-views.

Over to ScyllaDB, a database whose first version was released in 2016: https://www.scylladb.com/2016/03/31/release-1-0/. ScyllaDB promotes MVs as production ready.

Why isn't Cassandra able to create production-ready MVs like ScyllaDB can? I don't see any limitations for MVs on ScyllaDB on their website. MVs are super useful, and I don't understand why Cassandra never succeeded in delivering production-ready MVs. This issue has already been open for over 5 years: https://issues.apache.org/jira/browse/CASSANDRA-10346.

How did ScyllaDB solve the inconsistent MV problem? Why can't Cassandra solve the MV problems, or why hasn't it done so yet?


2 Answers

4
votes

The Scylla implementation of MVs resembles the Cassandra one but isn't identical. Today, even with Scylla, if the view and the base table go out of sync, there is no 100% safe way to fix it other than a complete view rebuild. However, we have fixed and improved the MV implementation a lot, which decreases the chances of this happening. Here's a view of the current closed vs. open MV bugs: https://github.com/scylladb/scylla/issues?q=is%3Aissue+is%3Aopen+materialize+view+
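In CQL terms, a complete view rebuild amounts to dropping the view and creating it again, after which the database repopulates it from the base table. As a rough sketch, the helper below only builds the two CQL statements as strings and does not talk to a cluster. The helper name `rebuild_statements` and the keyspace, table, and column names are hypothetical.

```python
from typing import List


def rebuild_statements(
    keyspace: str,
    base: str,
    view: str,
    view_key: List[str],
    columns: List[str],
) -> List[str]:
    """Build the DROP and CREATE statements for recreating a materialized view."""
    # Every primary key column of a view must be restricted with IS NOT NULL.
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in view_key)
    select = ", ".join(columns)
    # The first key column is the partition key, the rest are clustering columns.
    primary_key = ", ".join(view_key)
    return [
        f"DROP MATERIALIZED VIEW IF EXISTS {keyspace}.{view};",
        f"CREATE MATERIALIZED VIEW {keyspace}.{view} AS "
        f"SELECT {select} FROM {keyspace}.{base} "
        f"WHERE {not_null} PRIMARY KEY ({primary_key});",
    ]


# Hypothetical example: a users table with a view keyed by (email, id).
statements = rebuild_statements(
    "shop", "users", "users_by_email", ["email", "id"], ["email", "id", "name"]
)
for statement in statements:
    print(statement)
```

While the view is being rebuilt, queries against it can return incomplete results, so the rebuild needs to be planned like any other maintenance operation.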

The development pace at Scylla is higher, and so is the number of active committers (surprising as that is). We are currently working on Raft in order to make regular operations consistent and thus keep the view and the base table completely in sync for good.

2
votes

Full disclosure - I work on the Scylla project.

I don't know who could answer definitively "why hasn't Cassandra done x". This is Open Source Software, so I think the best answer is that nobody in the Community cared enough to do anything about it. And if you care enough, you are welcome to fix it. Maybe there's a better answer, but that's what I've got for you.

As for the "how Scylla did it", there is a detailed technical talk on the implementation of indexing and materialized views at https://www.youtube.com/watch?v=dyWZRjtPI2s. The talk is a bit old - there are recent updates to both functionality and performance - but the underlying infrastructure is all there.

And I can confirm that secondary indexes (2i) and MVs in Scylla are very widely used at scale in production.