Performance issues with Cassandra MaterializedViews

Question

We have created a Cassandra cluster with 9 nodes, each with 4 cores and 16 GB of RAM. We are writing 15-25 million records with 28 columns each.

The data model we designed is as follows (I have renamed the columns and shortened the actual schema for brevity).

CREATE TABLE main_table(
col1 ... col28,
PRIMARY KEY((col1,col2),col_date,col_with_some_seq_number))
WITH CLUSTERING ORDER BY (col_date DESC,col_with_some_seq_number desc) AND  default_time_to_live = 5270400;

CREATE MATERIALIZED VIEW mv_for_main_table AS
SELECT [col1.. col11]
FROM main_table
WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col_date IS NOT NULL AND col_with_some_seq_number IS NOT NULL
PRIMARY KEY ((col1),col2, col_date, col_with_some_seq_number)
WITH CLUSTERING ORDER BY (col_date DESC, col_with_some_seq_number DESC, col2 DESC);

It just moves one of the partition key columns (col2) into the clustering key of the materialized view.

We are loading the data from Spark and have not modified any Cassandra-related configuration.

After ingesting around 150 million records, the ingestion started failing, and every node reported a lot of mutation failures.

Are there any performance issues with materialized views, or is the definition I have used inefficient?

We tried a few configuration changes, such as reducing the concurrent writes and the throughput MB. After all of these attempts, we dropped the materialized view, and everything started working well.

We have done enough testing to conclude that the writes slow down by a huge margin, and mutations get dropped, only once the materialized view is included.

We are planning to use separate tables instead of materialized views for the above configuration, but I want to know whether there is any mistake in the materialized view or the data model we have used.
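
For illustration, the separate-table approach could look roughly like the sketch below. The table name main_table_by_col1, the column types, the single payload column col3, and the sample values are all assumptions made for the example; the real schema is not shown in this post. The loader would write every record to both tables itself instead of relying on the view.

```cql
-- Hypothetical query table replacing mv_for_main_table.
-- Column types are assumed; only one of the extra columns is shown.
CREATE TABLE IF NOT EXISTS main_table_by_col1 (
    col1 text,
    col2 text,
    col_date timestamp,
    col_with_some_seq_number bigint,
    col3 text,
    PRIMARY KEY ((col1), col2, col_date, col_with_some_seq_number)
) WITH CLUSTERING ORDER BY (col2 DESC, col_date DESC, col_with_some_seq_number DESC)
  AND default_time_to_live = 5270400;

-- Each record is written to both tables by the client.
-- A logged batch keeps the two writes together, at the price of extra
-- coordinator work; two independent INSERTs are cheaper but can diverge
-- if one of them fails.
BEGIN BATCH
    INSERT INTO main_table (col1, col2, col_date, col_with_some_seq_number, col3)
    VALUES ('a', 'b', '2016-12-01', 1, 'payload');
    INSERT INTO main_table_by_col1 (col1, col2, col_date, col_with_some_seq_number, col3)
    VALUES ('a', 'b', '2016-12-01', 1, 'payload');
APPLY BATCH;
```

With this layout the base table no longer carries a view, so the extra work a materialized view adds to every write is avoided, but keeping the two tables consistent becomes the application's job.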

doanduyhai · Accepted Answer · 2016-12-08T14:53:48

One place to understand materialized views (MV) in depth: http://www.doanduyhai.com/blog/?p=1930

When a base table has MVs, there is a lock on the affected partition of the base table. This local lock has a cost (see my blog post).

I also have another remark about your hardware sizing: 4 CPUs is below the official recommendation, which is 8 CPUs: http://cassandra.apache.org/doc/latest/operating/hardware.html

Write workloads in Cassandra are CPU-bound. In your case the CPU is also used by Spark, which may explain your bottleneck.

Please post a screen capture of dstat and htop here.
