2
votes

I have a six-node Cassandra cluster using NetworkTopologyStrategy, and here is my topology:

Rack1
Cassandra-01
Cassandra-02

Rack2
Cassandra-03
Cassandra-04

Rack3
Cassandra-05
Cassandra-06

We use CL=QUORUM and a replication factor of 3 for reads and writes, so technically we should tolerate a single rack failure (the loss of 2 nodes from one rack).
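That expectation can be sketched as follows (a minimal illustration in JavaScript, assuming one replica per rack; the variable names are mine, not Cassandra's):

    // With RF=3 and one replica per rack, QUORUM needs 2 reachable replicas,
    // so losing one whole rack still leaves enough for reads and writes.
    const quorum = Math.floor(3 / 2) + 1;             // 2 replicas required
    const replicaRacks = ["Rack1", "Rack2", "Rack3"]; // one replica per rack
    const failedRack = "Rack3";
    const reachable = replicaRacks.filter(r => r !== failedRack).length;
    console.log(reachable >= quorum); // true: requests should still succeed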

For example, suppose I write to the Cassandra cluster (CL=QUORUM, RF=3) and Rack3 goes offline (hardware failure), leaving 4 nodes in total. Theoretically I should still be able to write and read data, because the consistency level is satisfied. But when I use the [Cassandra calculator] it says:

You can survive the loss of 1 node without impacting the application.

and

You can survive the loss of 1 node without data loss.

But why only 1 node?


2 Answers

3
votes

The calculator has no knowledge built into it about the rack aspect of the above configuration - so let's leave that for the moment. You have entered 6 nodes, RF 3 and Write / Read at Quorum.

If no racks were involved (i.e. all nodes were in the same rack), then the answers would make sense.

Since writes are made at Quorum, only 2 of the nodes are guaranteed to have the data at the point the write is acknowledged as successful. If those 2 nodes then failed immediately after the write, you could suffer data loss, because the 3rd replica had not yet received the data. Thus, in the worst case, you can only tolerate the loss of 1 node without potential data loss.

You are correct that with NetworkTopologyStrategy, 3 racks, 2 nodes per rack, and Quorum, you could lose an entire rack and still operate. So why does the calculation change?

Well, some of the calculation doesn't change - you can still write at Quorum and read at Quorum, but there is still the possibility that a node being read does not yet have the data; it should, however, read-repair and fix itself (assuming read repair is enabled on the table, etc.).

You shouldn't lose data, though: the rack placement gives you the further guarantee that the 2 nodes in the failed rack did not both hold replicas of the same partition. So while 2 nodes are down, you have not eliminated 2 copies of any partition - at least one node in another rack has the data (otherwise the quorum write would not have been acknowledged).
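That rack-aware guarantee can be illustrated with a much-simplified placement sketch (this is not Cassandra's actual NetworkTopologyStrategy implementation - just the core idea of preferring distinct racks when choosing replicas):

    // Simplified rack-aware replica placement: walk the ring and prefer
    // nodes from racks that do not hold a replica yet.
    function placeReplicas(ring, rf) {
      const replicas = [];
      const racksUsed = new Set();
      // First pass: at most one replica per distinct rack.
      for (const node of ring) {
        if (replicas.length === rf) break;
        if (!racksUsed.has(node.rack)) {
          replicas.push(node.name);
          racksUsed.add(node.rack);
        }
      }
      // Second pass: fill remaining slots if there are fewer racks than rf.
      for (const node of ring) {
        if (replicas.length === rf) break;
        if (!replicas.includes(node.name)) replicas.push(node.name);
      }
      return replicas;
    }

    const ring = [
      { name: "Cassandra-01", rack: "Rack1" },
      { name: "Cassandra-03", rack: "Rack2" },
      { name: "Cassandra-04", rack: "Rack2" },
      { name: "Cassandra-05", rack: "Rack3" },
    ];
    console.log(placeReplicas(ring, 3));
    // ["Cassandra-01", "Cassandra-03", "Cassandra-05"] - one replica per rack

Because each partition ends up with one replica in each of the three racks, losing a whole rack removes at most one replica of any partition.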

If you follow the GitHub link on the page itself, you can see the calculation for each of the values it provides in the HTML, for example:

 var dataloss = w - 1;
 $('#dataloss').text( dataloss > 1 ? dataloss+" nodes" : dataloss === 1 ? "1 node" : "no nodes");

w in this instance is the 'write' consistency level; when set to Quorum, it calculates w as 2. There is no input for racks, nor any consideration of them in the code.
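Filling in the inputs, the calculator's arithmetic can be reproduced like this (only `w` and `dataloss` appear in the quoted source; the rest is my reconstruction):

    // Reproducing the calculator's data-loss figure for RF=3, CL=QUORUM.
    const rf = 3;
    const w = Math.floor(rf / 2) + 1; // quorum of 3 replicas = 2
    const dataloss = w - 1;           // matches "1 node" in the output
    console.log(dataloss); // 1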

0
votes

The answer is in your question. Check max(write level, read level) - it is 2 in your case.

RF - max(CL), i.e. 3 - 2 = 1 node, can be compromised at any point in time.