Cassandra node fails within LOCAL_QUORUM margin, but client still gets failed CL errors

Question

I have a multi-DC (AWS regions) Cassandra cluster. A client program connects to one of the regions that has 4 nodes and RF=2. However, when only one node is down in that DC/region, the client keeps getting this error:

(com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_QUORUM (2 required but only 1 alive))

Here are more details:

The client program was initially JMeter, but I verified with cqlsh and got the same kind of errors.
The error (see above) happens roughly 50% of the time, for both reads and writes.
Because there are 4 nodes and RF=2, I believe LOCAL_QUORUM=2, meaning the local ring can tolerate up to 2 nodes failing.
But only one node is down, which I verified using "nodetool status".
Other consistency levels worked fine (e.g. TWO, THREE, QUORUM).
We use vnodes for the cluster.

I am having a hard time understanding what is happening: a local ring should have a complete copy of the data, and RF=2 should give me a sufficient cushion against one node being down. What went wrong?

Marko Švaljek · Accepted Answer · 2017-01-26T22:55:42

4 nodes and RF=2

This means every piece of data is stored on 2 of the 4 nodes.

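With RF=2, a quorum is actually 2 replicas (floor(RF/2) + 1 = 2), so both replicas of a given row must be up. If one node is down, every request whose replica set includes that node will fail at LOCAL_QUORUM, which works out to about 50% of requests, as you described.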
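To make the arithmetic concrete, here is a minimal Python sketch. It is not how Cassandra is implemented; it assumes a simplified ring with one equal-sized token range per node and replicas placed on the owning node plus the next nodes clockwise. The function names `quorum` and `unavailable_fraction` are made up for this illustration. With vnodes the ranges are smaller and more numerous, but the proportion should come out roughly the same.

```python
def quorum(rf: int) -> int:
    """Quorum size for a given replication factor: floor(RF / 2) + 1."""
    return rf // 2 + 1


def unavailable_fraction(nodes: int, rf: int, down: set) -> float:
    """Fraction of token ranges that cannot satisfy a quorum of replicas.

    Simplified model (an assumption, not Cassandra's real placement code):
    one range per node, replicas on the owner and the next rf - 1 nodes.
    """
    needed = quorum(rf)
    failed = 0
    for owner in range(nodes):
        replicas = [(owner + i) % nodes for i in range(rf)]
        alive = sum(1 for r in replicas if r not in down)
        if alive < needed:
            failed += 1
    return failed / nodes


print(quorum(2))                        # 2: both replicas are required
print(unavailable_fraction(4, 2, {0}))  # 0.5: half the ranges fail
print(unavailable_fraction(4, 3, {0}))  # 0.0: RF=3 survives one node down
```

In this model, raising the replication factor to 3 keeps the quorum at 2, so one node can be down without any range losing LOCAL_QUORUM.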

RF is the number of copies of the same data in the cluster, not the number of nodes you can lose.

The problem is the quorum. If you use consistency level ONE, you are fine and can tolerate one node being down.

Also have a look at this page:

https://www.ecyrd.com/cassandracalculator/

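Also, with non-local consistency levels (TWO, THREE, QUORUM and so on), the request is not limited to the local data centre: replicas in the other data centres count toward the required number, which is why those levels kept working.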
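The difference between local and non-local levels can be sketched in a few lines of Python. The question does not say how many data centres there are or what RF the others use, so the scenario below assumes two data centres with RF=2 each; `required_acks` and `is_available` are hypothetical helpers written for this illustration, not driver or server APIs.

```python
def quorum(rf: int) -> int:
    """Quorum size: floor(RF / 2) + 1."""
    return rf // 2 + 1


def required_acks(cl: str, rf_local: int, rf_total: int) -> int:
    """Number of replica responses each consistency level needs."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(rf_total),        # quorum over all DCs combined
        "LOCAL_QUORUM": quorum(rf_local),  # quorum within the local DC only
    }
    return table[cl]


def is_available(cl: str, alive_local: int, alive_remote: int,
                 rf_local: int, rf_total: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    needed = required_acks(cl, rf_local, rf_total)
    if cl.startswith("LOCAL_"):
        return alive_local >= needed  # only local replicas count
    return alive_local + alive_remote >= needed


# Assumed scenario: 2 DCs with RF=2 each, one local replica of the row down.
for cl in ("ONE", "TWO", "THREE", "QUORUM", "LOCAL_QUORUM"):
    print(cl, is_available(cl, alive_local=1, alive_remote=2,
                           rf_local=2, rf_total=4))
```

Under these assumptions only LOCAL_QUORUM comes out unavailable, which matches the behaviour described in the question.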
