I have a multi-DC (AWS regions) Cassandra cluster. A client program connects to one of the regions that has 4 nodes and RF=2. However, when only one node is down in that DC/region, the client keeps getting this error:
(com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_QUORUM (2 required but only 1 alive))
Here are more details:
- The client program was initially JMeter, but I verified with cqlsh and got the same kind of error
- The error (see above) happens roughly 50% of the time and affects both reads and writes
- Because there are 4 nodes and RF=2, I believe LOCAL_QUORUM=2, meaning the local ring can tolerate up to 2 nodes failing
- But only one node is down, which I verified with "nodetool status"
- Other consistency levels worked fine (e.g. TWO, THREE, QUORUM)
- We use vnodes for the cluster
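
To make the quorum arithmetic I am relying on explicit, here is a minimal sketch, assuming the usual Cassandra rule that QUORUM and LOCAL_QUORUM require floor(RF / 2) + 1 replicas. The function names are my own, for illustration only, and are not part of any driver API. Note that this counts replicas of a single partition rather than nodes in the ring.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas of one partition that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    # Compare the RF used in this cluster (2) with RF=3 (hypothetical alternative).
    for rf in (2, 3):
        print(
            f"RF={rf}: quorum={quorum(rf)}, "
            f"tolerable replica failures={tolerable_replica_failures(rf)}"
        )
```

With RF=2 this gives a quorum of 2, which matches the "2 required" in the error message above.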
I am having a hard time understanding what is happening: a local ring should have a complete copy of the data, and RF=2 should give me sufficient cushion against one node being down. What went wrong?