Cassandra: How does the cluster handle dead nodes?

Question

i'm pretty new to cassandra and i'm not quite sure if i understood everything correctly, so i hope somebody can help me.

this is my system:

cassandra 0.8
cluster with 3 nodes
a keyspace with a replication factor of 3
replication strategy: NetworkTopologyStrategy (all nodes in the same DC)
rails metal app that connects to the cluster using the twitter cassandra gem [1]
- read consistency: ONE
- write consistency: ANY

when one node goes down, i'm quite sure of the following:

i should be able to read records from the keyspace if i use a read consistency level of ONE.
i should be able to write to the keyspace with the write consistency level of ANY

this is what i don't understand:

the actions above succeed, but only if i manually remove the token of the dead node.
shouldn't my cluster work as expected with a dead node? isn't this what cassandra is all about: high availability?

i dug around in the code of the gem and it looks like the cassandra cluster tells the gem that it can find a record on the dead node (which is actually down), so the gem fails with an exception saying that it can't connect to the dead node.

so i'm not sure if i have misunderstood something entirely, if my cassandra setup is wrong, or if the cassandra ruby gem is the problem (which i don't think it is).

thx, simon

[1] https://github.com/twitter/cassandra

nickmbailey · Accepted Answer · 2012-02-14T04:43:10

Yes, your cluster should work as you described with a dead node.

I'm not familiar with Ruby or the Ruby client, but it sounds more likely to me that your client is trying to send requests to the dead node, which would cause a 'can't connect' type of exception. Cassandra would throw an UnavailableException if there weren't enough nodes up to meet the consistency requirements of a given query.
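
To make the distinction concrete, here is a rough sketch of the availability check implied by the consistency levels. It is a simplified model, not Cassandra's actual code; the function names `required_replicas` and `is_available` are made up for illustration.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must be up for a request at this level."""
    if level == "ANY":
        # Writes only: a hint stored on the coordinator is enough,
        # so no replica for the key has to be alive.
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def is_available(level: str, replication_factor: int, live_replicas: int) -> bool:
    """False models the case where the cluster would answer with UnavailableException."""
    return live_replicas >= required_replicas(level, replication_factor)
```

With a replication factor of 3 and one node down, `is_available("ONE", 3, 2)` and `is_available("ANY", 3, 2)` are both true, so under this model the cluster itself would not reject your reads or writes. That points at the client connection rather than at Cassandra.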

What type of pooling does the Ruby client do, and are you sure it isn't attempting to send queries to a node that is down? Assuming the Ruby client has some sort of connection pooling, it would likely have to see at least one failed query before it realizes a node is down.
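
As an illustration of the pooling behaviour I mean, here is a minimal sketch in Python. It is not how the twitter cassandra gem works (I haven't read it); `FailoverPool`, `retry_after`, and the `request` callable are hypothetical names. The pool only learns that a host is down after one request to it fails, then skips that host for a while and moves on to the next one.

```python
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class FailoverPool:
    """Tries each host in turn and skips hosts that failed recently."""

    def __init__(self, hosts: List[str], retry_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hosts = list(hosts)
        self.retry_after = retry_after  # seconds to avoid a failed host
        self.clock = clock
        self.down_until: Dict[str, float] = {}

    def live_hosts(self) -> List[str]:
        """Hosts not currently marked as down."""
        now = self.clock()
        return [h for h in self.hosts
                if self.down_until.get(h, float("-inf")) <= now]

    def execute(self, request: Callable[[str], T]) -> T:
        """Run request(host) on the first host that accepts the connection."""
        last_error: Optional[Exception] = None
        for host in self.live_hosts():
            try:
                return request(host)
            except ConnectionError as exc:
                # The first failure is how the pool finds out the host is down.
                self.down_until[host] = self.clock() + self.retry_after
                last_error = exc
        raise ConnectionError("no live hosts available") from last_error
```

If the client instead raises the first connection error straight to the caller, without retrying on another host, you would see exactly the 'can't connect' exception you describe even though the cluster is healthy enough to serve the request.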
