Cassandra clustering failover - High Availability

Question

I have configured a Cassandra cluster with 3 nodes:

Node1(192.168.0.2) , Node2(192.168.0.3), Node3(192.168.0.4)

Created a keyspace 'test' with replication factor as 2.

CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 2};

When I stop either Node2 or Node3 (one at a time, or both at once), I am able to do CRUD operations on the keyspace.table.

When I stop Node1 and try to update/create a row from Node4 or Node3, I get the following error, although Node3 and Node4 are up and running:

All host(s) tried for query failed (tried: /192.168.0.4:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections))) com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.0.4:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))

I am not sure how Cassandra elects a new leader if the leader node dies.

There is no concept of a leader in Cassandra. Check whether you can telnet to the host (192.168.0.4) on port 9042. — undefined_variable
Could you provide more information about the consistency level used on the queries (this has a huge impact on the behavior you are expecting)? Are you using a driver or accessing the cluster with cqlsh? — Arthur Landim
@undefined_variable Yes, I am able to telnet from my local desktop to all the nodes on port 9042. — UAnand
@ArthurLandim I am using DBeaver Enterprise and connecting to the nodes over Cassandra CQL to execute my queries. — UAnand
@ArthurLandim The queries are listed below: CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 2}; CREATE TABLE test.emp( emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint, emp_phone varint ); INSERT INTO test.emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES(11,'JOhn', 'Fort Worth', 434333333, 150000); — UAnand

Arthur Landim · Accepted Answer · 2017-02-16T17:19:42

You are using replication_factor 2, so only 2 of the 3 nodes hold a replica of any given row in your keyspace.

My first advice is to change the RF to 3.
Pay attention to the consistency level you are using. If you have only 2 copies of your data (RF 2) and you use consistency level QUORUM, each request must be acknowledged by a majority of the replicas, floor(RF / 2) + 1, which for RF 2 means both replicas. So if 1 replica node is down, you will not be able to read or write that data.
To verify where the data is replicated, look at the ring of your cluster. Since you are using SimpleStrategy, replicas are placed on consecutive nodes going clockwise around the ring. In your case the data is stored on the nodes at 192.168.0.2 and 192.168.0.3.
Take a look at the concepts of replication factor: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html
And Consistency Level: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Great answer about RF vs CL: https://stackoverflow.com/a/24590299/6826860
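
As a minimal sketch of the first suggestion, the replication factor of the existing keyspace can be raised with ALTER KEYSPACE. This assumes the keyspace is still named test and still uses SimpleStrategy, as in the question; the repair step in the comment is the usual follow-up after increasing RF, not something the question mentions having done.

```cql
-- Raise the replication factor of the existing keyspace from 2 to 3,
-- so that every node in the 3-node cluster holds a replica of each row.
ALTER KEYSPACE test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Existing rows are not copied to the new replica automatically.
-- After the change, run "nodetool repair" on each node (from a shell,
-- not from CQL) so the data is streamed to the additional replica.
```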

You can use this calculator to find out whether your setup has decent consistency. In your case the result is: "You can survive the loss of no nodes without impacting the application."
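
That result can be checked by hand. The worked calculation below assumes QUORUM is used for both reads and writes; the consistency level DBeaver actually uses is not stated in the question, so treat this as an illustration rather than a diagnosis.

```latex
\[
\text{quorum}(RF) = \left\lfloor \frac{RF}{2} \right\rfloor + 1,
\qquad
\text{tolerated replica failures} = RF - \text{quorum}(RF)
\]
\[
RF = 2:\quad \text{quorum} = \left\lfloor \tfrac{2}{2} \right\rfloor + 1 = 2,
\qquad 2 - 2 = 0 \ \text{replicas may be down}
\]
\[
RF = 3:\quad \text{quorum} = \left\lfloor \tfrac{3}{2} \right\rfloor + 1 = 2,
\qquad 3 - 2 = 1 \ \text{replica may be down}
\]
```

With RF 3 the quorum is still 2, but there is now one spare replica, which is why raising the RF lets QUORUM requests succeed while a single node is stopped.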
