I'm having issues with a Cassandra cluster that spans several datacenters, with three nodes per datacenter, two of which act as seeds.
I have a keyspace X with a replication factor of 3 in each datacenter, i.e. 3 replicas in DC1 and 3 replicas in DC2:
    CREATE KEYSPACE X WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
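(For what it's worth, the settings can also be read back from the schema tables to rule out a typo; the query below assumes Cassandra 3.x, where the schema lives in system_schema. On 2.x it would be system.schema_keyspaces instead.)
    SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'x';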
Now, what I do (and perhaps I'm missing something here) is I cqlsh into every node in datacenter DC2 (let's say node2A, node2B and node2C) and do the following:
- cqlsh node2N
- CONSISTENCY ALL
- SELECT * FROM x.table;
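In full, each session looks roughly like this (cqlsh's confirmation line quoted from memory):
    $ cqlsh node2A
    cqlsh> CONSISTENCY ALL;
    Consistency level set to ALL.
    cqlsh> SELECT * FROM x.table;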
By setting the consistency to ALL, I know the coordinator has to get a response from every replica: the 3 in DC1 plus the 3 in DC2, 6 responses in total. But instead of that, I'm getting a different result on each of the three nodes:
- node2A: The query fails with:
      Cannot achieve consistency level ALL info: {'required_replicas': 6, 'alive_replicas': 5, 'consistency': ALL}
- node2B: The query succeeds and returns the table data
- node2C: The query takes 1-2 minutes and then returns:
      Coordinator node timed out waiting for replica nodes' responses. Operation timed out - received only 5 responses. info: {'received_responses': 5, 'required_responses': 6, 'consistency': ALL}
The reason I'm running these queries in cqlsh is that one of our applications behaves erratically when querying Cassandra (reporting errors such as "not enough replicas for QUORUM"), and I suspect a problem with the communication between the nodes: either gossip is telling different nodes different things, or something along those lines. Basic connectivity works from each node to every other node (we can cqlsh, ssh, and so on).
Could my theory be correct, i.e. do we have some sort of inconsistency in the configuration? If so, how could I debug these failures? Is there a way to find out which node is considered dead or not responding, so that I can look more closely at its communications? I tried TRACING ON, but it only produces traces for successful queries, so I only get traces on node2B. (By the way, the behaviour is not always the same on a given node; it seems to be random.)
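One thing I realise I haven't actually verified: cqlsh and ssh only prove that ports 9042 and 22 are open, while inter-node gossip and replication go over storage_port, which defaults to 7000 (7001 when internode SSL is enabled). So I intend to check that port separately between every pair of nodes, e.g. with netcat:
    # from node2C, check the internode port on node2A
    nc -zv node2A 7000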
If not, is my cqlsh test even valid? Or am I missing some vital piece of the Cassandra puzzle here?
Many thanks in advance; I'm going mad here...
EDIT: As requested, here's the output of nodetool describecluster. I ran it on all three nodes of DC2:
- node2A:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
- node2B:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [couple of IPs from other datacenters/keyspaces]
- node2C:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [node2A and other IPs]
Worth noting: on node2A, node2C doesn't appear at all; on node2B, all three nodes appear; and on node2C, node2A is listed as UNREACHABLE...
I sense this is very wrong, somehow...
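Given that divergence, I'm planning to compare each node's gossip view directly (standard nodetool subcommands, run locally on each of the three DC2 nodes):
    nodetool statusgossip   # is the gossip service itself running on this node?
    nodetool gossipinfo     # this node's view of every endpoint's generation/heartbeat/STATUS
    nodetool netstats       # pending internode messages and streams
Is diffing the gossipinfo output across the nodes a sensible way to spot which one holds the stale view?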
I have just run "nodetool status keyspaceX" and these are the results:
- node2A:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
?N  node2C   67,11 MB  256     100,0%            -        RAC1
- node2B:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
UN  node2C   67,11 MB  256     100,0%            -        RAC1
- node2C:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
UN  node2C   67,11 MB  256     100,0%            -        RAC1
Now, how come node2A doesn't know the state of node2C (it shows as ?, and node2C didn't appear in node2A's schema versions in describecluster)? And conversely, why does node2C, which reported node2A as UNREACHABLE in describecluster, still show node2A as Up in status?
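In the meantime I'll also grep the Cassandra system log on each node for gossip state changes, to see when each node last marked its peers up or down (the log path below assumes a package install; adjust for a tarball install):
    # Gossiper logs a line like "InetAddress /x.x.x.x is now DOWN" on each state change
    grep -E 'is now (DOWN|UP)' /var/log/cassandra/system.log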