1 vote

I'm having issues with a Cassandra cluster that spans several datacenters, with 3 nodes per datacenter, 2 of which act as seeds:

I have a keyspace X with replication factor 3 in each datacenter, i.e. 3 copies in DC1 and 3 copies in DC2:

    CREATE KEYSPACE X WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
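For reference, the replication settings the cluster has actually stored can be double-checked from cqlsh (DESCRIBE works everywhere; the system_schema query assumes Cassandra 3.0+):

    cqlsh> DESCRIBE KEYSPACE x;
    cqlsh> -- or query the schema tables directly (keyspace names are stored lowercase unless quoted)
    cqlsh> SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'x';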

Now, what I do (and perhaps I'm missing something here) is I cqlsh into every node in datacenter DC2 (let's say node2A, node2B and node2C) and do the following:

  • cqlsh node2N
  • consistency all
  • select * from x.table;
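Concretely, each session looks like this (the hostname changes per node; x.table stands for any table in keyspace X):

    $ cqlsh node2A
    cqlsh> CONSISTENCY ALL;
    Consistency level set to ALL.
    cqlsh> SELECT * FROM x.table;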

By setting consistency to ALL, I know I have to get a response from every replica, the 3 in DC1 and the 3 in DC2, 6 responses in total. But instead, I'm getting a different result on each node:

  • node2A: The query fails with "Cannot achieve consistency level ALL" info: {'required_replicas': 6, 'alive_replicas': 5, 'consistency': ALL}
  • node2B: The query succeeds and returns the table data
  • node2C: The query takes 1-2 minutes and then returns "Coordinator node timed out waiting for replica nodes' responses. Operation timed out - received only 5 responses." info: {'received_responses': 5, 'required_responses': 6, 'consistency': ALL}

The reason I'm running these queries in cqlsh is that one of our applications behaves erratically when querying Cassandra (reporting things such as not enough replicas for QUORUM, etc.), and I suspect we might have some issue with the communication between the nodes: either gossip is telling different things to different nodes, or something like that. Basic connectivity works from each node to every other node (we can cqlsh, ssh and everything).

Could my theory be correct, i.e. do we have some sort of inconsistency in the configuration? If so, how could I debug those failures? Is there a way of knowing which node is not alive or not responding, so that I can look more closely into its communications? I tried with "tracing on", but it only works for successful queries, so I only get traces on node2B (by the way, the behaviour is not always the same on the same node; it seems to be random).

If not, is my cqlsh test even valid? Or am I missing some vital part of the Cassandra puzzle here?

Much thanks in advance, I'm going mad in here....

EDIT: As requested, here's the output of nodetool describecluster. I ran it on all 3 nodes of DC2:

  • node2A:

    Cluster Information:
        Name: Cassandra Cluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]

  • node2B:

    Cluster Information:
        Name: Cassandra Cluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
            UNREACHABLE: [a couple of IPs from other datacenters/keyspaces]

  • node2C:

    Cluster Information:
        Name: Cassandra Cluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            19ada8a5-4688-3fa8-9479-e612388f67ee: [node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
            UNREACHABLE: [node2A and other IPs]

Worth noting: in node2A's output there's no node2C at all, in node2B's output all 3 nodes appear, and in node2C's output node2A shows up as UNREACHABLE...

I sense this is very wrong, somehow...

I have just run "nodetool status keyspaceX" and these are the results:

  • node2A:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address  Load      Tokens  Owns (effective)  Host ID  Rack
    UN  node2A   67,78 MB  256     100,0%            -        RAC1
    UN  node2B   67,18 MB  256     100,0%            -        RAC1
    ?N  node2C   67,11 MB  256     100,0%            -        RAC1

  • node2B:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address  Load      Tokens  Owns (effective)  Host ID  Rack
    UN  node2A   67,78 MB  256     100,0%            -        RAC1
    UN  node2B   67,18 MB  256     100,0%            -        RAC1
    UN  node2C   67,11 MB  256     100,0%            -        RAC1

  • node2C:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address  Load      Tokens  Owns (effective)  Host ID  Rack
    UN  node2A   67,78 MB  256     100,0%            -        RAC1
    UN  node2B   67,18 MB  256     100,0%            -        RAC1
    UN  node2C   67,11 MB  256     100,0%            -        RAC1

Now, how come node2A doesn't know the state of node2C (it shows as ? and it didn't appear in node2A's schema version list in describecluster)? And why does node2C, which reported node2A as UNREACHABLE in describecluster, show node2A as Up according to status?

Run the command nodetool describecluster on one of the nodes and see if every node's schema is the same. Add the output to the post, too. – Soheil Pourbafrani
I posted the output as you requested. The schema version UUID seems to be the same, but there are intriguing differences, as you can see :/ – palvji

2 Answers

2 votes

It was related to an internal issue in Cassandra. The gossip process was being shut down because of a corrupt hint file, while the rest of the Cassandra process stayed up and running. So that node saw everybody else as up, but the other nodes reported it as down because its Gossiper was down (the actual port, 9160, was closed after the exception).

Exception screenshot

The actual cassandra issue is https://issues.apache.org/jira/browse/CASSANDRA-12728
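In case it helps anyone hitting the same thing, a rough way to spot this state on the affected node is to check whether gossip is still active and to look for the hints exception in the log (the log path assumes a default package install):

    # Gossip state as seen by the node itself ("Gossip active : false" once it has been shut down)
    nodetool info | grep -i gossip

    # Look for the corrupted-hints exception that triggered the shutdown
    grep -i -A 10 "hint" /var/log/cassandra/system.log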

Hope it is useful

1 vote

First, to check whether every node is reachable, you can run nodetool describecluster and analyze the output.

Communication between nodes happens through gossip and message exchange on port 7000 (the storage_port), not through ssh or cqlsh.
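So a quick way to verify internode connectivity is to test that port directly from each node (hostnames are just the ones from the question; nc is one option among others):

    # from node2C, check that node2A's internode (storage) port is open
    nc -vz node2A 7000

    # the port is configured per node in cassandra.yaml:
    # storage_port: 7000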

About the 3 cases above:

  • node2A: When you ran the query, it is possible that some node was not reachable at that moment, so consistency could not be achieved because you used ALL.

  • node2B: This time all nodes were alive, the consistency level was achieved, and you got the data.

  • node2C: In this case the coordinator node did not get data from all nodes within the timeout and threw a timeout exception. That timeout can be set in cassandra.yaml (see the snippet below).
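For reference, these are the relevant settings in cassandra.yaml together with their usual defaults (names from the 2.x/3.x line; Cassandra 4.0 renamed them, e.g. read_request_timeout):

    # how long the coordinator waits for replica responses before throwing a timeout
    read_request_timeout_in_ms: 5000       # single-partition reads
    range_request_timeout_in_ms: 10000     # range scans, e.g. SELECT * without a partition key restriction
    request_timeout_in_ms: 10000           # default for other operations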

Hope this answers your question.