Cassandra consistency issue?

Question

We have cassandra cluster in three different datacenters (DC1, DC2 and DC3) and we have 10 machines in each datacenter. We have few tables in cassandra in which we have less than 100 records.

What we are seeing - some tables are out of sync between machines in DC3 as compared to DC1 or DC2 when we do select count(*) on it.

As an example we did select count(*) while connecting to one cassandra machine in dc3 datacenter as compared to one cassandra machine in dc1 datacenter and the results were different.

root@machineA:/home/david/apache-cassandra/bin# python cqlsh dc3114.dc3.host.com
Connected to TestCluster at dc3114.dc3.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;

count
-------
    12

cqlsh:testingkeyspace> exit
root@machineA:/home/david/apache-cassandra/bin# python cqlsh dc18b0c.dc1.host.com
Connected to TestCluster at dc18b0c.dc1.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;

count
-------
    16

What could be the reason for this sync issue? Is it suppose to happen ever? Can anyone shed some light on this?

Since our java driver code and datastax c++ driver code are using these tables with CONSISTENCY LEVEL ONE.

What stragegy? However data written with CL ONE will be sent to replicas in other data centers so your situation should not happen. Check if DC3 can properly communicate with DC1|DC2 -- Try to perform some EACH_QUORUM writes to see if they fails. — Carlo Bertuccini

ashic ashic · Accepted Answer · 2014-08-20T18:36:39

What's your replication strategy? For cross datacenter, you should be looking at NetowrokTopologyStrategy with replications factors specified for each data center. Then during your queries, you can specify quorum / local quorum, etc. However, think about this for a minute:

You have a distributed cluster with multiple datacenters. If you want an each_quorum, think what your asking cassandra to do - for reads or writes, you ask it to quorum persist to both data centers separately before returning a success. Think about the latencies, and network connections going down. For a read, the client's requested node becomes the coordinator. It sends the write to the datacenter local replicas and to one node for the remote data centers. The recipient there coordinates to its local quorum. Once done, it returns results, and when the initial coordinator receives enough responses, it returns. All is well. Slow, but well. Now for writes, kind of a similar thing happens, but if a coordinator doesn't know that a node is down, it still sends to nodes. The write completes when the node comes back up, but the client can get a write timeout (note, not a failure - the write will eventually succeed). This can happen more often between multiple data centers.
Your looking to do count(*) queries. This is in general a terrible idea. It needs to hit every partition for a table. Cassandra likes queries that hit a single partition, or at least a small number of partitions (via IN filter).
Think about what select count(*) does in a distributed system. What does the result even mean? The result can be stale an instant later. There may be another insert in some other data center while you're processing the result of the query.

If you're looking to do aggregations over lots or all partitions, consider pairing cassandra with spark, rather than trying to do select(*) across data centers. And to go back to the earlier point, don't assume (or depend on) cross data center immediate consistency. Embrace eventual consistency, and design your applications around that.

Hope that helps.

Cassandra consistency issue?

2 Answers