The following case may be logically correct for Cassandra, but it is confusing for users. Suppose:

Cassandra consistency level: write ALL, read ONE; replication_factor: 3

For a single record (rowkey: 001, column: status):

  1. Client 1 inserts a value for rowkey 001: status: True, timestamp 11:00:05
  2. Client 2 runs a slice query and gets the value True for rowkey 001, at 11:00:00 (Client 2's local time)
  3. Client 2 updates the value for rowkey 001: status: False, timestamp 11:00:02

So the update sequence is True, then False. Although the update requests come from different client nodes, the sequence is logically ordered.

But the final result is rowkey: 001, column: status, value: True.
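A minimal Python sketch of this scenario, assuming a simplified last-write-wins model in which each replica keeps whichever cell carries the highest timestamp. The `Cell` and `Replica` classes are hypothetical stand-ins, not Cassandra code. Because Client 1's clock runs ahead, its earlier write carries the later timestamp, so every replica keeps True even though all three received the writes in the order True -> False.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: bool
    timestamp: time  # supplied by the writing client's local clock


class Replica:
    """Hypothetical replica using last-write-wins reconciliation."""

    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def write(self, cell: Cell) -> None:
        # Keep the cell with the higher timestamp, regardless of arrival order.
        # (This sketch keeps the existing cell on a tie.)
        if self.cell is None or cell.timestamp > self.cell.timestamp:
            self.cell = cell

    def read(self) -> Optional[Cell]:
        return self.cell


replicas = [Replica() for _ in range(3)]  # replication_factor: 3

# Step 1: Client 1 (clock ahead) writes status=True to ALL replicas.
for r in replicas:
    r.write(Cell(True, time(11, 0, 5)))

# Step 2: Client 2 reads at ONE and sees True.
seen = replicas[0].read().value

# Step 3: Client 2 writes status=False using its own, earlier, clock reading.
for r in replicas:
    r.write(Cell(False, time(11, 0, 2)))

final = [r.read().value for r in replicas]
print(seen, final)  # True [True, True, True]
```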

So why does Cassandra depend so heavily on the client's local time? Why not use the server's local time instead?

Since I am using write consistency level ALL with replication_factor 3, all three nodes receive the updates in the correct order (True -> False), so they could produce the correct final result.

If for some reason Cassandra must depend strongly on each operation's timestamp, then queries should also carry a timestamp, so that Client 2 would not see the value True, which was written in its "future".

So either using server timestamps, or also requiring a timestamp for queries (which means the query in step 2 would not see the value, because the data is in the "future"), would be more consistent.
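The second option proposed above (timestamped reads) can be sketched as follows, assuming a single stored version per column. `read_as_of` is a hypothetical read API that illustrates the proposal; it is not a Cassandra call. A read stamped 11:00:00 hides a cell stamped 11:00:05, so Client 2 would not observe True in step 2.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: bool
    timestamp: time


def read_as_of(cell: Optional[Cell], read_ts: time) -> Optional[Cell]:
    """Hypothetical timestamped read: hide any cell whose timestamp is
    later than the reader's own clock (i.e. in the reader's "future")."""
    if cell is None or cell.timestamp > read_ts:
        return None
    return cell


stored = Cell(True, time(11, 0, 5))          # written by Client 1 (clock ahead)
result = read_as_of(stored, time(11, 0, 0))  # Client 2 reads at 11:00:00
print(result)  # None
```

With only one stored version, hiding future data means the read returns nothing; a real multi-version design would instead return the newest version at or before the read timestamp.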

Otherwise, Cassandra's consistency is very weak, even when R + W > N.
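A quick check of the overlap condition for the configuration in the question (write ALL, read ONE, replication_factor 3), where R and W are the number of replicas that must answer a read and acknowledge a write, and N is the replication factor:

```latex
R + W = 1 + 3 = 4 > N = 3
```

Since W equals N, every replica holds each completed write, so a read at ONE always reaches a replica that has both versions; which version it returns is decided by the timestamps.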

From the above, I think the reason you can get the value True after step 2 is that Cassandra applies inserts based on their order rather than their timestamps. Otherwise you shouldn't have seen the True value, which means that Cassandra doesn't depend so much on client local time. – WeiHao

2 Answers

The short answer is that CQL actually does default to server-provided timestamps.

As a longer answer, I wrote a post about the role of timestamps in conflict resolution at http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks.

CQL uses server-side timestamps, but the legacy Thrift interface uses client timestamps.
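A minimal sketch of why server-assigned timestamps resolve the question's scenario, assuming all writes for the column pass through one coordinator whose clock only moves forward (a counter stands in for that clock). `Coordinator` is a hypothetical class, not Cassandra code. The client clocks are never consulted, so the write that arrives later wins.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: bool
    timestamp: int


class Coordinator:
    """Hypothetical coordinator that stamps each write on arrival."""

    def __init__(self) -> None:
        self._clock = count(1)  # stand-in for a well-behaved server clock
        self.cell: Optional[Cell] = None

    def write(self, value: bool) -> None:
        ts = next(self._clock)  # server-side timestamp; client clock ignored
        # Last-write-wins on the server-assigned timestamp.
        if self.cell is None or ts > self.cell.timestamp:
            self.cell = Cell(value, ts)


coord = Coordinator()
coord.write(True)   # step 1: Client 1
coord.write(False)  # step 3: Client 2
print(coord.cell.value)  # False
```

This only holds under the stated assumption; with several coordinators whose clocks disagree, the same kind of reordering can reappear.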

Note that what you describe is not a consistency problem, since all responses will be consistent with each other after the write. It is, however, a violation of causality. Even with server-side timestamps, you can run into problems with simultaneous writes to the same columns.
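One way simultaneous writes can go wrong under last-write-wins, sketched with server-assigned timestamps (again modeled by a counter): two clients each read a column and write back an incremented value, and one increment is silently lost. `Column` is a hypothetical class for illustration only.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int


class Column:
    """Hypothetical single column with last-write-wins and
    server-assigned timestamps (a counter models the server clock)."""

    def __init__(self, value: int) -> None:
        self._clock = count(1)
        self.cell = Cell(value, next(self._clock))

    def read(self) -> int:
        return self.cell.value

    def write(self, value: int) -> None:
        ts = next(self._clock)
        # Server timestamps only increase here, so every new write wins.
        if ts > self.cell.timestamp:
            self.cell = Cell(value, ts)


col = Column(0)
# Two clients each try to add 1 at the same time: both read before either writes.
a = col.read()
b = col.read()
col.write(a + 1)
col.write(b + 1)
print(col.read())  # 1, not 2: one increment was overwritten
```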

A discussion of some of the issues is here: http://aphyr.com/posts/294-call-me-maybe-cassandra