Cassandra - Clustering key and Row overwritten

Question

I have a problem with the design I chose for my Cassandra table. It is up and running in production, but I recently observed the issue below.

(The table and column names here are for the sake of discussion.)

create table items (listid int, 
  itemid int, 
  datatime timestamp, 
  dist int,
  primary key ((listid, itemid), datatime));

Let's say I get items from a sensor device in the following sequence (listid, itemid, datatime, dist):

row#1 (1, 101, 1583213040000, 50)
row#2 (1, 101, 1583213046000, 55)
row#3 (1, 101, 1583213046000, 40)
row#4 (1, 101, 1583213050000, 70)

When I insert the above data into my "items" table, I can see only 3 records, as below:

row#1 (1, 101, 1583213040000, 50)
row#3 (1, 101, 1583213046000, 40)
row#4 (1, 101, 1583213050000, 70)

I am aware that the second row is replaced by the third row because the partition key and clustering key values are the same for these two rows.

Is there a way to retain both row#2 and row#3? One possible way is to include "dist" as a clustering key along with "datatime". But this will not help when two rows from the sensor arrive with the same timestamp and dist value.

My question is: can anyone suggest a solution here without changing the data model design?

VHristov · Accepted Answer · 2020-05-24T15:27:50

As you correctly said, you could include the value in the clustering key, but the best way to avoid overwriting entries is to make sure that the clustering key is unique. One way of achieving this is to use a time-based UUID (the timeuuid type) instead of a timestamp. This way, you can still extract the timestamp from the UUID when reading, and your rows will still be sorted by time. Alternatively, you can add another clustering column holding a small random string to avoid collisions, which you can ignore when reading.
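
The two suggestions above could look roughly like the sketch below. The table names items_by_uuid and items_with_tiebreak and the column name tiebreak are hypothetical, chosen only for illustration, and a uuid stands in for the "small random string". Note that now() takes its time from the coordinator node, not from the sensor; to keep the sensor's own timestamp, the timeuuid would have to be generated on the client side, which is not shown here.

```cql
-- Option 1: time-based UUID as the clustering key (hypothetical table name)
create table items_by_uuid (listid int,
  itemid int,
  datatime timeuuid,
  dist int,
  primary key ((listid, itemid), datatime));

-- now() returns a new unique timeuuid on every call, so these two rows do not collide
insert into items_by_uuid (listid, itemid, datatime, dist) values (1, 101, now(), 55);
insert into items_by_uuid (listid, itemid, datatime, dist) values (1, 101, now(), 40);

-- toTimestamp() recovers the timestamp embedded in the timeuuid
select listid, itemid, toTimestamp(datatime) as ts, dist
  from items_by_uuid where listid = 1 and itemid = 101;

-- Option 2: keep the timestamp and add a tiebreaker clustering column (hypothetical names)
create table items_with_tiebreak (listid int,
  itemid int,
  datatime timestamp,
  tiebreak uuid,
  dist int,
  primary key ((listid, itemid), datatime, tiebreak));

-- uuid() generates a random value, so rows with an identical timestamp stay distinct
insert into items_with_tiebreak (listid, itemid, datatime, tiebreak, dist)
  values (1, 101, 1583213046000, uuid(), 55);
insert into items_with_tiebreak (listid, itemid, datatime, tiebreak, dist)
  values (1, 101, 1583213046000, uuid(), 40);
```

With either table, row#2 and row#3 from the question are stored as separate rows. In the second option the tiebreak column can simply be left out of the select list when reading.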
