select older versions of data after update in Cassandra

Question

This is my use-case.

I have inserted a row of data in Cassandra with the following query:

INSERT INTO TableWide1 (UID, TimeStampCol, Value, DateCol) VALUES ('id1','2016-03-24 17:54:36',45,'2015-03-24 00:00:00');

I then update that row with a new value:

update TableWide1 set Value = 46 where uid = 'id1' and datecol='2015-03-24 00:00:00' and timestampcol='2016-03-24 17:54:36';

Now, I would like to see all versions of this data from Cassandra. I know in HBase, this is pretty straightforward, but in Cassandra, is this even possible?

I experimented a bit with writetime(), but it only returns the write time of the latest value, and it cannot be used in a WHERE clause either.

This is what my schema looks like:

CREATE TABLE TableWide1(
  UID varchar,
  TimeStampCol timestamp,
  Value double,
  DateCol timestamp,
  PRIMARY KEY ((UID,DateCol), TimeStampCol)
);

So is this technically possible, given that the old data still exists in Cassandra?

Nope, Cassandra does not keep the history of cells like other Big Table implementations do. — Ralf
You'll have to do it manually if you need it. I do this in one of my tables: each time a new revision of a page is created in my CMS, I save it as a separate entry. That way I can access any version. — Alexis Wilke
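
For illustration, the manual approach from the comment above could look roughly like the following for the table in the question. This is only a sketch: the table name TableWide1_history and the VersionTime column are made up for the example and are not part of the original schema. The idea is that an "update" is written as a new row with its own version marker, so nothing gets overwritten.

```sql
-- Hypothetical history table: VersionTime is an assumed extra clustering
-- column, so each write to the same (UID, DateCol, TimeStampCol) is kept.
CREATE TABLE TableWide1_history (
  UID varchar,
  DateCol timestamp,
  TimeStampCol timestamp,
  VersionTime timeuuid,
  Value double,
  PRIMARY KEY ((UID, DateCol), TimeStampCol, VersionTime)
) WITH CLUSTERING ORDER BY (TimeStampCol ASC, VersionTime DESC);

-- Original write
INSERT INTO TableWide1_history (UID, DateCol, TimeStampCol, VersionTime, Value)
VALUES ('id1', '2015-03-24 00:00:00', '2016-03-24 17:54:36', now(), 45);

-- The "update" is just another INSERT with a fresh VersionTime
INSERT INTO TableWide1_history (UID, DateCol, TimeStampCol, VersionTime, Value)
VALUES ('id1', '2015-03-24 00:00:00', '2016-03-24 17:54:36', now(), 46);

-- All versions of that cell, newest first
SELECT VersionTime, Value FROM TableWide1_history
WHERE UID = 'id1' AND DateCol = '2015-03-24 00:00:00'
  AND TimeStampCol = '2016-03-24 17:54:36';

-- Only the current version
SELECT VersionTime, Value FROM TableWide1_history
WHERE UID = 'id1' AND DateCol = '2015-03-24 00:00:00'
  AND TimeStampCol = '2016-03-24 17:54:36'
LIMIT 1;
```

The trade-off is that every revision adds a row to the partition, so partitions grow faster than with plain overwrites.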

Chris Lohfink · Accepted Answer · 2016-03-24T14:49:28

If your partitions won't get too wide, you could drop the date from the partition key:

CREATE TABLE table_wide (
  UID varchar,
  TimeStampCol timestamp,
  Value double,
  PRIMARY KEY ((UID), TimeStampCol)
);

That's generally a bad idea, though, since eventually you will hit the limits of a partition.

But really, you had it right. You won't be able to do it in a single statement, but under the covers the entire set can't be streamed over in one go anyway; it has to be paged through. So you can just iterate through the results one day at a time. If your dataset has days with no data and you don't want to waste reads, you can keep an additional table around to mark which days have data:

CREATE TABLE table_wide_partition_list (
  UID varchar,
  DateCol timestamp,
  PRIMARY KEY (UID, DateCol)  -- DateCol as a clustering column, so each day gets its own row
);

And make one query to it first.
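
As a rough sketch of that two-step read, assuming the list table keeps one row per day for each UID (that is, DateCol is part of its primary key) and that the data itself lives in the day-partitioned TableWide1 from the question:

```sql
-- On every write, also record that this UID has data for that day.
-- Re-inserting the same (UID, DateCol) pair is harmless; it just overwrites itself.
INSERT INTO table_wide_partition_list (UID, DateCol)
VALUES ('id1', '2015-03-24 00:00:00');

-- Step 1: find out which days have data for this UID.
SELECT DateCol FROM table_wide_partition_list WHERE UID = 'id1';

-- Step 2: for each DateCol returned by step 1, read that day's partition.
-- The client issues one such query per day and lets the driver page the results.
SELECT TimeStampCol, Value FROM TableWide1
WHERE UID = 'id1' AND DateCol = '2015-03-24 00:00:00';
```

The loop over days happens in the client application; CQL has no single statement that spans all of those partitions.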

Really, if you want HBase-like behavior for scans, you are probably looking for something more OLAP-style than normal C* usage. For this, it's almost universally recommended to use Spark with Cassandra currently.
