This is my use-case.

I have inserted a row of data in Cassandra with the following query:

INSERT INTO TableWide1 (UID, TimeStampCol, Value, DateCol) VALUES ('id1','2016-03-24 17:54:36',45,'2015-03-24 00:00:00');

I then update that row to have a new value:

update TableWide1 set Value = 46 where uid = 'id1' and datecol='2015-03-24 00:00:00' and timestampcol='2016-03-24 17:54:36';

Now, I would like to see all versions of this data from Cassandra. I know in HBase, this is pretty straightforward, but in Cassandra, is this even possible?

I explored a bit using writetime(), but it only gives the write time of the latest, newly updated data. It also cannot be used in a WHERE clause.

This is what my schema looks like:
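
For reference, the writetime() lookup I tried looks roughly like this. It returns a single write timestamp (microseconds since the epoch) for the current cell, not a list of earlier ones:

```cql
-- Returns the current Value and the time it was last written.
SELECT Value, writetime(Value)
FROM TableWide1
WHERE UID = 'id1'
  AND DateCol = '2015-03-24 00:00:00'
  AND TimeStampCol = '2016-03-24 17:54:36';
```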

CREATE TABLE TableWide1(
  UID varchar,
  TimeStampCol timestamp,
  Value double,
  DateCol timestamp,
  PRIMARY KEY ((UID,DateCol), TimeStampCol)
);

So is this technically possible, given that the old data still exists in Cassandra?

Nope, Cassandra does not keep the history of cells like other Big Table implementations do. - Ralf
You'll have to do it manually if you need it. I do this in one of my tables. Each time you create a new revision of a page in my CMS, I save it as a separate entry. That way I can access any version. - Alexis Wilke
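
A minimal sketch of that manual approach applied to the schema in the question. The table name TableWide1_History and the Revision column are hypothetical, introduced only for illustration; the idea is that every change becomes a new row instead of an overwrite:

```cql
-- Hypothetical history table: one row per revision of a reading.
CREATE TABLE TableWide1_History (
  UID varchar,
  DateCol timestamp,
  TimeStampCol timestamp,
  Revision timeuuid,   -- assumed extra column that distinguishes versions
  Value double,
  PRIMARY KEY ((UID, DateCol), TimeStampCol, Revision)
) WITH CLUSTERING ORDER BY (TimeStampCol ASC, Revision DESC);

-- Each change is an INSERT with a fresh timeuuid, never an UPDATE of an existing row.
INSERT INTO TableWide1_History (UID, DateCol, TimeStampCol, Revision, Value)
VALUES ('id1', '2015-03-24 00:00:00', '2016-03-24 17:54:36', now(), 45);

INSERT INTO TableWide1_History (UID, DateCol, TimeStampCol, Revision, Value)
VALUES ('id1', '2015-03-24 00:00:00', '2016-03-24 17:54:36', now(), 46);

-- All versions of one reading, newest revision first.
SELECT Revision, Value
FROM TableWide1_History
WHERE UID = 'id1'
  AND DateCol = '2015-03-24 00:00:00'
  AND TimeStampCol = '2016-03-24 17:54:36';
```

Because Revision is part of the primary key, the second INSERT does not replace the first, so both values remain readable.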

2 Answers

-1
votes

If your partitions won't get too wide, you could drop the date from the partition key:

CREATE TABLE table_wide (
  UID varchar,
  TimeStampCol timestamp,
  Value double,
  PRIMARY KEY ((UID), TimeStampCol)
);

That's generally bad, though, since eventually you will hit the limits of a partition.

But really, you had it right. You won't be able to do it in a single statement, but under the covers the entire set can't be streamed back at once anyway; the results have to be paged through. So you can just iterate through the results one day at a time. If your dataset has days with no data and you don't want to waste reads, you can keep an additional table around to mark which days have data:

CREATE TABLE table_wide_partition_list (
  UID varchar,
  DateCol timestamp,
  -- DateCol is a clustering column so that each UID can list many days
  PRIMARY KEY (UID, DateCol)
);

And make one query to it first.
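
A sketch of that two-step read, using the table names from this thread and the literal values from the question. Step 2 is repeated once per day returned by step 1:

```cql
-- Step 1: find which days hold data for this UID.
SELECT DateCol FROM table_wide_partition_list WHERE UID = 'id1';

-- Step 2: for each DateCol returned, read that day's partition from the original table.
SELECT TimeStampCol, Value
FROM TableWide1
WHERE UID = 'id1' AND DateCol = '2015-03-24 00:00:00';
```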

Really, if you want HBase-like behavior for scans, you are probably looking for something more OLAP-style than normal C* usage. For that, the near-universal recommendation at the moment is to use Spark with Cassandra.

-2
votes

Cassandra does not retain old data once it is updated. The older cell is superseded by the one with the newer write timestamp (deletes, by contrast, write tombstones), and the superseded data is discarded when compaction happens.

HBase was not designed for real-time applications or for serving hot data to application servers, though things have improved since the early days of HBase. People mainly use HBase because they already have a Hadoop cluster.

Another noticeable and important difference is that Cassandra is very fast at retrieving one or more records by key, but not at range queries across partition keys (for example, key > 10 and key < 20), because data is distributed by the hash of the partition key. HBase, on the other hand, stores data in sorted order and is an ideal candidate for range queries.

In any case, since Cassandra doesn't retain the old data, you cannot retrieve it.