cassandra data purging for time series data based on timestamp column

Question

I am storing the time series data in cassandra on daily basis. We would like to archive/purge the data older than 2 days on daily basis. We are using Hector API to store the data. Can some one suggest me the approach to delete the cassandra data on daily basis where data is older than 2 days? Using TTL approach for cassandra row is not feasible, as the number of days to delete data is configurable. Right now there is no timestamp column in the table. we are planning to add timestamp column. But the problem is, timestamp alone cannot be used in where clause, as this new column is not part of primary key. Please provide your suggestion.

Is your model adapted/designed to something else? Because this doesn't look like a timeseries data in Cassandra: a timestamp like column should be part of the clustering key. — Cedric H.

Chris Lohfink Chris Lohfink · Accepted Answer · 2016-01-28T16:51:34

TTL is the right answer, there is an internal timestamp attached to every mutation that is used so you don't need to add one. Manually purging almost never a good idea. You may need to work on your data model a bit, check the datastax academy examples for time series

Also thrift has been frozen for two years and is now officially deprecated (removal in 4.0). Hector and other thrift clients are not really maintained anymore (see here). Using CQL and java driver will give better results with more resources available to learn as well.

cassandra data purging for time series data based on timestamp column

3 Answers