When does Cassandra remove data after it has been deleted?

Question

Recently I have been trying to familiarize myself with Cassandra but don't quite understand when data is removed from disk after it has been deleted. The use case I'm particularly interested is expiring time series data with DTCS. As an example, consider the following table:

CREATE TABLE metrics (
  metric_id text,
  time timestamp,
  value double,
  PRIMARY KEY (metric_id, time),
) WITH CLUSTERING ORDER BY (time DESC) AND 
     default_time_to_live = 86400 AND
     gc_grace_seconds = 3600 AND
     compaction = {
      'class': 'DateTieredCompactionStrategy',
      'timestamp_resolution':'MICROSECONDS',
      'base_time_seconds':'3600',
      'max_sstable_age_days':'365',
      'min_threshold':'4'
     };

I understand that Cassandra will create a tombstone for all rows inserted into this table after 24 hours (86400 seconds). These tombstones will first be written to an in-memory Memtable and then flushed to disk as an SSTable when the Memtable reaches a certain size. My question is when will the data that is now expired be removed from disk? Is it the next time the SSTable which contains the data gets compacted? So, with DTCS and min_threshold set to four, we would wait until at least three other SSTables are in the same time window as the expired data, and then those SSTables will be compacted into a SSTable. Is it during this compaction that the data will be removed? It seems to me that this would require Cassandra to maintain some metadata on which rows have been deleted since the newer tombstones would likely not be in the older SSTables that are being compacted.

Alternatively, do the SSTables which contain the tombstones have to be compacted with the SSTables which contain the expired data for the data to be removed? It seems to me that this could result in Cassandra holding the expired data long after it has expired since it's waiting for the new tombstones to be compacted with the older expired data.

Finally, I was also unsure when the tombstones themselves are removed. I know Cassandra does not delete them until after gc_grace_seconds but it can't delete the tombstones until it's sure the expired data has been deleted right? Otherwise it would see the expired data as being valid. Consequently, it seems to me that the question of when tombstones are deleted is intimately tied to the questions above. Thanks!

If it helps I've been experimenting with version 2.0.15 myself.

Will Will · Accepted Answer · 2016-06-17T08:00:24

There's two ways to definitly remove data in Cassandra.

1 : When gc_grace_seconds expires. In your table, gc_grace_seconds is set to 3600. wich means that when you execute a delete statement on a row. You will have to wait 3600 seconds before the data is definitly removed from all the cluster.

2 : When a compaction comes in. During a compaction, Cassandra looks for all the data marked with a tombstone and simply ignores it when writing the new SSTable to ensure that the new SSTable doesn't have already deleted data.

However, it might happen that a node goes down longer than gc_grace_seconds or during a compaction, you'll find more information in the Cassandra documentation.

When does Cassandra remove data after it has been deleted?

2 Answers