Deleting a column in Cassandra for a large dataset

Question

We have a redundant column that we'd like to delete from our Cassandra database (version 2.1.15). This text column accounts for the majority of the data on disk (15 nodes × 1.8 TB per node).

The easiest option seems to be an ALTER TABLE to drop that column and then let Cassandra compaction take care of things (we also run Cassandra Reaper to manage repairs). However, given the size of the dataset, I'm concerned that I will knock over the cluster with a massive delete.

Another option I've considered is a process that runs through the keyspace setting the value to null. I think this would have the same effect as removing the column, but it would be more under our control (although it also requires writing something to do this).

Would anyone have any advice on how to approach this?

Thanks!

Pedro Vidigal · Accepted Answer · 2018-07-31T16:40:31

Dropping a column does mark the deleted values as tombstones. The column value becomes unavailable immediately and the column data is removed in the next compaction cycle.

If you want to expedite the removal of the column data before compaction occurs, run nodetool upgradesstables after you use the ALTER TABLE command to change the metadata for the column.
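As a rough sketch of the two steps above (not a tested runbook), the Python snippet below only builds and prints the commands; it does not execute anything. The keyspace, table, column, and host names are placeholders, and `drop_column_plan` is a hypothetical helper. Two assumptions are mine and worth verifying against `nodetool help upgradesstables` on your version: that the `-a` flag is needed so SSTables already on the current format are rewritten too, and that running the rewrite one node at a time is a sensible way to limit I/O load on a dataset this size.

```python
import shlex


def drop_column_plan(keyspace, table, column, hosts):
    """Return the ordered commands: one schema change, then one SSTable rewrite per node."""
    # Step 1: the schema change. The column stops being queryable right away.
    cql = f"ALTER TABLE {keyspace}.{table} DROP {column};"
    plan = [["cqlsh", "-e", cql]]

    # Step 2: rewrite SSTables so the dropped column's data is purged
    # without waiting for normal compaction.
    # Assumption: -a (include all SSTables) is required for a full rewrite.
    # Assumption: nodes are processed sequentially to spread out the I/O cost.
    for host in hosts:
        plan.append(["nodetool", "-h", host, "upgradesstables", "-a", keyspace, table])
    return plan


if __name__ == "__main__":
    # Placeholder names; substitute your own keyspace, table, column, and nodes.
    commands = drop_column_plan(
        "my_keyspace", "my_table", "redundant_col", ["10.0.0.1", "10.0.0.2"]
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the plan first lets you review each command and run the per-node rewrites at your own pace, for example waiting for one node to finish and checking disk and compaction activity before starting the next. If remote JMX access is disabled on your cluster, drop the `-h` argument and run the nodetool command locally on each node instead.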

See the documentation: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
