1
votes

The Cassandra compaction process reduces the number of SSTables (data files on disk) used to store data. Minor compactions occur automatically. You can tell Cassandra to perform a major compaction using the nodetool compact command.

Does running nodetool compact merely perform one round of compaction, reducing the number of SSTables, but perhaps still resulting in there being several SSTables? Or does it always compact all the SSTables (of a column family) into one SSTable?

1

1 Answers

2
votes

It would depend on the compaction strategy you set for the table.

For DateTieredCompactionStrategy and LeveledCompactionStrategy, by definition I don't think even a major compaction would combine all the SSTables since that would go against the structure of SSTables they aim to create.

For the default SizeTieredCompactionStrategy, anecdotally it appears a major compaction will combine the SSTables into a single table. I ran cassandra-stress -write and watched the SSTables for a while. I could see the minor compactions combining SSTables of similar sizes, but not collapsing dissimilar sizes into one.

Then when I'd run a nodetool compact on the table, it would combine SSTables of dissimilar sizes into a single table. I'm not sure if that would be true in all cases.

Taking a quick look at the source, in CompactionManager.java it calls cfStore.getCompactionStrategy().getMaximalTask(gcBefore), which returns a list of tasks that it executes, so that kind of implies it will compact everything, but I didn't drill down any deeper than that.