Cassandra 2.1 speed up full compaction

Question

I have a Cassandra 2.1 cluster using Leveled Compaction Strategy.

Base on my calculation, the cluster will run out of space before compaction kick in automatically when it reaches next level. For that reason, I have a cron job that runs "nodetool compact" every week to run a full (major) compaction to remove tomb stoned data points.

I noticed that full compaction consumes very little CPU/network resources. With bigger data set, full compaction runs for days.

I have tried to "setcompactionthroughput" to higher number (128MB/s instead of 32MB/s by default, even tried to set it to 0 (no limit), but full compaction speed doesn't seem to change at all.

Is there anything I can tune to make it faster? Thanks in advance.

Jeff Jirsa Jeff Jirsa · Accepted Answer · 2017-10-03T17:16:21

There are very few cases where you should run full compaction via nodetool compact - it causes what you're likely seeing now (a single huge data file, which never naturally compacts with other sstables, even/especially when other deletions have happened).

Recovering from the state your in isn't trivial, but is possible. If you have a lot of cpu/IO to spare, you can try toggling from STCS to LCS, and LeveledCompactionStrategy will naturally split up that huge file into thousands of tiny files, and will be much more aggressive about rewriting those files over time (so tombstones are compacted away much more regularly). This is very much CPU and IO intensive, so don't do it if you're near tipping. Also, it will duplicate all data on disk for a short period, so you'll need to be under 50% disk utilization to do this.

If you're over 50% disk utilization, you've backed yourself into a corner, and you'll probably need to add more disk temporarily in order to recover.

Cassandra 2.1 speed up full compaction

1 Answers