I am a little unclear about the following passage from the DataStax page on tuning Cassandra compaction. Specifically, it states:
"Administrators can also initiate a major compaction through nodetool compact, which merges all SSTables into one. Though major compaction can free disk space used by accumulated SSTables, during runtime it temporarily doubles disk space usage and is I/O and CPU intensive. Also, once you run a major compaction, automatic minor compactions are no longer triggered frequently forcing you to manually run major compactions on a routine basis. So while read performance will be good immediately following a major compaction, it will continually degrade until the next major compaction is manually invoked. For this reason, major compaction is NOT recommended by DataStax." (http://www.datastax.com/docs/1.0/operations/tuning)
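To check that I am picturing the mechanism correctly, here is a toy Python sketch of how I understand size-tiered minor compaction to pick its candidates. This is not Cassandra's code. The minimum threshold of 4 SSTables and the 0.5x to 1.5x "similar size" window are assumptions on my part, and the function names are made up for illustration.

```python
# Toy model of size-tiered bucketing (NOT Cassandra's actual code).
# Assumptions: an SSTable joins a bucket when its size is within
# 0.5x..1.5x of that bucket's average size, and a bucket is only
# compacted once it holds at least MIN_THRESHOLD SSTables.
MIN_THRESHOLD = 4
BUCKET_LOW, BUCKET_HIGH = 0.5, 1.5


def bucket_by_size(sizes_mb):
    """Group SSTable sizes (in MB) into buckets of similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if BUCKET_LOW * avg <= size <= BUCKET_HIGH * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb):
    """Return the buckets large enough to trigger a minor compaction."""
    return [b for b in bucket_by_size(sizes_mb) if len(b) >= MIN_THRESHOLD]


# One huge SSTable left by a major compaction plus three freshly flushed ones.
after_major = [50_000, 10, 10, 10]
print(compaction_candidates(after_major))

# A fourth small flush lets the small SSTables compact among themselves,
# but the huge SSTable still has no similarly sized peers.
print(compaction_candidates(after_major + [10]))
```

If this model is roughly right, the single large SSTable would sit alone in its bucket for a long time, which might be what the documentation means by minor compactions no longer being triggered frequently.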
After reading this, there are two things I am trying to understand better:
- Why would a manually triggered major compaction change how often minor compactions run? I do not follow the underlying reason for this.
- If I do need to run a major compaction manually using nodetool, is it possible to undo its effect afterwards? If so, how can I make sure that minor compactions are not affected and go back to their default behavior?
Thanks.