I created the following table on Cassandra 3.11 for storing metrics using the TimeWindowCompactionStrategy:
CREATE TABLE metrics.my_test (
    metric_name text,
    metric_week text,
    metric_time timestamp,
    tags map<text, text>,
    value double,
    PRIMARY KEY ((metric_name, metric_week), metric_time)
) WITH CLUSTERING ORDER BY (metric_time DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'MINUTES'}
    AND default_time_to_live = 7776000
    AND gc_grace_seconds = 60;
Following the blog post on TLP about TWCS, I expected that a manual compaction would only merge SSTables within the same bucket (1-minute window), and that SSTables from different windows would not be compacted together. That does not seem to be the case: everything gets compacted together. Before compaction:
# for f in *Data.db; do ls -l $f && java -jar /root/sstable-tools-3.11.0-alpha11.jar describe $f | grep timestamp; done
-rw-r--r-- 1 cassandra cassandra 1431 Mar 22 17:29 mc-10-big-Data.db
Minimum timestamp: 1521739701309280 (03/22/2018 17:28:21)
Maximum timestamp: 1521739777814859 (03/22/2018 17:29:37)
-rw-r--r-- 1 cassandra cassandra 619 Mar 22 17:30 mc-11-big-Data.db
Minimum timestamp: 1521739787241285 (03/22/2018 17:29:47)
Maximum timestamp: 1521739810545148 (03/22/2018 17:30:10)
-rw-r--r-- 1 cassandra cassandra 654 Mar 22 17:20 mc-1-big-Data.db
Minimum timestamp: 1521739189529560 (03/22/2018 17:19:49)
Maximum timestamp: 1521739216248636 (03/22/2018 17:20:16)
-rw-r--r-- 1 cassandra cassandra 1154 Mar 22 17:21 mc-2-big-Data.db
Minimum timestamp: 1521739217033715 (03/22/2018 17:20:17)
Maximum timestamp: 1521739277579629 (03/22/2018 17:21:17)
-rw-r--r-- 1 cassandra cassandra 855 Mar 22 17:22 mc-3-big-Data.db
Minimum timestamp: 1521739283859916 (03/22/2018 17:21:23)
Maximum timestamp: 1521739326037634 (03/22/2018 17:22:06)
-rw-r--r-- 1 cassandra cassandra 1047 Mar 22 17:23 mc-4-big-Data.db
Minimum timestamp: 1521739327868930 (03/22/2018 17:22:07)
Maximum timestamp: 1521739387131847 (03/22/2018 17:23:07)
-rw-r--r-- 1 cassandra cassandra 1288 Mar 22 17:24 mc-5-big-Data.db
Minimum timestamp: 1521739391318240 (03/22/2018 17:23:11)
Maximum timestamp: 1521739459713561 (03/22/2018 17:24:19)
-rw-r--r-- 1 cassandra cassandra 767 Mar 22 17:25 mc-6-big-Data.db
Minimum timestamp: 1521739461284097 (03/22/2018 17:24:21)
Maximum timestamp: 1521739505132186 (03/22/2018 17:25:05)
-rw-r--r-- 1 cassandra cassandra 1216 Mar 22 17:26 mc-7-big-Data.db
Minimum timestamp: 1521739507504019 (03/22/2018 17:25:07)
Maximum timestamp: 1521739583459167 (03/22/2018 17:26:23)
-rw-r--r-- 1 cassandra cassandra 749 Mar 22 17:27 mc-8-big-Data.db
Minimum timestamp: 1521739587644109 (03/22/2018 17:26:27)
Maximum timestamp: 1521739625351120 (03/22/2018 17:27:05)
-rw-r--r-- 1 cassandra cassandra 1259 Mar 22 17:28 mc-9-big-Data.db
Minimum timestamp: 1521739627983733 (03/22/2018 17:27:07)
Maximum timestamp: 1521739698691870 (03/22/2018 17:28:18)
After issuing nodetool compact metrics my_test:
# for f in *Data.db; do ls -l $f && java -jar /root/sstable-tools-3.11.0-alpha11.jar describe $f | grep timestamp; done
-rw-r--r-- 1 cassandra cassandra 8677 Mar 22 17:30 mc-12-big-Data.db
Minimum timestamp: 1521739189529561 (03/22/2018 17:19:49)
Maximum timestamp: 1521739810545148 (03/22/2018 17:30:10)
SSTables from multiple time windows were clearly merged together: the only SSTable left after the compaction covers 17:19:49 to 17:30:10.
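To make the expectation concrete, here is a minimal sketch of the windows I assumed the SSTables belonged to before the compaction. It rests on my understanding (an assumption, not something I verified in the Cassandra source) that TWCS assigns each SSTable to a window by rounding its maximum timestamp down to the window size. The names `window_start` and `MAX_TIMESTAMPS_US` are just for illustration; the timestamps are the "Maximum timestamp" values from the listing above, which appear to be in UTC.

```python
from datetime import datetime, timezone

# compaction_window_size = 1, compaction_window_unit = MINUTES
WINDOW_SECONDS = 60

# "Maximum timestamp" values (microseconds since epoch) from the
# pre-compaction listing above.
MAX_TIMESTAMPS_US = {
    "mc-1": 1521739216248636,
    "mc-2": 1521739277579629,
    "mc-3": 1521739326037634,
    "mc-4": 1521739387131847,
    "mc-5": 1521739459713561,
    "mc-6": 1521739505132186,
    "mc-7": 1521739583459167,
    "mc-8": 1521739625351120,
    "mc-9": 1521739698691870,
    "mc-10": 1521739777814859,
    "mc-11": 1521739810545148,
}


def window_start(max_timestamp_us, window_seconds=WINDOW_SECONDS):
    """Round a microsecond timestamp down to the start of its window (epoch seconds).

    Assumption: this mirrors how TWCS buckets an SSTable by its max timestamp.
    """
    seconds = max_timestamp_us // 1_000_000
    return seconds - seconds % window_seconds


# Map each SSTable to the start of the window it would fall into.
windows = {name: window_start(ts) for name, ts in MAX_TIMESTAMPS_US.items()}

for name, start in windows.items():
    label = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%H:%M")
    print(f"{name}: window starting {label} UTC")
```

Under that assumption, each of the 11 SSTables lands in its own 1-minute window, which is why I did not expect any of them to be merged.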
What can I do to prevent this from happening? I have a large-ish table (12 nodes, ~550 GB per node) that uses TWCS but has multiple overlapping SSTables. I'd like to compact away any tombstones and merge those overlapping SSTables; however, I'm worried I'll be left with a single 550 GB SSTable per node. My concern is that a single SSTable that large will make reads slow. Is that a valid concern?