Using Cassandra 3.11.4, we imported several days of time-series-like data into a table created with TimeWindowCompactionStrategy, with compaction_window_unit set to HOURS and compaction_window_size set to 1:
CREATE TABLE MYTABLE (
some_fields text,
(...)
AND compaction = {
'class' : 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'HOURS',
'compaction_window_size': 1
};
Since this is historical data imported from another DB, we set the write timestamp explicitly on each insert:
INSERT INTO MYTABLE (...) USING TIMESTAMP [timestamp of the record] AND TTL ...
where [timestamp of the record] is the timestamp of each time-series record being inserted.
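For reference, USING TIMESTAMP expects microseconds since the Unix epoch. A minimal sketch of the conversion we apply to each record's timestamp before splicing it into the query (the function name is ours, not part of any driver API):

```python
from datetime import datetime, timezone

def write_timestamp_micros(dt: datetime) -> int:
    """Convert a record's UTC datetime to the microseconds-since-epoch
    integer expected by CQL's USING TIMESTAMP clause."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

# Example: a record from 2018-04-07 18:00:00 UTC
ts = write_timestamp_micros(datetime(2018, 4, 7, 18, 0, 0))
# The value is then spliced into:
#   INSERT INTO MYTABLE (...) USING TIMESTAMP <ts> AND TTL <ttl>
```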
Apparently this method worked, as verified by enabling TRACE-level logging on the org.apache.cassandra.db.compaction package:
TRACE [CompactionExecutor:421] ...TimeWindowCompactionStrategy.java:252 - buckets {
1523124000000=[BigTableReader(path='.../md-487-big-Data.db')],
1523070000000=[BigTableReader(path='.../md-477-big-Data.db')],
1523109600000=[BigTableReader(path='.../md-530-big-Data.db')],
1523134800000=[BigTableReader(path='.../md-542-big-Data.db')] },
max timestamp 1523134800000
Here we found several one-hour buckets.
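The bucket keys in the trace are epoch milliseconds. A quick sketch (assuming one-hour windows, per the HOURS/1 settings above) confirming each key sits exactly on an hour boundary:

```python
from datetime import datetime, timezone

# Bucket keys copied from the TRACE output above (epoch milliseconds)
buckets = [1523124000000, 1523070000000, 1523109600000, 1523134800000]

WINDOW_MS = 60 * 60 * 1000  # one-hour windows (HOURS unit, size 1)

for key in buckets:
    # Every key should be aligned to an exact one-hour boundary
    assert key % WINDOW_MS == 0
    print(datetime.fromtimestamp(key / 1000, tz=timezone.utc).isoformat())
```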
The problem came when we ran nodetool compact on every Cassandra node.
We expected to obtain a single sstable for each one-hour bucket. What we got instead was a single huge sstable (per node), with all rows merged!
Is this the expected behavior? Are we doing something wrong?
I ran it with -s and it still created one big sstable file in my case. That outcome very much contradicts what the docs say for that option: "Use -s to not create a single big file" – itzg