Recently I ran into a problem while bulk loading data into a brand new cluster. I start multiple sstableloader processes, on each of my original Cassandra nodes, to stream data to a brand new 3-node Cassandra cluster. The original cluster has 3 nodes as well.
Everything worked well at first, but I eventually found that the new cluster had run out of disk space (OOD) partway through the load. The data on my original cluster is about 15GB per node with RF=3. The load consumed around 200GB of space on the new cluster, and even that does not seem to be enough.
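For a rough sanity check on those sizes, here is a back-of-the-envelope sketch in Python. It assumes that RF=3 applies to the keyspace on the new cluster, that a loader runs against all 3 original nodes, and that each loader streams its node's SSTables to every replica of each range. The function name and parameters are made up for illustration, and the estimate ignores the temporary space compaction needs while it rewrites SSTables.

```python
def estimate_streamed_gb(per_node_gb, source_nodes, target_rf):
    """Rough total landed on the target cluster before any compaction.

    Assumption: every source node's data is streamed once to each of the
    target_rf replicas, so replicated copies from different source nodes
    arrive as separate SSTables until compaction merges them.
    """
    return per_node_gb * source_nodes * target_rf


total_gb = estimate_streamed_gb(per_node_gb=15, source_nodes=3, target_rf=3)
per_target_node_gb = total_gb / 3  # 3 nodes in the new cluster

print(f"estimated total before compaction: {total_gb} GB")
print(f"estimated per target node: {per_target_node_gb} GB")
```

Under these assumptions the estimate is 135GB in total, or 45GB per new node, so duplicated replicas alone would not account for the full 200GB.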
I was wondering: does Cassandra run compaction while data is being streamed in? I ask because I checked the disk and found that one of the tables has 4000+ .db files in its directory.
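One thing worth checking is how many SSTables those 4000+ files actually represent, since each SSTable is stored as several component files that end in .db (Data.db, Index.db, Filter.db and so on). The sketch below counts .db files per component type in a table directory. The function name is hypothetical, and the directory is passed on the command line because the real path depends on the installation.

```python
import sys
from collections import Counter
from pathlib import Path


def count_db_components(table_dir):
    """Count *.db files in a table directory, grouped by component name.

    Assumes SSTable file names end with '-<Component>.db', for example
    '...-Data.db', so the number of Data.db files approximates the
    number of SSTables.
    """
    counts = Counter()
    for path in Path(table_dir).iterdir():
        if path.is_file() and path.name.endswith(".db"):
            component = path.name.rsplit("-", 1)[-1]
            counts[component] += 1
    return counts


if __name__ == "__main__":
    # Pass the table's data directory as the first argument.
    table_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for component, n in sorted(count_db_components(table_dir).items()):
        print(f"{component}: {n}")
```

If the Data.db count stays high long after the load has finished, that would suggest compaction is not keeping up with the streamed SSTables.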