
I recently ran into a problem while bulk loading data into a brand new cluster. I start multiple sstableloader processes on each node of my original Cassandra cluster to stream data to a brand new Cassandra cluster with 3 nodes; the original cluster has 3 nodes as well.

Everything worked well at first, but partway through the load the new cluster ran out of disk space (OOD). The data on my original cluster is about 15GB per node with RF=3. The load consumed around 200GB on the new cluster, and even that was not enough.
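
As a rough sanity check on those numbers, here is a back-of-envelope sketch. It assumes (this is my assumption, not something I measured) that each source node's loader streams its full 15GB to every replica in the target cluster, and that nothing has been compacted yet. The function name and parameters are only illustrative.

```python
# Back-of-envelope estimate of how much data lands on the target cluster
# before compaction has merged anything.
# Assumption: every source node's data set is streamed in full to each
# of the target replicas.
def streamed_before_compaction_gb(per_node_gb, source_nodes, target_rf):
    return per_node_gb * source_nodes * target_rf


estimate = streamed_before_compaction_gb(per_node_gb=15, source_nodes=3, target_rf=3)
print(estimate)  # 135
```

Under that assumption the new cluster receives about 135GB in total before any compaction, which is several times what the source nodes hold individually.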

Does Cassandra run compaction while data is being streamed in? I ask because I checked the disk and found that one of the tables has more than 4000 .db files in its directory.
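
Since each SSTable is made up of several component files, the raw .db count overstates the number of SSTables. A minimal sketch for counting them separately, assuming the usual naming where the data component ends in `-Data.db`; the function name is hypothetical and the table directory path has to be supplied by the caller.

```python
from pathlib import Path


def count_sstable_files(table_dir):
    """Count SSTable data components and all .db files in one table directory."""
    table_path = Path(table_dir)
    # Every .db component file directly inside the table directory.
    all_db = [p for p in table_path.glob("*.db") if p.is_file()]
    # One "-Data.db" file exists per SSTable, so this approximates the SSTable count.
    data = [p for p in all_db if p.name.endswith("-Data.db")]
    return {"data_files": len(data), "all_db_files": len(all_db)}
```

Running this periodically during the load shows whether the number of SSTables keeps growing, which would suggest compaction is not keeping up with streaming.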

Which compaction strategy are you using, and how are you running sstableloader? Are you running it on each node? - LetsNoSQL

1 Answer


I finally figured out that Cassandra does run compaction during the bulk loading process.

The reason I ran out of disk space (OOD) is that I launched too many sstableloader processes on the new cluster to stream data to itself. That put too much pressure on its CPUs, so compaction ran far slower than streaming, and the uncompacted SSTables eventually filled the disk.
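
One way to avoid this is to run the loaders one at a time and throttle them, so compaction has a chance to keep up. The following is only a sketch of that idea, not what I actually ran: it uses sstableloader's `-d` (target hosts) and `-t` (throttle in Mbits) options, while the helper names, host addresses, directories, and the throttle value are all hypothetical.

```python
import subprocess


def build_loader_cmd(target_hosts, sstable_dir, throttle_mbits):
    """Build one throttled sstableloader command as an argument list."""
    return [
        "sstableloader",
        "-d", ",".join(target_hosts),  # initial contact points in the target cluster
        "-t", str(throttle_mbits),     # streaming throttle in Mbits
        sstable_dir,
    ]


def load_sequentially(target_hosts, sstable_dirs, throttle_mbits, dry_run=True):
    """Run one loader at a time instead of launching them all in parallel."""
    cmds = [build_loader_cmd(target_hosts, d, throttle_mbits) for d in sstable_dirs]
    for cmd in cmds:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Blocks until this loader finishes; raises if it exits non-zero.
            subprocess.run(cmd, check=True)
    return cmds
```

With `dry_run=True` the commands are only printed, which makes it easy to review them before starting a real load. The right throttle value depends on the hardware, so it is best found by watching pending compactions on the target nodes while loading.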