I have a Cassandra db (version 3.11.2) running in AWS, with 2 Datacenters - each in another AWS region and 3 nodes in each one.
The replication factor on all keyspaces is 3, so full replication of data on every node. The size of data is about 10GB per node. All of our writes are in LOCAL_QUORUM against one DC (lets call it DC1). Basically the other DC is just for a kind of backup and disaster recovery, in case the AWS region for DC1 will be unavailable we will redirect traffic to DC2.
My issue is that we had a network disconnection between the two DCs, for several hours, and after several days we noticed that there is missing data in DC2. This all makes sense, since the time the DCs were apart is larger than the Hinted Handoff window (3 hours). So we need to run a repair to bring DC2 back to sync with DC1.
I went over the cassandra docs, and read countless SO answers and for the life of me I couldn't understand what is the right repair to do... Do I need to issue a 'nodetool repair --full --sequential' from only one node? Do I need to run it on every node in the cluster? Maybe it's better to run 'nodetool rebuild'?