I am looking for confirmation that my Cassandra backup and restore procedures are sound and I am not missing anything. Can you please confirm, or tell me if something is incorrect/missing?
Backups:
- I run daily full backups of the keyspaces I care about, via "nodetool snapshot keyspace_name -t current_timestamp". After the snapshot has been taken, I copy the data to a mounted disk, dedicated to backups, then do a "nodetool clearsnapshot $keyspace_name -t $current_timestamp"
- I also run hourly incremental backups - executing a "nodetool flush keyspace_name" and then moving files from the backup directory of each keyspace, into the backup mountpoint
Restore:
So far, the only valid way I have found to do a restore (and tested/confirmed) is to do this, on ALL Cassandra nodes in the cluster:
- Stop Cassandra
- Clear the commitlog *.log files
- Clear the *.db files from the table I want to restore
- Copy the snapshot/full backup files into that directory
- Copy any incremental files I need to (I have not tested with multiple incrementals, but I am assuming I will have to overlay the files, in sequence from oldest to newest)
- Start Cassandra
- On one of the nodes, run a "nodetool repair keyspace_name"
So my questions are:
- Does the above backup and restore strategy seem valid? Are any steps inaccurate or anything missing?
- Is there a way to do this without stopping Cassandra on EVERY node? For example, is there a way to restore the data on ONE node, then somehow make it "authoritative"? I tried this, and, as expected, since the restored data is older, the data on the other nodes (which is newer) overwrites in when they sync up during repair.
Thank you!