Hate to be a killjoy, but the backup process isn't as trivial as copying the SSTables. You need to keep track of which node actually generated a specific SSTable (or the snapshot containing it).
To back up:
- Create a snapshot on every node in the cluster
- Create a file that records which tokens belong to which node (nodetool ring or nodetool info -T)
- Back up the schema (you're already doing this)

To restore:
- Create a new cluster with the same number of nodes as the cluster you wish to restore
- On every node in the new cluster, set initial_token in cassandra.yaml to the tokens its original counterpart owned, taken from the token file in the backup stage. Use a comma-separated list, for example initial_token: 1,2, and set num_tokens to the number of tokens in that list
- Make sure Cassandra's data directory is empty (on every node, with Cassandra stopped). initial_token only takes effect on a node's first start, so anything left over from earlier starts has to go:
  rm -rf /var/lib/cassandra/data/*
- Start all the nodes in the cluster
- Create the schema that you backed up
- Copy the SSTables from the latest snapshot (taken on the original node that owned the same tokens) into the matching table directories under Cassandra's data directory (on every node)
- Run nodetool refresh <keyspace> <table> for each table to make Cassandra load the newly copied data (on every node)
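A minimal sketch of the backup stage for a single node, meant to be run on every node. The function name, the snapshot tag and the output directory are hypothetical parameters, and it assumes nodetool and cqlsh are on PATH and can reach the local node without extra connection options. The snapshot files themselves stay in each table's snapshots/<tag> directory and still have to be copied off the node, keyed by the node they came from.

```shell
#!/usr/bin/env bash
# Backup stage for ONE node (run on every node). Hypothetical parameters:
#   $1 snapshot tag, $2 directory to write the token file and schema to.
# Assumes nodetool and cqlsh are on PATH and can reach the local node.
set -euo pipefail

backup_node() {
  local tag="$1" outdir="$2"
  local node
  node="$(uname -n)"
  mkdir -p "$outdir"

  # Snapshot all keyspaces on this node; the files land in
  # <data dir>/<keyspace>/<table>-<id>/snapshots/<tag>/
  nodetool snapshot -t "$tag"

  # Record the tokens this node owns, keyed by node name, so the
  # replacement node can later be given exactly the same tokens.
  nodetool info -T > "$outdir/tokens-$node.txt"

  # Save the schema; it is recreated before any SSTables are copied back.
  cqlsh -e 'DESCRIBE SCHEMA' > "$outdir/schema.cql"
}
```

For example, backup_node nightly /mnt/backup on each node (both values are placeholders). Writing one token file per node keeps the node-to-tokens mapping next to the schema, which is exactly what the restore steps need.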
Quick example about the tokens (this can be confusing):
Let's say the cluster has 3 nodes and each node has 2 virtual tokens, so there are 6 tokens, numbered 1 to 6 here for simplicity (real tokens are much larger integers). Each token is assigned to a specific node; the easiest way to see the assignment is:
$ nodetool ring
Datacenter: datacenter1
Address     Rack   Status  State   Load        Owns    Token
                                                       6
<node1-ip>  rack1  Up      Normal  156.55 MiB  33.33%  1
<node1-ip>  rack1  Up      Normal  156.55 MiB  33.33%  2
<node2-ip>  rack1  Up      Normal  156.54 MiB  33.33%  3
<node2-ip>  rack1  Up      Normal  156.54 MiB  33.33%  4
<node3-ip>  rack1  Up      Normal  156.55 MiB  33.33%  5
<node3-ip>  rack1  Up      Normal  156.55 MiB  33.33%  6
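When restoring from the snapshot, I'd set the following in each node's cassandra.yaml:
# cassandra.yaml on the node that owned tokens 1 and 2
num_tokens: 2
initial_token: 1,2
# cassandra.yaml on the node that owned tokens 3 and 4
num_tokens: 2
initial_token: 3,4
# cassandra.yaml on the node that owned tokens 5 and 6
num_tokens: 2
initial_token: 5,6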
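In a real cluster you would script this rather than type it by hand: token assignment is normally automatic, and with the default num_tokens of 256 (16 since Cassandra 4.0) each node's initial_token line holds that many comma-separated tokens.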
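A minimal sketch of that scripting, assuming the nodetool ring output was saved to a file during the backup and that its rows follow the layout in the ring example (address in the first column, token in the last, integer tokens as produced by Murmur3Partitioner or RandomPartitioner). The function name is hypothetical; it groups tokens by address and prints a num_tokens/initial_token pair per node.

```shell
#!/usr/bin/env bash
# Convert saved `nodetool ring` output (on stdin) into per-node
# num_tokens / initial_token settings. Hypothetical helper; assumes
# data rows have the address first and an integer token last.
ring_to_initial_tokens() {
  awk '
    # Data rows only: skips the "Datacenter:" line, the header
    # (last column "Token") and the lone wrap-around token line.
    NF >= 7 && $NF ~ /^-?[0-9]+$/ {
      if (!($1 in tokens)) { order[++n] = $1; tokens[$1] = $NF }
      else                 { tokens[$1] = tokens[$1] "," $NF }
    }
    END {
      # Print nodes in the order they first appear in the ring.
      for (i = 1; i <= n; i++) {
        addr = order[i]
        count = split(tokens[addr], parts, ",")
        printf "# %s\nnum_tokens: %d\ninitial_token: %s\n", addr, count, tokens[addr]
      }
    }
  '
}
```

For example, ring_to_initial_tokens < ring.txt (ring.txt being whatever file the ring output was saved to) prints one block per original node; each block goes into the cassandra.yaml of the node replacing it. Deriving num_tokens from the same list keeps the two settings consistent.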
nodetool refresh (this should work with a live cluster). It is essential that the SSTables get restored on the node where they originally belonged. – Mandraenke
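A minimal sketch of the copy-and-refresh step for one table, run on the node that took over the original node's tokens. The function name and parameters are hypothetical, it assumes the default data directory unless another one is passed in, and it assumes the schema has already been recreated. Recreating the schema gives each table a new directory named <table>-<table id>, so the sketch looks that directory up instead of reusing the old path. Copied files may also need to be chowned to the user Cassandra runs as.

```shell
#!/usr/bin/env bash
# Restore ONE table on THIS node from a snapshot taken on the node that
# originally owned the same tokens. Hypothetical parameters:
#   $1 keyspace, $2 table, $3 directory with that table's snapshot files,
#   $4 optional data directory (defaults to /var/lib/cassandra/data)
set -euo pipefail

restore_table() {
  local ks="$1" table="$2" snapdir="$3"
  local datadir="${4:-/var/lib/cassandra/data}"
  local target

  # The recreated table lives in <table>-<new table id>; find it.
  target="$(find "$datadir/$ks" -maxdepth 1 -type d -name "$table-*" -print -quit 2>/dev/null || true)"
  if [ -z "$target" ]; then
    echo "no directory for $ks.$table under $datadir" >&2
    return 1
  fi

  # Copy the SSTable files, skipping the snapshot's own metadata files.
  find "$snapdir" -maxdepth 1 -type f ! -name manifest.json ! -name schema.cql \
    -exec cp {} "$target/" \;

  # Ask the running node to load the newly placed SSTables.
  nodetool refresh "$ks" "$table"
}
```

Repeat the call for every table on every node; running it on a node with different tokens would load data that node does not own.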