
I have taken a backup of my keyspace in Cassandra using this script: cassandra-backup.sh

To restore, I have written a script that copies the contents of the backup folders (which contain the snapshots) into the respective directories under /var/lib/cassandra/data/mykeyspace/, but when I check the table contents for mykeyspace, nothing is restored. For example, the backup folder path/mykeyspace/tableOne/snapshot/all-contents is copied to /var/lib/cassandra/data/mykeyspace/tableOne/all-contents.
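
In essence, the copy step of my restore script does the following (a simplified sketch; the keyspace, table, and path names are just the examples from above):

#!/bin/bash
# Sketch of the restore copy step: move each table's snapshot contents
# back into the live data directory (paths are placeholders).
BACKUP=path/mykeyspace
DATA=/var/lib/cassandra/data/mykeyspace

for table_dir in "$BACKUP"/*/; do
    table=$(basename "$table_dir")
    cp -r "$table_dir"snapshot/* "$DATA/$table/"
done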

Process I follow to restore:

  1. Drop the keyspace
  2. Restore the schema for mykeyspace (.cql file)
  3. Stop the Cassandra service
  4. Run my restore script (the copy-paste step)
  5. nodetool repair
  6. Start the Cassandra service

Am I missing something?

Other details: cqlsh 5.0.1 | Cassandra 3.11.3 | Ubuntu 16.04

You should start Cassandra when the sstables are in place and only run nodetool refresh (this should work with a live cluster). It is essential that the sstables get restored on the right node, where they originally belonged. – Mandraenke
@Mandraenke I found sstable_activity.... under the system folder of Cassandra. Should I replace those too and then start Cassandra? – Amit L

1 Answer


Hate to be a killjoy, but the backup process isn't as trivial as copying the sstables. You need to keep track of which node actually generated a specific SSTable (or the snapshot containing it).

You need to:

Backup

  1. Create a snapshot on every node in the cluster
  2. Create a file that records which tokens belong to which node (nodetool ring or nodetool info -T); a combined per-node sketch follows this list
  3. Back up the schema (you're already doing this)
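
A rough per-node sketch of those three steps (the snapshot tag, keyspace name, and output paths are just examples, not something your existing script has to use):

# Run on every node; tag, keyspace, and paths are placeholders
nodetool snapshot -t mybackup mykeyspace                                  # step 1: per-node snapshot
nodetool ring > /backup/tokens_$(hostname).txt                            # step 2: record token ownership
cqlsh -e "DESCRIBE KEYSPACE mykeyspace" > /backup/mykeyspace_schema.cql   # step 3: schema backup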

Recovery:

  1. Create a new cluster with the same number of nodes as the cluster you wish to restore
  2. On every node of the new cluster, set initial_token in cassandra.yaml to the tokens you got from Step 2 of the backup stage.
    Example: initial_token: 1, 2
  3. Make sure that Cassandra's data directory is empty: rm -r /var/lib/cassandra/data/* (on every node)
  4. Start all the nodes in the cluster
  5. Create the schema that you backed up.
  6. Copy the sstables from the latest snapshot into Cassandra's data directory (on every node)
  7. Run nodetool refresh so that Cassandra loads the newly copied data (on every node); a per-node sketch of steps 3-7 follows this list
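
Roughly, steps 3 through 7 on one node of the new cluster would look like this (the paths, snapshot tag, and table name are placeholders taken from the question, not fixed names):

# Run on every node of the new cluster; paths, tag, and table name are placeholders
sudo service cassandra stop
rm -r /var/lib/cassandra/data/*                       # step 3: empty the data directory
# ... set this node's initial_token in cassandra.yaml ...
sudo service cassandra start                          # step 4: start the node
cqlsh -f /backup/mykeyspace_schema.cql                # step 5: recreate the schema (only needed once)
cp -r /backup/mykeyspace/tableOne/snapshot/* \
      /var/lib/cassandra/data/mykeyspace/tableOne-*/  # step 6: copy the sstables
nodetool refresh mykeyspace tableOne                  # step 7: load the copied data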

Quick example about the tokens (this can be confusing): let's say the cluster has 3 nodes, and each node has 2 virtual tokens, so the range is 1-6. Certain tokens are allocated to certain nodes; the easiest way to see this is:

$ nodetool ring

Datacenter: datacenter1
==========
Address        Rack        Status State   Load            Owns                Token                                       
                                                                              6                         
127.0.0.1      rack1       Up     Normal  156.55 MiB      33.33%              1                       
127.0.0.1      rack1       Up     Normal  156.55 MiB      33.33%              2                       
127.0.0.2      rack1       Up     Normal  156.54 MiB      33.33%              3
127.0.0.2      rack1       Up     Normal  156.54 MiB      33.33%              4
127.0.0.3      rack1       Up     Normal  156.55 MiB      33.33%              5
127.0.0.3      rack1       Up     Normal  156.55 MiB      33.33%              6 

I'd have to set the following initial_token for each node when recovering from a snapshot:

# node 127.0.0.1's cassandra.yaml
initial_token: 1,2 

# node 127.0.0.2's cassandra.yaml
initial_token: 3,4 

# node 127.0.0.3's cassandra.yaml 
initial_token: 5,6 

This is normally automated, as the default number of virtual tokens (num_tokens) is 256.
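
For contrast, a node using the default vnode behaviour just keeps something like this in cassandra.yaml (no explicit token list; 256 is the stock default in 3.x):

# cassandra.yaml with automatic (vnode) token assignment
num_tokens: 256
# initial_token:    # left unset; tokens are assigned automatically on first start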