
I'm building a backup and restore process for a Cassandra database so that it's ready when I need it, and so that I understand the details in order to build something that will work for production. I'm following Datastax's instructions here:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html.

As a start, I'm seeding the database on a dev box then attempting to make the backup/restore work. Here's the backup script:

#!/bin/bash

cd /opt/apache-cassandra-2.0.9
./bin/nodetool clearsnapshot -t after_seeding makeyourcase
./bin/nodetool snapshot -t after_seeding makeyourcase

cd /var/lib/
tar czf after_seeding.tgz cassandra/data/makeyourcase/*/snapshots/after_seeding

Yes, tar is not the most efficient way, perhaps, but I'm just trying to get something working right now. I've checked the tar, and all the files are there.

Once the database is backed up, I shut down Cassandra and my app, then rm -rf /var/lib/cassandra/ to simulate a complete loss.

Now to restore the database. Restoration "Method 2" from http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html is more compatible with my schema-creation component than Method 1.

So, Method 2/Step 1, "Recreate the schema": Restart Cassandra, then my app. The app is built to recreate the schema on startup when necessary. Once it's up, there's a working Cassandra node with a schema for the app, but no data.

Method 2/Step 2 "Restore the snapshot": They give three alternatives, the first of which is to use sstableloader, documented at http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html. The folder structure that the loader requires is nothing like the folder structure created by the snapshot tool, so everything has to be moved into place. Before going to all that trouble, I'll just try it out on one table:

>./bin/sstableloader makeyourcase/users
Error: Could not find or load main class org.apache.cassandra.tools.BulkLoader

Hmmm, well, that's not going to work. BulkLoader is in ./lib/apache-cassandra-2.0.9.jar, but the loader doesn't seem to be set up to work out of the box. Rather than debug the tool, let's move on to the second alternative, copying the snapshot directory into the makeyourcase/users/snapshots/ directory. This should be easy, since we're throwing the snapshot directory right back where it came from, so tar xzf after_seeding.tgz should do the trick:

cd /var/lib/
tar xzf after_seeding.tgz
chmod -R u+rwx cassandra/data/makeyourcase

and that puts the snapshot directories back under their respective 'snapshots' directories, and a refresh should restore the data:

cd /opt/apache-cassandra-2.0.9
./bin/nodetool refresh -- makeyourcase users

This runs without complaint. Note that you have to run this for each and every table, so you have to generate the list of tables first. But, before we do that, note that there's something interesting in the Cassandra logs:

INFO 14:32:26,319 Loading new SSTables for makeyourcase/users...
INFO 14:32:26,326 No new SSTables were found for makeyourcase/users

So, we put the snapshot back, but Cassandra didn't find it. I also tried moving the snapshot directory under the existing SSTables directory, and copying the old SSTable files into the existing directory, with the same error in the log. Cassandra doesn't log where it expects to find them, just that it can't find them. The docs say to put them into a directory named data/keyspace/table_name-UUID, but there is no such directory. There is one named data/makeyourcase/users/snapshots/1408820504987-users/, but putting the snapshot dir there, or the individual files, didn't work.
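A quick way to check where the SSTable files actually ended up is to count them per table, both in the table directory itself and under snapshots/<tag>. As far as I understand it, nodetool refresh loads SSTables that have been placed in the table's own data directory, so a table showing live=0 here would explain the "No new SSTables were found" message. This is only a sketch: the function name report_sstables is made up for illustration, and it assumes the 2.0-style layout data/<keyspace>/<table>/.

```shell
#!/bin/bash
# Hypothetical helper (not part of Cassandra): for every table directory in a
# keyspace, count Data.db files in the table directory itself ("live") and
# in snapshots/<tag> ("snapshot").
report_sstables() {
    local keyspace_dir="$1" tag="$2" table_dir live snap
    for table_dir in "$keyspace_dir"/*/; do
        [ -d "$table_dir" ] || continue
        # -maxdepth 1 keeps the count to files directly in the directory.
        live=$(find "$table_dir" -maxdepth 1 -type f -name '*-Data.db' | wc -l | tr -d ' ')
        snap=$(find "${table_dir}snapshots/$tag" -maxdepth 1 -type f -name '*-Data.db' 2>/dev/null | wc -l | tr -d ' ')
        echo "$(basename "$table_dir") live=$live snapshot=$snap"
    done
}
```

For the setup in this question it would be called as: report_sstables /var/lib/cassandra/data/makeyourcase after_seeding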

The third alternative, the "Node restart method" doesn't look suitable for a multi-node production environment, so I didn't try that.

Edit:

Just to make this perfectly explicit for the next person, here are the preliminary, working backup and restore scripts that apply the accepted answer.

myc_backup.sh:

#!/bin/bash

cd ~/bootstrap/apache-cassandra-2.0.9
./bin/nodetool clearsnapshot -t after_seeding makeyourcase
./bin/nodetool snapshot -t after_seeding makeyourcase

cd /var/lib/
tar czf after_seeding.tgz cassandra/data/makeyourcase/*/snapshots/after_seeding

myc_restore.sh:

#!/bin/bash

# Unpack the archived snapshot directories back under each table's snapshots/ directory.
cd /var/lib/
tar xzf after_seeding.tgz
chmod -R u+rwx cassandra/data/makeyourcase

# Build the table list from the schema that the app has already recreated.
cd ~/bootstrap/apache-cassandra-2.0.9
TABLE_LIST=$(./bin/nodetool cfstats makeyourcase | grep "Table: " | sed -e 's+^.*: ++')
for TABLE in $TABLE_LIST; do
    echo "Restore table ${TABLE}"
    cd "/var/lib/cassandra/data/makeyourcase/${TABLE}"
    if [ -d "snapshots/after_seeding" ]; then
        # Per the accepted answer, the SSTables must sit in the table directory
        # itself before the refresh, so copy them out of the snapshot directory.
        cp snapshots/after_seeding/* .
        cd ~/bootstrap/apache-cassandra-2.0.9
        ./bin/nodetool refresh -- makeyourcase "${TABLE}"
        cd "/var/lib/cassandra/data/makeyourcase/${TABLE}"
        rm -rf snapshots/after_seeding
        echo "    Table ${TABLE} restored."
    else
        echo "    >>> Nothing to restore."
    fi
done
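As a cross-check on the table list, the same names can be read out of the archive itself, since the tar members follow the pattern cassandra/data/<keyspace>/<table>/snapshots/<tag>/. The sketch below does that with tar and awk; tables_in_archive is a name I made up, and it assumes the archive was created from /var/lib/ with relative paths as in myc_backup.sh.

```shell
#!/bin/bash
# Hypothetical helper: list the tables contained in a snapshot archive whose
# member paths look like cassandra/data/<keyspace>/<table>/snapshots/<tag>/...
tables_in_archive() {
    local archive="$1" keyspace="$2" tag="$3"
    # Split each member path on "/" and keep field 4 (the table name) when
    # the keyspace, the "snapshots" component and the tag all match.
    tar tzf "$archive" \
        | awk -F/ -v ks="$keyspace" -v tag="$tag" \
            '$3 == ks && $5 == "snapshots" && $6 == tag { print $4 }' \
        | sort -u
}
```

Usage would be: tables_in_archive /var/lib/after_seeding.tgz makeyourcase after_seeding. Comparing this list with the cfstats list shows any table that is in the backup but missing from the recreated schema.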

3 Answers

9
votes

Added more details:

You can run the snapshot for your particular keyspace using:

$ nodetool snapshot <mykeyspace> -t <SnapshotDirectoryName>

This will create the snapshot files inside each table's snapshots directory under the data directory.

When you delete your data, make sure you don't delete the snapshots folder or you will not be able to restore it (unless you are moving it to another location / machine.)

$ pwd
/var/lib/cassandra/data/mykeyspace/mytable
$ ls
mykeyspace-mytable-jb-2-CompressionInfo.db mykeyspace-mytable-jb-2-Statistics.db
mykeyspace-mytable-jb-2-Data.db mykeyspace-mytable-jb-2-Filter.db mykeyspace-mytable-jb-2-Index.db
mykeyspace-mytable-jb-2-Summary.db mykeyspace-mytable-jb-2-TOC.txt snapshots


$ rm *
rm: cannot remove `snapshots': Is a directory

Once you are ready to restore, copy back the snapshot data into the keyspace/table directory (one for each table):

$ pwd
/var/lib/cassandra/data/mykeyspace/mytable
$ sudo cp snapshots/<SnapshotDirectoryName>/* .

You mentioned:

and that puts the snapshot directories back under their respective 'snapshots' directories, and a refresh should restore the data:

I think the issue is that you are restoring the snapshot data into the snapshots directory. It should go directly into the table directory. Everything else seems right; let me know.

3
votes

The docs say to put them into a directory named data/keyspace/table_name-UUID, but there is no such directory.

You don't have this UUID directory because you are using Cassandra 2.0, and the table_name-UUID directory naming started with Cassandra 2.1.
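If a script has to work with both layouts, one option is to look for the plain table directory first and then for any table_name-UUID directory. This is just a sketch; find_table_dirs is a made-up name, and it relies on table names not containing a hyphen, so the table-* glob only matches the UUID-suffixed form. It prints every match, because a table that was dropped and recreated can leave more than one UUID directory behind.

```shell
#!/bin/bash
# Hypothetical helper: print the data directory (or directories) of a table,
# whether it is named <table> or <table>-<UUID>.
find_table_dirs() {
    local data_dir="$1" keyspace="$2" table="$3" candidate found=1
    for candidate in "$data_dir/$keyspace/$table" "$data_dir/$keyspace/$table"-*; do
        # An unmatched glob stays literal and simply fails the -d test.
        if [ -d "$candidate" ]; then
            echo "$candidate"
            found=0
        fi
    done
    # Exit status 0 if at least one directory was found, 1 otherwise.
    return "$found"
}
```

Example call: find_table_dirs /var/lib/cassandra/data makeyourcase users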

0
votes

Step-1: I created a table using the command below

CREATE TABLE Cricket (
    PlayerID uuid,
    LastName varchar,
    FirstName varchar,  
    City varchar,
    State varchar,
    PRIMARY KEY (PlayerID));

Step-2: Insert 3 records using the commands below

INSERT INTO Cricket (PlayerID, LastName, FirstName, City, State)
VALUES (now(), 'Pendulkar', 'Sachin', 'Mumbai','Maharastra');
INSERT INTO Cricket (PlayerID, LastName, FirstName, City, State)
VALUES (now(), 'Vholi', 'Virat', 'Delhi','New Delhi');
INSERT INTO Cricket (PlayerID, LastName, FirstName, City, State)
VALUES (now(), 'Sharma', 'Rohit', 'Berhampur','Odisha');

Step-3: I accidentally dropped the Cricket table

drop table Cricket;

Step-4: Recover the table from the automatic snapshot. Note: auto_snapshot (default: true) enables or disables whether a snapshot is taken of the data before keyspace truncation or dropping of tables. To prevent data loss, using the default setting is strongly advised.
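Before relying on this, it may be worth confirming that auto_snapshot has not been switched off on the node. The sketch below reads the value from a cassandra.yaml file and falls back to the default of true when the key is absent or commented out. The function name auto_snapshot_setting is made up, and the location of cassandra.yaml depends on how Cassandra was installed, so the path is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: print the effective auto_snapshot value from a
# cassandra.yaml file, treating a missing or commented-out key as "true".
auto_snapshot_setting() {
    local yaml="$1" value
    # Only uncommented lines that start with "auto_snapshot:" are considered;
    # if the key appears more than once, the last occurrence is used.
    value=$(sed -n 's/^auto_snapshot:[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$yaml" | tail -n 1)
    echo "${value:-true}"
}
```

Example call, with the path as an assumption: auto_snapshot_setting /etc/cassandra/cassandra.yaml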

Step-5: Find the snapshot locations and files

cassandra@node1:~/data/students_details$ cd cricket-88128dc0960d11ea947b39646348bb4f
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f$ ls -lrth
total 0
drwxrwxr-x 2 cassandra cassandra  6 May 14 18:05 backups
drwxrwxr-x 3 cassandra cassandra 43 May 14 18:06 snapshots
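If you don't know which table directory holds the automatic snapshot, a glob over the keyspace can list them. In the listing used in this answer the snapshot directory is named dropped-<timestamp>-<table>, so the sketch below looks for snapshots/dropped-* under every table directory. list_dropped_snapshots is a made-up name and the data directory is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: print every snapshots/dropped-* directory found under
# the table directories of one keyspace.
list_dropped_snapshots() {
    local data_dir="$1" keyspace="$2" dir
    for dir in "$data_dir/$keyspace"/*/snapshots/dropped-*; do
        # An unmatched glob stays literal and is skipped by the -d test.
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}
```

Example call for the node above: list_dropped_snapshots /home/cassandra/data students_details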

Step-6: In that snapshot location you will find a schema.cql file that contains the table's DDL.

cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$ ls -lrth
total 44K
-rw-rw-r-- 1 cassandra cassandra   92 May 14 18:06 md-1-big-Summary.db
-rw-rw-r-- 1 cassandra cassandra   61 May 14 18:06 md-1-big-Index.db
-rw-rw-r-- 1 cassandra cassandra   16 May 14 18:06 md-1-big-Filter.db
-rw-rw-r-- 1 cassandra cassandra  179 May 14 18:06 md-1-big-Data.db
-rw-rw-r-- 1 cassandra cassandra   92 May 14 18:06 md-1-big-TOC.txt
-rw-rw-r-- 1 cassandra cassandra 4.7K May 14 18:06 md-1-big-Statistics.db
-rw-rw-r-- 1 cassandra cassandra    9 May 14 18:06 md-1-big-Digest.crc32
-rw-rw-r-- 1 cassandra cassandra   43 May 14 18:06 md-1-big-CompressionInfo.db
-rw-rw-r-- 1 cassandra cassandra  891 May 14 18:06 schema.cql
-rw-rw-r-- 1 cassandra cassandra   31 May 14 18:06 manifest.json
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$

cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$ more schema.cql
CREATE TABLE IF NOT EXISTS students_details.cricket (
        playerid uuid PRIMARY KEY,
        city text,
        firstname text,
        lastname text,
        state text)
        WITH ID = 88128dc0-960d-11ea-947b-39646348bb4f
        AND bloom_filter_fp_chance = 0.01
        AND dclocal_read_repair_chance = 0.1
        AND crc_check_chance = 1.0
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND min_index_interval = 128
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE'
        AND comment = ''
        AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
        AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
        AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
        AND cdc = false
        AND extensions = {  };

Step-7: Log in to the database and recreate the table using that DDL.

    apiadmin@cqlsh:coopersdev> use students_details;
    apiadmin@cqlsh:students_details> CREATE TABLE IF NOT EXISTS students_details.cricket (
                        ...         playerid uuid PRIMARY KEY,
                        ...         city text,
                        ...         firstname text,
                        ...         lastname text,
                        ...         state text)
                        ...         WITH ID = 88128dc0-960d-11ea-947b-39646348bb4f
                        ...         AND bloom_filter_fp_chance = 0.01
                        ...         AND dclocal_read_repair_chance = 0.1
                        ...         AND crc_check_chance = 1.0
                        ...         AND default_time_to_live = 0
                        ...         AND gc_grace_seconds = 864000
                        ...         AND min_index_interval = 128
                        ...         AND max_index_interval = 2048
                        ...         AND memtable_flush_period_in_ms = 0
                        ...         AND read_repair_chance = 0.0
                        ...         AND speculative_retry = '99PERCENTILE'
                        ...         AND comment = ''
                        ...         AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
                        ...         AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
                        ...         AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
                        ...         AND cdc = false
                        ...         AND extensions = {  };
apiadmin@cqlsh:students_details>

Step-8: Copy all the files from the snapshot folder into the existing cricket table folder

cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$ pwd
/home/cassandra/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$ cp * /home/cassandra/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f/snapshots/dropped-1589479603749-cricket$ cd /home/cassandra/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f$ ls -lrth
total 44K
drwxrwxr-x 2 cassandra cassandra    6 May 14 18:05 backups
drwxrwxr-x 3 cassandra cassandra   43 May 14 18:06 snapshots
-rw-rw-r-- 1 cassandra cassandra  891 May 14 18:11 schema.cql
-rw-rw-r-- 1 cassandra cassandra   92 May 14 18:11 md-1-big-TOC.txt
-rw-rw-r-- 1 cassandra cassandra   92 May 14 18:11 md-1-big-Summary.db
-rw-rw-r-- 1 cassandra cassandra 4.7K May 14 18:11 md-1-big-Statistics.db
-rw-rw-r-- 1 cassandra cassandra   61 May 14 18:11 md-1-big-Index.db
-rw-rw-r-- 1 cassandra cassandra   16 May 14 18:11 md-1-big-Filter.db
-rw-rw-r-- 1 cassandra cassandra    9 May 14 18:11 md-1-big-Digest.crc32
-rw-rw-r-- 1 cassandra cassandra  179 May 14 18:11 md-1-big-Data.db
-rw-rw-r-- 1 cassandra cassandra   43 May 14 18:11 md-1-big-CompressionInfo.db
-rw-rw-r-- 1 cassandra cassandra   31 May 14 18:11 manifest.json
cassandra@node1:~/data/students_details/cricket-88128dc0960d11ea947b39646348bb4f$

Step-9: Restore the table data with sstableloader using the command below

cassandra@node1:~$ sstableloader -d 10.213.61.21 -username cassandra --password cassandra /home/cassandra/data/students_details/cricket-d3576f60960f11ea947b39646348bb4f/snapshots
    Established connection to initial hosts
    Opening sstables and calculating sections to stream

    Summary statistics:
       Connections per host    : 1
       Total files transferred : 0
       Total bytes transferred : 0.000KiB
       Total duration          : 2920 ms
       Average transfer rate   : 0.000KiB/s
       Peak transfer rate      : 0.000KiB/s
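A note on the directory passed to sstableloader: the tool takes the target keyspace and table from the parent directories of the path it is given, so the path is expected to end in <keyspace>/<table>. One way to arrange that without touching the live data directory is to copy the snapshot files into a staging directory with that shape. The sketch below does only the copy; stage_for_sstableloader and the staging location are made up for illustration, and leaving out schema.cql and manifest.json is my own assumption that the loader only needs the SSTable component files.

```shell
#!/bin/bash
# Hypothetical helper: copy the SSTable files of one snapshot into
# <staging_root>/<keyspace>/<table>/ and print that directory.
stage_for_sstableloader() {
    local snapshot_dir="$1" staging_root="$2" keyspace="$3" table="$4"
    local target="$staging_root/$keyspace/$table"
    mkdir -p "$target"
    # Copy regular files only, skipping the snapshot metadata files
    # (assumption: the loader does not need them).
    find "$snapshot_dir" -maxdepth 1 -type f \
        ! -name 'schema.cql' ! -name 'manifest.json' \
        -exec cp {} "$target"/ \;
    echo "$target"
}
```

The printed directory is what would then be handed to the loader, for example: sstableloader -d 10.213.61.21 "$(stage_for_sstableloader <snapshot_dir> /tmp/staging students_details cricket)", with credentials added as in the command above.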

Step-10: Table restored successfully. Please verify.

 playerid                             | city      | firstname | lastname  | state
--------------------------------------+-----------+-----------+-----------+------------
 d7b12c90-960f-11ea-947b-39646348bb4f | Berhampur |     Rohit |    Sharma |     Odisha
 d7594890-960f-11ea-947b-39646348bb4f |     Delhi |     Virat |     Vholi |  New Delhi
 d7588540-960f-11ea-947b-39646348bb4f |    Mumbai |    Sachin | Pendulkar | Maharastra