1
votes

I am starting out with Cassandra and need to set up point-in-time recovery for it. I enabled commit log archiving, but only archived commit logs appear in my backup folder; the current commit log is still being updated live. So if the node crashes and I restore from the archived commit logs, how can I get the latest entries that are not yet in the archive?

When I run nodetool flush, the incremental backup is updated but the commit log isn't archived.

1 Answer

2
votes

When a write arrives at a Cassandra node, it first goes to the commit log (on disk) and then to the memtable (in memory).

Periodically, and depending on certain conditions (size, ...), the memtables are flushed to disk and become SSTables.

The commit log is used to replay data that was in memory and not yet flushed to disk when a node crashes. Once the data in memory has been flushed to disk, the corresponding commit log segments are purged.

So if you run nodetool flush, the data in the memtables is flushed to disk as SSTables, and the commit log is no longer needed for it.

If the node crashes, you don't have to do any restore: when it restarts, it replays the mutations contained in the commit log. The commit log will not be empty if any data had not been flushed to disk.
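To make the sequence above concrete, here is a toy model of the write path in Python. It is only a sketch to illustrate the idea, not Cassandra code: the class `ToyNode` and all of its methods are made up for this example, and a real node works with commit log segments, multiple memtables, and compaction.

```python
class ToyNode:
    """Toy model of the write path described above; not real Cassandra code."""

    def __init__(self):
        self.commit_log = []  # stands in for the on-disk commit log (survives a crash)
        self.memtable = {}    # in-memory data (lost on a crash)
        self.sstables = []    # flushed, immutable on-disk tables

    def write(self, key, value):
        # A write is appended to the commit log first, then applied to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def flush(self):
        # Flushing turns the memtable into an SSTable; the matching
        # commit log entries are no longer needed and are purged.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log = []

    def crash(self):
        # Only the in-memory state is lost.
        self.memtable = {}

    def restart(self):
        # On restart, mutations still in the commit log are replayed.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:
                return table[key]
        return None


node = ToyNode()
node.write("a", 1)
node.flush()        # "a" is now in an SSTable and the commit log is empty
node.write("b", 2)  # "b" exists only in the commit log and the memtable
node.crash()
node.restart()      # replay restores "b"
print(node.read("a"), node.read("b"))  # prints "1 2"
```

In this model the write that was never flushed ("b") comes back purely from the replay, while the flushed write ("a") is served from the SSTable and needs no commit log at all.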

Backup and restore are usually handled via snapshots (nodetool snapshot). Using the commit log to restore from a save point is not common; its more appropriate use is when a node crashes and data had not yet been written to disk.

You can also enable commit log archiving if you want to:
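If you want to script snapshots, a minimal sketch in Python could look like the following. The helper names `build_snapshot_command` and `take_snapshot`, the tag `before_upgrade`, and the keyspace `my_keyspace` are placeholders made up for this example; the sketch assumes `nodetool` is on the PATH of the node where it runs.

```python
import subprocess


def build_snapshot_command(tag, keyspace):
    # "nodetool snapshot -t <tag> <keyspace>" takes a named snapshot of one keyspace.
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def take_snapshot(tag, keyspace):
    # Runs nodetool on the local node; raises CalledProcessError on a non-zero exit.
    subprocess.run(build_snapshot_command(tag, keyspace), check=True)


# Only prints the command here; call take_snapshot(...) on a real node.
print(" ".join(build_snapshot_command("before_upgrade", "my_keyspace")))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to check what would be executed before running it against a live node.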

https://cassandra.apache.org/doc/latest/configuration/cass_cl_archive_file.html

You can find more details about Cassandra backups here: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupRestore.html

And if you want, take a look at the write path in Cassandra; it will give you a better understanding of how data is written:

https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/dml/dml_write_path_c.html