I noticed the following statement in the Cassandra documentation on commit log archive configuration: https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html

"Restore stops when the first client-supplied timestamp is greater than the restore point timestamp. Because the order in which the database receives mutations does not strictly follow the timestamp order, this can leave some mutations unrecovered."

This statement made us concerned about using point-in-time recovery based on Cassandra commit logs. It implies that a point-in-time recovery will not recover all mutations with a timestamp lower than the restore point timestamp if mutations arrive out of timestamp order (which they will in our case).

I tried to verify this with some experiments but have not been able to reproduce it.

I did 2 experiments:

Simple row inserts

1. Set restore_point_in_time to 1 hour ahead in time.
2. Insert 10 rows (using the default current timestamp).
3. Insert a row using a timestamp 2 hours ahead in time.
4. Insert 10 rows (using the default current timestamp).
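For anyone who wants to repeat these steps, here is a minimal sketch that generates the CQL for the inserts. The table name `test_ks.events` and its columns `id` and `payload` are hypothetical; they are not from my actual setup. It relies on CQL write timestamps being microseconds since the epoch and sets the future one with `USING TIMESTAMP`.

```python
import time

# Hypothetical keyspace/table, used only for illustration.
TABLE = "test_ks.events"
TWO_HOURS_US = 2 * 60 * 60 * 1_000_000  # CQL write timestamps are in microseconds


def insert_stmt(row_id, timestamp_us=None):
    """Build one INSERT; add USING TIMESTAMP only when a timestamp is given."""
    stmt = f"INSERT INTO {TABLE} (id, payload) VALUES ({row_id}, 'row-{row_id}')"
    if timestamp_us is not None:
        stmt += f" USING TIMESTAMP {timestamp_us}"
    return stmt + ";"


def build_experiment(now_us):
    """10 normal rows, 1 row stamped 2 hours ahead, then 10 more normal rows."""
    stmts = [insert_stmt(i) for i in range(10)]
    stmts.append(insert_stmt(10, now_us + TWO_HOURS_US))
    stmts += [insert_stmt(i) for i in range(11, 21)]
    return stmts


statements = build_experiment(int(time.time() * 1_000_000))
for s in statements:
    print(s)
```

The printed statements can be pasted into cqlsh or saved to a file and run with `cqlsh -f`.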

I then killed my Cassandra instance, making sure it was terminated without having a chance to flush to SSTables.

During startup I could see from the Cassandra logs that it was doing CommitLog replay.

After the replay I queried my table and could see that 20 rows had been recovered, but the one with the timestamp ahead of the restore point was not. Based on the documentation, I would have expected only the first 10 rows to be recovered. I verified in the Cassandra log that CommitLog replay had been done.
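To make the difference concrete, here is a toy model of the two behaviors: what I understand the documentation to describe (replay stops at the first mutation past the restore point) versus what I observed (only the offending mutation is skipped). This is purely an illustration with made-up keys and timestamps; it is not Cassandra's replay code.

```python
def replay_stop_at_first(log, restore_point):
    """Documented reading: stop replaying at the first mutation past the restore point."""
    recovered = []
    for key, ts in log:
        if ts > restore_point:
            break  # everything after this point in the log is lost
        recovered.append(key)
    return recovered


def replay_skip_only(log, restore_point):
    """Observed in the experiment: skip only mutations past the restore point."""
    return [key for key, ts in log if ts <= restore_point]


# Mutations in the order the commit log received them: (row key, write timestamp).
log = [("a", 100), ("b", 110), ("future", 999), ("c", 120), ("d", 130)]
restore_point = 500

print(replay_stop_at_first(log, restore_point))  # ['a', 'b']
print(replay_skip_only(log, restore_point))      # ['a', 'b', 'c', 'd']
```

My result of 20 recovered rows matches the second function, while the documentation led me to expect the first.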

Larger CommitLog split experiment

I then wanted to see whether the documented behavior shows up across a commit log split/rollover.

So I set commitlog_segment_size_in_mb to 1 MB, instead of the 32 MB default, to make the commit log roll over more frequently. I then ran a script to mass insert rows and force the commit log to split.

In this run I inserted 12000 records, then inserted a record with a timestamp ahead of my restore_point_in_time, then inserted 8000 more records.

At about 13200 rows my commit log rolled over to a new file. I then killed my Cassandra instance again and restarted it. Again I could see in the log that CommitLog replay was being done, and after the replay all rows except the single row with a timestamp ahead of restore_point_in_time were recovered.

Notes

I did similar experiments using the commitlog_sync batch option. To make sure my rows had not been flushed to SSTables, I also tried restoring a snapshot with empty tables before starting Cassandra, so that it had to perform a commit log replay. In all cases I got the same results.

My question is whether the statement in the documentation is still valid, or whether I am missing something in my experiments.

Any help would be greatly appreciated. I need an answer to this to be able to decide on a backup/recovery mechanism we want to implement in a larger-scale Cassandra cluster setup.

All experiments were done using Cassandra 3.11 (single-node setup) in a Docker container (the official Cassandra Docker image). I ran the experiments on the image "from scratch", so no config changes were made other than those described here.

1 Answer

I think this will be relatively hard to reproduce, as you'll need to make sure that some of the mutations arrive later than others. This mostly happens when some clients do not have synchronized clocks, or when nodes are overloaded and hints are replayed some time later, etc.

But this parameter may not be required at all. If you look into CommitLogArchiver.java, you can see that if this parameter is not specified, it is set to Long.MAX_VALUE, meaning that there is no upper bound and all commit logs will be replayed. Cassandra will then handle conflicts the standard way: "the latest timestamp wins".
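As a rough illustration of those two points (not a port of the Java code), the sketch below treats an unset restore point as unbounded and then resolves conflicting writes by latest timestamp. The `yyyy:MM:dd HH:mm:ss` GMT format is my assumption based on the comments in the default commitlog_archiving.properties file, and the tie handling is a simplification.

```python
from datetime import datetime, timezone

JAVA_LONG_MAX = 2**63 - 1  # value of Long.MAX_VALUE in Java


def parse_restore_point(value):
    """Return the restore point in ms since epoch; unbounded when not set."""
    if not value:
        return JAVA_LONG_MAX
    # Assumed format: yyyy:MM:dd HH:mm:ss, interpreted as GMT.
    dt = datetime.strptime(value, "%Y:%m:%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def last_write_wins(writes):
    """Keep, per key, the value with the highest timestamp (ties keep the first seen)."""
    latest = {}
    for key, value, ts in writes:
        if key not in latest or ts > latest[key][1]:
            latest[key] = (value, ts)
    return {key: value for key, (value, _) in latest.items()}


print(parse_restore_point(""))  # no restore point configured, so no upper bound
print(last_write_wins([("k", "old", 1), ("k", "new", 5), ("k", "stale", 3)]))
```

With no restore point set, every mutation passes the bound, so arrival order no longer matters: the write with timestamp 5 wins even though a write with timestamp 3 arrived after it.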