I have the following question about Cassandra:

A few days ago I developed an application using C# and a single-node Cassandra database. While the application was in production, a power failure occurred and a Cassandra commit log got corrupted. Because of this the Cassandra node would not start, so I moved all the commit log files to another directory and started the node. Recently I noticed that the data from the day of the power failure is not available in the database. I still have all the commit log files, and I know the name of the corrupted one.

Can you please suggest whether there is a way to recover the data using the commit log files? Also, how can I avoid commit log corruption so that data loss can be avoided in production?

Thank you.

The first thing I would do is find out which commit logs from that day are corrupted and which are not. A simple way is to add the files back one by one and see whether Cassandra processes them. That way you should get some of the data back. My assumption is that only the file in use at the moment of the outage is corrupted. - Horia
@Horia thank you for the suggestion. I have already tried adding the healthy commit log files to the commitlog directory and restarting the node, but unfortunately it does not recover the data. - Vikrant Kumbhar
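
The one-by-one check suggested in the comment above could be scripted roughly as follows. This is only a sketch under assumptions: `restart_and_check` is a hypothetical callback that you would have to supply yourself (restart the node, then verify that it came up), the directory paths are whatever your installation uses, and segment files are assumed to be named `CommitLog-*.log` and to sort chronologically by name. Files are copied rather than moved, so the quarantined originals are never touched.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def replay_one_by_one(
    quarantine_dir: Path,
    commitlog_dir: Path,
    restart_and_check: Callable[[Path], bool],
) -> List[Path]:
    """Stage quarantined commit log segments one at a time.

    restart_and_check is a hypothetical, user-supplied callback: it should
    restart the node and return True if the node started cleanly.
    Returns the segments the node did not accept (likely corrupted).
    """
    rejected: List[Path] = []
    # Assumption: segment names sort in the order they were written.
    for segment in sorted(quarantine_dir.glob("CommitLog-*.log")):
        staged = commitlog_dir / segment.name
        shutil.copy2(segment, staged)  # copy, so the original is kept
        if not restart_and_check(staged):
            # Pull the segment back out so the next restart is not blocked.
            staged.unlink(missing_ok=True)
            rejected.append(segment)
    return rejected
```

The returned list tells you which segments to leave out. It does not guarantee that the accepted segments actually contained unflushed data for the missing day.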

1 Answer

There is no way to restore the node to its previous state if your commit logs are corrupted and you have no SSTables for that data.
If your commit logs are healthy (that is, not corrupted), you just need to restart the node.
The commit logs will be replayed, which rebuilds the memtable(s) and flushes generation-1 SSTables to disk.

What you should ideally do is force Cassandra to write SSTables.
You can do that from the apache-cassandra/bin directory with

nodetool flush

So if you are worried about losing commit logs, you can later load the SSTables created this way back into the node using

nodetool.bat refresh [keyspace] [columnfamily]

Alternatively, you can also create snapshots:

nodetool snapshot

This command takes a snapshot of all keyspaces on the node. You also have the option of enabling incremental backups, but these only record the most recent operations.
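
If you want to run the flush and snapshot steps on a schedule, a small wrapper along these lines might help. It is a minimal sketch, not an official tool: the function names, the `tag` and `keyspace` values and the `run` parameter are made up for illustration. The only nodetool features it relies on are `nodetool flush [keyspace]` and `nodetool snapshot -t <tag> [keyspace]`. On Windows you would pass `nodetool.bat` as the executable.

```python
import subprocess
from typing import Callable, List, Optional


def flush_cmd(keyspace: Optional[str] = None, nodetool: str = "nodetool") -> List[str]:
    """Build the command that flushes memtables to SSTables."""
    cmd = [nodetool, "flush"]
    if keyspace:
        cmd.append(keyspace)
    return cmd


def snapshot_cmd(tag: str, keyspace: Optional[str] = None,
                 nodetool: str = "nodetool") -> List[str]:
    """Build the command that takes a snapshot named `tag`."""
    cmd = [nodetool, "snapshot", "-t", tag]
    if keyspace:
        cmd.append(keyspace)
    return cmd


def backup(tag: str, keyspace: Optional[str] = None,
           nodetool: str = "nodetool",
           run: Callable = subprocess.run) -> None:
    """Flush first, then snapshot; stop if either command fails."""
    for cmd in (flush_cmd(keyspace, nodetool), snapshot_cmd(tag, keyspace, nodetool)):
        run(cmd, check=True)  # check=True raises on a non-zero exit code
```

For example, `backup("before_upgrade", "my_keyspace")` would flush and then snapshot a hypothetical keyspace called `my_keyspace`. Snapshots stay on the same disk as the data, so copy them elsewhere if you also want protection against disk failure.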

For more information, see https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html

I also suggest adding more nodes and increasing the replication factor, so that a failure on one node does not cause data loss in future.

Hope it helps!