
This question applies to Cassandra 2.2.

I am embarrassed to say that I still do not understand when I should be running nodetool repair, or, more precisely, on which nodes.

So far, I understand that to ensure deletes are handled correctly, the interval between repairs must be shorter than GC_GRACE_SECONDS. That part I've got.

Q. If I have a cluster of 9 nodes with a replication factor of 3, what type of repair do I run? More importantly, do I run the repair on every node, or on just one node?

Q. If I have multiple data centers, does that change how I run repairs? Do I have to run them in each DC, or can they be coordinated from just one node in one DC?

I am hoping this is a trivial question and someone can just tell me how it is.

A question like this is likely to solicit a number of strong opinions from the community members who may favor one approach over the other. A general guide is to avoid such questions and rather ask a more specific question about a problem you are encountering. - ishmaelMakitla

2 Answers


The nodetool repair command can be run either on a specified node or, if no node is specified, on all nodes. The node that initiates the repair becomes the coordinator node for the operation.

If a node is not specified, the repair runs on all the nodes that are responsible for that partition range.

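Run nodetool repair -pr on every node in the cluster to repair all data. Because -pr (--partitioner-range) repairs only the primary range of the node it runs on, skipping any node means some ranges of data will not be repaired.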
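To see why "-pr on every node" is the rule for the 9-node, RF 3 case in the question, here is a toy model of the token ring. It is a simplified sketch, not Cassandra's actual placement code: it assumes one token per node (no vnodes) and SimpleStrategy-style placement, where range i has node i as its primary replica and the next two nodes clockwise as the other replicas. The helper names are hypothetical.

```python
from collections import Counter

NUM_NODES = 9  # cluster size from the question
RF = 3         # replication factor from the question


def replicas(range_id: int) -> list[int]:
    # Toy placement: range i is owned first by node i (its primary replica),
    # then by the next RF - 1 nodes clockwise around the ring.
    return [(range_id + k) % NUM_NODES for k in range(RF)]


def ranges_repaired(node: int, primary_range_only: bool) -> list[int]:
    if primary_range_only:
        # With -pr: only the range this node is the primary replica for.
        return [node]
    # Without -pr: every range this node holds any replica of.
    return [r for r in range(NUM_NODES) if node in replicas(r)]


def repair_counts(nodes, primary_range_only: bool) -> dict[int, int]:
    # How many times each range gets repaired if we run repair on `nodes`.
    counts = Counter()
    for node in nodes:
        counts.update(ranges_repaired(node, primary_range_only))
    return {r: counts.get(r, 0) for r in range(NUM_NODES)}


if __name__ == "__main__":
    print("-pr on every node:   ", repair_counts(range(NUM_NODES), True))
    print("full on every node:  ", repair_counts(range(NUM_NODES), False))
    print("-pr on node 0 only:  ", repair_counts([0], True))
```

In this model, -pr on all nine nodes repairs each range exactly once, a plain repair on all nine nodes repairs each range three times (once per replica), and -pr on a single node leaves eight of the nine ranges unrepaired, which is the failure mode the answer warns about.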

The nodetool repair -pr option also works well for repairs across multiple datacenters, provided it is run on every node in every datacenter.

Note: For Cassandra 2.2 and later, a recommended option for repairs across datacenters is -dcpar (--dc-parallel), which repairs datacenters in parallel.
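For a multi-DC cluster, the advice above boils down to "one nodetool repair -pr per node, in every DC". The Python sketch below only builds and optionally runs those command lines from a single admin machine. The datacenter names and IP addresses are hypothetical placeholders, running the nodes one at a time is a design choice of this sketch rather than something the answer prescribes, and it assumes nodetool's -h option can reach each node's JMX port, which may be bound to localhost only in a default 2.2 install.

```python
import shlex
import subprocess

# Hypothetical topology: replace with your own datacenter and host names.
CLUSTER = {
    "dc1": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
    "dc2": ["10.0.2.1", "10.0.2.2", "10.0.2.3"],
}


def repair_commands(cluster: dict[str, list[str]], dc_parallel: bool = True) -> list[list[str]]:
    """Build one `nodetool repair -pr` command per node, in every datacenter.

    -pr only covers each node's primary range, so every node in every DC
    has to get a turn for the whole token ring to be repaired.
    """
    commands = []
    for dc in sorted(cluster):
        for host in cluster[dc]:
            cmd = ["nodetool", "-h", host, "repair", "-pr"]
            if dc_parallel:
                cmd.append("-dcpar")  # repair datacenters in parallel (2.2+)
            commands.append(cmd)
    return commands


def run_sequentially(commands: list[list[str]], dry_run: bool = True) -> None:
    # Run one node at a time so repairs do not pile up on the cluster.
    # With dry_run=True the commands are only printed, never executed.
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sequentially(repair_commands(CLUSTER), dry_run=True)
```

Leaving dry_run at its default lets you review the six generated commands before pointing the script at a real cluster.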



This is the recommendation from DataStax:

Run repair frequently enough that every node is repaired before reaching the time specified in the gc_grace_seconds setting. Deleted data is properly handled in the cluster if this requirement is met.
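One way to sanity-check a schedule against this rule is the small Python sketch below. It uses a simplified worst-case model, which is an assumption of this sketch and not DataStax's wording: a delete lands just after a repair of its range has started, so it is only guaranteed to be picked up by the next repair, which finishes one interval plus one repair duration later. That total has to stay under gc_grace_seconds. The 864000-second default (10 days) is Cassandra's stock value; your tables may override it.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds (10 days); individual tables may differ.
DEFAULT_GC_GRACE_SECONDS = 864_000


def repair_schedule_is_safe(repair_interval: timedelta,
                            repair_duration: timedelta,
                            gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Return True if every node is repaired before tombstones become purgeable.

    Simplified worst case: a delete that just misses one repair must be
    covered by the next one, which completes repair_interval + repair_duration
    later, so that sum must be shorter than the gc_grace window.
    """
    gc_grace = timedelta(seconds=gc_grace_seconds)
    return repair_interval + repair_duration < gc_grace


if __name__ == "__main__":
    # Weekly repair that takes a day to get through all nodes: 8 days < 10 days.
    print(repair_schedule_is_safe(timedelta(days=7), timedelta(days=1)))
    # Repairing every 10 days leaves no margin at all with the default setting.
    print(repair_schedule_is_safe(timedelta(days=10), timedelta(hours=1)))
```

Counting the repair duration, and not just the start interval, is deliberate: on a large cluster a full "-pr on every node" pass can take long enough to eat a significant part of the gc_grace window.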