Will regularly running nodetool repair on my Cassandra nodes cripple them?
The Planet Cassandra FAQ notes that:
Anti-Entropy Node Repair – For data that is not read frequently, or to update data on a node that has been down for an extended period, the node repair process (also referred to as anti-entropy repair) ensures that all data on a replica is made consistent. Node repair (using the nodetool utility) should be run routinely as part of regular cluster maintenance operations.
That is the only reference I've seen to running nodetool repair regularly. Running it regularly won't be a problem if it is cheap, but just how expensive is it? Does it do the equivalent of a consistency-checked read of every record on the node, or is it more clever than that? The documentation mentions the use of Merkle trees, but that does not give me any idea how expensive the operation is.
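To make the Merkle-tree part of the question concrete, here is a minimal, generic sketch of how two replicas could compare hash trees. It is not Cassandra's implementation: `build_tree`, `diff_leaves`, the use of SHA-256, and the idea of one leaf per token range holding that range's serialized rows are all illustrative assumptions. It shows the two cost properties in question: when the replicas agree, a single root comparison settles it, but building each tree still means hashing, and therefore reading, every leaf's data.

```python
import hashlib
from typing import List

Tree = List[List[bytes]]  # tree[0] = leaf hashes, tree[-1] = [root hash]


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves: List[bytes]) -> Tree:
    """Hash each leaf, then pair up hashes level by level until one root remains.

    Assumes a power-of-two number of leaves so every level pairs up evenly.
    Every leaf's bytes are hashed, so building the tree touches all the data.
    """
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a positive power of two")
    tree: Tree = [[_hash(leaf) for leaf in leaves]]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([_hash(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return tree


def diff_leaves(a: Tree, b: Tree) -> List[int]:
    """Return indices of leaves whose hashes differ between two same-shaped trees.

    Starts at the root and descends only into subtrees whose hashes differ,
    so identical trees are settled by a single root comparison.
    """
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("trees must have the same shape")
    pending = [(len(a) - 1, 0)]  # (level, index) pairs still to compare
    differing: List[int] = []
    while pending:
        level, idx = pending.pop()
        if a[level][idx] == b[level][idx]:
            continue  # whole subtree matches; skip it
        if level == 0:
            differing.append(idx)
        else:
            pending.append((level - 1, 2 * idx))
            pending.append((level - 1, 2 * idx + 1))
    return sorted(differing)


if __name__ == "__main__":
    # Hypothetical replica contents: one byte string per token range.
    replica_a = [f"range-{i}:v1".encode() for i in range(8)]
    replica_b = list(replica_a)
    replica_b[5] = b"range-5:v2"  # one stale range on the second replica
    print(diff_leaves(build_tree(replica_a), build_tree(replica_b)))  # [5]
    print(diff_leaves(build_tree(replica_a), build_tree(replica_a)))  # []
```

In a scheme like this, only the trees would need to cross the network, and only the ranges reported as differing would need to be streamed; the disk read needed to build each tree is what the scheme does not avoid.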
If you have 500 GB of data on a node, and that node is actually consistent with the other nodes (so the repair is a no-op), about how much data does the repair read from disk (reading all 500 GB would take a couple of hours)? And about how much data is sent over the LAN (sending all 500 GB over the LAN could take another hour or so)?
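The time estimates in the question can be checked with a little arithmetic. The sketch below assumes a sustained disk rate of 70 MB/s (chosen because it reproduces the "couple of hours" figure), a 1 Gbit/s LAN with no protocol overhead, and, for comparison, a complete binary hash tree with 2^15 leaves and 32-byte hashes. The tree depth and hash width are illustrative assumptions, not Cassandra's actual parameters.

```python
# Back-of-envelope estimates; all rates and tree parameters are assumptions.
GB = 10**9


def read_hours(data_bytes: float, disk_bytes_per_s: float) -> float:
    """Hours to read data_bytes sequentially at a sustained disk rate."""
    return data_bytes / disk_bytes_per_s / 3600


def send_hours(data_bytes: float, link_bits_per_s: float) -> float:
    """Hours to push data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_s / 3600


def tree_bytes(depth: int, hash_bytes: int) -> int:
    """Size of a complete binary hash tree with 2**depth leaves."""
    nodes = 2 ** (depth + 1) - 1
    return nodes * hash_bytes


if __name__ == "__main__":
    node_data = 500 * GB
    print(f"full read at 70 MB/s:  {read_hours(node_data, 70e6):.2f} h")
    print(f"full send at 1 Gbit/s: {send_hours(node_data, 1e9):.2f} h")
    print(f"hash tree, depth 15, 32-byte hashes: {tree_bytes(15, 32) / 1e6:.1f} MB")
```

Under these assumptions a full read is roughly 2 hours and a full transfer roughly 1.1 hours, while exchanging a tree of that size is about 2 MB, so the network side of a no-op comparison would be negligible. The disk side is a separate matter: hashing every row still requires reading every row.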