
After Reaper failed to run a repair on 18 nodes of the Cassandra cluster, I ran a full repair on each node to fix the failed repair issue. After the full repair, Reaper executed successfully, but after a few days Reaper failed to run again. I can see the following error in system.log:

ERROR [RMI TCP Connection(33673)-10.196.83.241] 2021-09-01 09:01:18,005 RepairRunnable.java:276 - Repair session 81540931-0b20-11ec-a7fa-8d6977dd3c87 for range [(-606604147644314041,-98440495518284645], (-3131564913406859309,-3010160047914391044]] failed with error Terminate session is called
java.io.IOException: Terminate session is called
        at org.apache.cassandra.service.ActiveRepairService.terminateSessions(ActiveRepairService.java:191) ~[apache-cassandra-3.11.0.jar:3.11.0]

INFO  [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out

In nodetool tpstats I can see some pending tasks:

Pool Name                         Active   Pending
ReadStage                              0         0
Repair#18                              3        90
ValidationExecutor                     3         3 

Also in nodetool compactionstats there are 4 pending tasks:

-bash-4.2$ nodetool compactionstats
pending tasks: 4
- Main.visit: 1
- Main.post: 1
- Main.stream: 2

My question is: why is Reaper still failing even after a full repair, and what is the root cause of the pending repair tasks?

PS: The Reaper version is 2.2.3; I'm not sure whether this is a bug in Reaper.


2 Answers

2 votes

There could be a number of things taking place, such as Reaper not being able to connect to the nodes via JMX (for whatever reason). It isn't possible to diagnose the problem with the limited information you've provided.

You'll need to check the Reaper logs for clues on the root cause.
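One quick sanity check is to confirm that JMX is actually reachable from the machine Reaper runs on. nodetool talks to the same JMX port Reaper uses (7199 by default), so something along these lines, run from the Reaper host against one of the failing nodes (the IP below is just taken from your log), will tell you whether the connection itself is the problem:

# run from the Reaper host; assumes remote JMX is enabled on the default port 7199
nodetool -h 10.196.83.241 -p 7199 status

If that hangs or fails with a connection error, Reaper won't be able to trigger repairs on that node either.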

As a side note, this log entry isn't related to repairs; it's a client/driver/app connecting to the node on the CQL port:

INFO  [Native-Transport-Requests-2] 2021-09-01 09:02:52,020 Message.java:619 - Unexpected exception during request; channel = [id: 0x1e99a957, L:/10.196.18.230:9042 ! R:/10.254.252.33:62100]
io.netty.channel.unix.Errors$NativeIoException: readAddress() failed: Connection timed out

Cheers!

1 vote

You most likely don't have enough segments in your Reaper repair definition, or the default timeout (30 mins) is too low for your repair. Segments (and the associated repair session) get terminated when they reach the timeout, in order to avoid stuck repairs. When tuned inappropriately, this can give the behavior you're observing. Nodetool doesn't set a timeout on repairs, which explains why it passes there. The good news is that nothing will prevent repair from passing with Reaper once tuned correctly.

We're currently working on adaptive repairs to have Reaper deal with this situation automatically, but in the meantime you'll need to handle it manually. Check the list of segments in the UI and apply the following rule:

  • If fewer than 20% of segments are failing, double the timeout by adjusting the hangingRepairTimeoutMins value in the config yaml.
  • If more than 20% of segments are failing, double the number of segments.

Once repair passes at least twice, check the maximum duration of segments and further tune the number of segments to have them last at most 15 mins.
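For reference, both of these knobs live in Reaper's configuration YAML (the values below are purely illustrative, not a recommendation for your cluster), and Reaper needs a restart for them to take effect on new runs:

# cassandra-reaper.yaml (illustrative values only)
hangingRepairTimeoutMins: 60     # double the 30 min default if fewer than 20% of segments fail
segmentCountPerNode: 128         # double your current value if more than 20% of segments fail

The segment count can also be set per repair run or schedule in the Reaper UI when you create it.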

Assuming you're not running Cassandra 4.0 yet, now that you've run repair through nodetool you have sstables that are marked as repaired, as incremental repair would do. This creates a problem: Reaper's repairs don't mark sstables as repaired, so you now have two different pools of sstables (repaired and unrepaired), which cannot be compacted together. You'll need to use the sstablerepairedset tool to mark all sstables as unrepaired and put everything back in the same pool. Please read the documentation to learn how to achieve this.
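As a rough sketch of that procedure (the data path and keyspace name below are placeholders for your own, the node should be stopped first, and you'd repeat this on every node):

# with Cassandra stopped on the node; path and keyspace are placeholders
find /var/lib/cassandra/data/my_keyspace -name "*Data.db" > /tmp/sstables.txt
sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt

You can spot-check the result with sstablemetadata (a "Repaired at: 0" line means unrepaired) before starting Cassandra again.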