
After I start the Repair Service, it shows a percentage indicating the progress of the current repair. Once the whole cluster has been repaired, the service switches OFF again.

I thought it would repair the whole cluster continuously, starting over again and again, but it appears to "finish", which is not what I expected.

Did I miss something?

  • OpsCenter 5.2.0
  • DSE 4.6.7

Edit:

Logs:

2015-09-02 08:33:34+0000 [XX]  INFO: Detected a topology change. The Repair Service will stop now and check the cluster topology every 5 minutes. If the cluster is stable, the Repair Service will start again.
2015-09-02 08:33:34+0000 [XX]  INFO: Stopping Repair Service
2015-09-02 08:48:34+0000 []  INFO: Unhandled error in Deferred:
2015-09-02 08:48:34+0000 [] Unhandled Error
    Traceback (most recent call last):
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 361, in callback
        self._startRunCallbacks(result)
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 455, in _startRunCallbacks
        self._runCallbacks()
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 542, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1076, in gotResult
        _inlineCallbacks(r, g, deferred)
    --- <exception caught here> ---
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1020, in _inlineCallbacks
        result = g.send(result)
      File "/usr/lib/python2.7/dist-packages/opscenterd/cluster/Repair.py", line 909, in startRepairService

    opscenterd.cluster.Repair.RepairServiceAlreadyRunning: The Repair Service is already running.

It seems that OpsCenter fails to restart the Repair Service after a topology change (adding a node).

Check your opscenterd.log. - phact
The first step in troubleshooting any problem with Cassandra is to check the log files. As @phact said, you should start by checking the logs. - sam
Can you reproduce this every time? Can you try with the logging level set to DEBUG in opscenterd.conf, and include larger pieces of the log? - Peter Halliday

1 Answer


What you are seeing is not the expected behavior of the Repair Service as described in the documentation:

http://docs.datastax.com/en/opscenter/5.2//opsc/online_help/services/repairService.html

I tested the Repair Service with OpsCenter 5.2.0 and DSE 4.7.3, and it behaved as documented: after completing one repair cycle, it promptly started a new one. This was visible in OpsCenter on the Services screen (it does not show up in Activities).

As stated in the comments, you should review the logs and see what "bread crumbs" you can find.
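
If it helps with hunting for those bread crumbs, here is a minimal sketch of a filter that pulls the Repair Service related lines out of opscenterd.log. It is not an OpsCenter tool: the function name `repair_events`, the keyword list, and the regular expression are my own assumptions, based only on the log format shown in the question. Traceback lines carry no timestamp, so the sketch attaches them to the most recent timestamp seen.

```python
import re

# Assumed line layout, taken from the excerpt in the question:
#   2015-09-02 08:33:34+0000 [XX]  INFO: Stopping Repair Service
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[[^\]]*\]\s+(.*)$"
)

# Hypothetical keyword list; extend it with whatever else you want to track.
KEYWORDS = ("Repair Service", "topology change")


def repair_events(lines):
    """Return (timestamp, message) pairs for lines that mention a keyword."""
    events = []
    last_ts = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = TS_RE.match(line)
        if match:
            last_ts, message = match.group(1), match.group(2)
        else:
            # Continuation line (e.g. part of a traceback): keep last timestamp.
            message = line.strip()
        if any(keyword in message for keyword in KEYWORDS):
            events.append((last_ts, message))
    return events


# Lines copied from the log excerpt in the question.
SAMPLE = [
    "2015-09-02 08:33:34+0000 [XX]  INFO: Detected a topology change. "
    "The Repair Service will stop now and check the cluster topology every "
    "5 minutes. If the cluster is stable, the Repair Service will start again.",
    "2015-09-02 08:33:34+0000 [XX]  INFO: Stopping Repair Service",
    "2015-09-02 08:48:34+0000 []  INFO: Unhandled error in Deferred:",
    "2015-09-02 08:48:34+0000 [] Unhandled Error",
    "    Traceback (most recent call last):",
    "    opscenterd.cluster.Repair.RepairServiceAlreadyRunning: "
    "The Repair Service is already running.",
]

if __name__ == "__main__":
    for timestamp, message in repair_events(SAMPLE):
        print(timestamp, "|", message)
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `repair_events(open(path))`, where `path` is wherever your opscenterd.log lives. Seeing the stop, the 15-minute gap, and the `RepairServiceAlreadyRunning` error side by side makes it easier to tell whether the restart fails after every topology change.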