We've been having trouble adding nodes back to a Cassandra cluster, and my team is wondering if anyone can spot anything we may be doing wrong, or share their best practices. These nodes were previously members of the cluster but were removed, and now we are trying to add them back. The removal appeared to go cleanly: nodetool removenode completed successfully, and nodetool status on the existing nodes showed only the expected nodes and none of the removed ones. The cluster ran fine for a week or two before we attempted to add the nodes again.
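For reference, the removal flow looked roughly like this (the host ID is a placeholder, and the exact commands may have varied slightly):

```
# Find the Host ID of the node to remove
nodetool status

# Remove the dead node by its Host ID (run from a live node)
nodetool removenode <host-id>

# Confirm the removal finished and the ring lists only the expected nodes
nodetool removenode status
nodetool status
```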
Before adding these nodes back to the cluster, all the data directories were wiped clean. Then we added the first node with auto_bootstrap enabled, and it appeared to join the cluster just fine, moving past the joining state to normal. So far so good: "describe cluster" showed everything was consistent, and we were about to run a cleanup on the existing nodes, followed by a repair, before adding more nodes. A rough sketch of what we did is below.
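The wipe and re-add were along these lines. The paths are the default package layout (the hints directory only exists on 3.0+), so treat them as assumptions; ours may not match yours exactly:

```
# Stop Cassandra on the node being re-added
sudo service cassandra stop

# Wipe local state (default package paths; adjust to your cassandra.yaml)
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*
sudo rm -rf /var/lib/cassandra/hints/*

# auto_bootstrap defaults to true, so we left it unset in cassandra.yaml
sudo service cassandra start

# Watch the node move from UJ (joining) to UN (normal)
nodetool status

# Planned follow-up on the existing nodes once the new node was UN:
#   nodetool cleanup
#   nodetool repair
```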
But then things started to go wrong. Somehow this new node seemed to know about the other previously removed nodes, showing them as down and unreachable (which was true, but how did it know about them?). Read requests to the new node were frequently failing, mostly because data was not found. To avert disaster, the new node was decommissioned and eventually assassinated, and my cluster (and applications) were happy again.
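In case it helps with diagnosis, this is roughly how we observed the ghost entries from the new node (the system.peers query is what I'd check next time; on 4.0+ there is also system.peers_v2):

```
# Gossip view from the new node: the removed nodes showed up here
nodetool gossipinfo

# Ring view: the removed nodes appeared as DN (down)
nodetool status

# Peer metadata persisted in the local system keyspace
cqlsh -e "SELECT peer, host_id FROM system.peers;"
```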
Are there any pre- or post-steps for adding (or removing) nodes that we may have missed? I've followed the advice of many other questions posted about adding and removing Cassandra nodes. Should I do a complete OS reload on the machine before I try adding it as a node again?
I'm grateful for your advice. Again, many of the other questions here have helped me a great deal, but not with this particular scenario.
Comment (Alex Ott): cassandra.yaml, especially the snitch...