3
votes

I have a cluster consisting of nodes in four rings, one in each DC. I am adding a new node to one of the DCs, and it is taking too long. I am using an RF of 3, and there is only one keyspace. I am using Cassandra 2.0.11. A few questions:

In 'nodetool netstats', I see the new node is pulling data from nodes in the other data centers too, not just the one it belongs to. Why is that, given that the nodes in its DC have all the data?

Is it required that the cluster must be in a perfect state, not needing any repair, at the time of adding a new node? Could this be the reason why the node is pulling data from nodes in other DCs?

I have set stream throughput to 0 using 'nodetool setstreamthroughput', but I see that the node is receiving data at only about 350kb/s. Is there something I can do to make this faster? In the last 1 day I see the node received only ~2GB of data (as seen in nodetool status), and it still has another 10GB to go, so as you can see it is going to take a very long time. Is this normal?

On the node that is getting bootstrapped, 'nodetool netstats' shows it is receiving files from other nodes, but on all other nodes I see 'Not sending any streams'. Is this normal?

Lastly, is it okay to restart a bootstrapping node before it has fully joined the cluster? I want to try changing a few settings which require a restart, but am wondering if the bootstrap streaming will continue from where it left before the restart.

thanks

1 Answer

5
votes

I'll try to answer your questions inline; I hope this helps:

In 'nodetool netstats', I see the new node is pulling data from nodes in the other data centers too, not just the one it belongs to. Why is that, given that the nodes in its DC have all the data?

This depends on a number of things: the keyspace replication settings, the seeds list (it should have at least one node from each DC), and the repair state of the cluster (for example, some keys may currently exist only on replicas in a remote DC).

Is it required that the cluster must be in a perfect state, not needing any repair, at the time of adding a new node? Could this be the reason why the node is pulling data from nodes in other DCs?

No, the cluster state doesn't have to be perfect. But yes, you are right: an unrepaired cluster could be a reason why the node is streaming from the remote DC.

I have set stream throughput to 0 using 'nodetool setstreamthroughput', but I see that the node is receiving data at only about 350kb/s. Is there something I can do to make this faster?

You are correct, setting this value to 0 should unthrottle streaming. There could be many reasons why it is not reaching the full bandwidth: perhaps scheduled repairs are running, or there is other traffic, such as client applications reading from or writing to the cluster at the same time. It could also mean that compactions are behind. You can check nodetool tpstats while the bootstrap is running to see the thread pool stats and whether the node is busy with things like compaction.
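To make the tpstats check a little more concrete, here is a minimal Python sketch that picks out thread pools with pending tasks from the text printed by nodetool tpstats. The function name pools_with_pending is made up for illustration, and the sketch assumes each pool row ends with five integer columns in the order Active, Pending, Completed, Blocked, All time blocked; check that against the output of your own version before relying on it.

```python
def pools_with_pending(tpstats_text):
    """Return {pool_name: pending_count} for pools whose Pending column is > 0.

    Assumption: each pool row ends with five integers in the order
    Active, Pending, Completed, Blocked, All time blocked.
    Header rows and the dropped-message section do not match and are skipped.
    """
    busy = {}
    for line in tpstats_text.splitlines():
        tokens = line.split()
        if len(tokens) < 6:
            continue  # blank line, or a row from the dropped-message section
        try:
            active, pending, completed, blocked, all_time_blocked = map(int, tokens[-5:])
        except ValueError:
            continue  # header row or anything else that is not a pool row
        if pending > 0:
            busy[" ".join(tokens[:-5])] = pending
    return busy


# Possible usage on a node (not executed here):
#   import subprocess
#   text = subprocess.run(["nodetool", "tpstats"],
#                         capture_output=True, text=True, check=True).stdout
#   print(pools_with_pending(text))
```

A compaction pool that keeps showing a large pending count across several runs would be one hint that the node is busy with compaction rather than limited by the network.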

In the last 1 day I see the node received only ~2GB of data (as seen in nodetool status), and it still has another 10GB to go, so as you can see it is going to take very long time. Is this normal?

Generally speaking, no. When I have seen slow bootstrapping, it was typically limited by hardware, such as CPU or disk throughput. Are your nodes using local disks? Are they SSDs or HDDs? Are you using network-attached storage?
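As a rough sanity check on the numbers in the question (about 2GB received in one day, about 10GB still to go), here is a small Python sketch. The function names are made up for illustration, and it assumes the average rate seen so far simply continues, which may well not hold.

```python
def bootstrap_eta_days(received_gb, elapsed_days, remaining_gb):
    """Days left, assuming the average rate so far stays the same."""
    rate_gb_per_day = received_gb / elapsed_days
    return remaining_gb / rate_gb_per_day


def average_rate_kb_per_s(received_gb, elapsed_days):
    """Average receive rate in KB/s, taking 1 GB = 1024 * 1024 KB."""
    return received_gb * 1024 * 1024 / (elapsed_days * 86400)


# Figures taken from the question.
print(bootstrap_eta_days(2, 1, 10))            # 5.0 more days at this pace
print(round(average_rate_kb_per_s(2, 1), 1))   # 24.3 KB/s on average
```

At this pace the remaining 10GB would take about five more days. The average works out to roughly 24 KB/s, which is well below the 350kb/s seen in netstats, so it may be that streaming is stalling for part of the time rather than running steadily at a low rate.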

On the node that is getting bootstrapped, 'nodetool netstats' shows it is receiving files from other nodes, but on all other nodes I see 'Not sending any streams'. Is this normal?

Older versions of Cassandra have had bootstrap issues where streaming can become "hung". It's probably best to upgrade to the latest version available in your release line and then re-check.

Lastly, is it okay to restart a bootstrapping node before it has fully joined the cluster? I want to try changing a few settings which require a restart, but am wondering if the bootstrap streaming will continue from where it left before the restart.

Bootstrapping starts from scratch each time you start the process. Restarting the node stops the original bootstrap and begins again from the start. Note that data files will be re-streamed, so you might end up with surplus data on the node. It's best to purge the data directories before you bootstrap again.

If the node still doesn't bootstrap, you can always set auto_bootstrap: false in your cassandra.yaml file and then run repairs afterwards.