I plan to have a multi-data-center Cassandra 2 setup with 2-4 nodes per data center and several tens of data centers. Our keyspaces are replicated on a certain number of nodes in each data center. We use a vnode-based deployment, so tokens are assigned to the nodes automatically.

The documentation at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html says that after adding a new node, cleanup must be run on all other nodes of the cluster. However, it does not clarify the procedure for a multi-data-center setup.

My understanding is that nodetool cleanup removes data that no longer belongs to a node. When a new data center is set up, we are creating completely new replicas. As far as I can tell, this does not cause any data movement or rebalancing outside the new data center, so no cleanup is required on nodes in the other data centers. Can someone confirm that my understanding is right and that cleanup is not required on nodes in the other data centers?

Answer

Your understanding is right, but the answer depends on the replication strategy you chose when creating each keyspace. If all your keyspaces use NetworkTopologyStrategy, the data centers behave as you described: replicas are placed independently per data center, so adding a new data center does not change which data the existing nodes own, and no cleanup is needed. But if any keyspace uses SimpleStrategy, that keyspace treats all data centers as one ring. In that case, the new nodes take over token ranges from nodes in the existing data centers, so cleanup is needed on the existing nodes after adding the new node(s).
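To make the difference concrete, here is a toy simulation, not Cassandra's actual code. It places replicas under SimpleStrategy and a simplified NetworkTopologyStrategy, then checks which existing nodes stop replicating some key after a second data center joins. Those nodes are the ones that would be left holding data that nodetool cleanup removes.

Several details are assumptions made for illustration:
- The node names (a1, b1, ...), the data center names DC1/DC2, and the replication factor of 2 are hypothetical.
- The 32-bit token space and the per-node token generation are invented simplifications.
- Rack awareness is ignored.

```python
import bisect
import itertools
import random
from functools import partial

TOKEN_SPACE = 2**32  # toy token space; real Murmur3 tokens span a signed 64-bit range


def node_tokens(name, vnodes=8):
    """Toy vnode tokens, seeded by node name so a node keeps its tokens as the cluster grows."""
    rng = random.Random(name)
    return [rng.randrange(TOKEN_SPACE) for _ in range(vnodes)]


def build_ring(dc_of):
    """Sorted list of (token, node) pairs for every node in dc_of."""
    return sorted((t, node) for node in dc_of for t in node_tokens(node))


def walk(ring, key):
    """Yield distinct nodes clockwise, starting at the range (prev, token] that holds key."""
    start = bisect.bisect_left([t for t, _ in ring], key)
    seen = set()
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in seen:
            seen.add(node)
            yield node


def simple_strategy(ring, key, rf):
    """SimpleStrategy: the next rf distinct nodes on the ring, ignoring data centers."""
    return set(itertools.islice(walk(ring, key), rf))


def network_topology_strategy(ring, key, dc_of, rf_per_dc):
    """Simplified NetworkTopologyStrategy (racks ignored): the next rf distinct nodes per DC."""
    chosen, count = set(), dict.fromkeys(rf_per_dc, 0)
    for node in walk(ring, key):
        dc = dc_of[node]
        if count.get(dc, 0) < rf_per_dc.get(dc, 0):
            chosen.add(node)
            count[dc] += 1
    return chosen


def stale_nodes(before, after, keys):
    """Existing nodes that stop replicating at least one sampled key (cleanup candidates)."""
    stale = set()
    for key in keys:
        stale |= before(key) - after(key)
    return stale


dc1 = {"a1": "DC1", "a2": "DC1", "a3": "DC1"}
both = {**dc1, "b1": "DC2", "b2": "DC2", "b3": "DC2"}  # hypothetical new data center DC2
ring_before, ring_after = build_ring(dc1), build_ring(both)
keys = range(0, TOKEN_SPACE, TOKEN_SPACE // 10_000)  # sampled key tokens

simple_stale = stale_nodes(
    partial(simple_strategy, ring_before, rf=2),
    partial(simple_strategy, ring_after, rf=2),
    keys,
)
nts_stale = stale_nodes(
    partial(network_topology_strategy, ring_before, dc_of=dc1, rf_per_dc={"DC1": 2}),
    partial(network_topology_strategy, ring_after, dc_of=both, rf_per_dc={"DC1": 2, "DC2": 2}),
    keys,
)
print("SimpleStrategy, cleanup needed on:", sorted(simple_stale))
print("NetworkTopologyStrategy, cleanup needed on:", sorted(nts_stale))
```

The NetworkTopologyStrategy line prints an empty list. Inserting DC2 tokens into the ring does not change the order in which DC1 nodes are met on the clockwise walk, so DC1's replica sets stay the same. The SimpleStrategy line should list one or more DC1 nodes. Keys are sampled rather than comparing token ranges directly, because adding tokens splits the existing ranges.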

You can check the current replication strategy using this command on cqlsh:

    DESCRIBE KEYSPACE keyspacename;

Hope it helps!