I plan to run a multi-data-center Cassandra 2 setup with 2-4 nodes per data center and several tens of data centers. Our keyspaces are replicated on a certain number of nodes in each data center. The deployment is vnode-based, so tokens should be assigned to the nodes automatically.
The documentation at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html suggests that adding a new node requires running cleanup on all other nodes of the cluster. However, it does not clarify the procedure for a multi-data-center setup.
My understanding is that nodetool cleanup removes data that no longer belongs to the node it is run on. When a new data center is being set up, we are creating completely new replicas, and AFAICT this does not cause any data movement or rebalancing outside the new data center, so there is no cleanup requirement on nodes in other data centers. Can someone confirm whether my understanding is right and that cleanup is not required on nodes in other data centers?
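To make my reasoning concrete, here is a toy model of per-data-center replica placement. It is only a sketch under my own assumptions, not Cassandra's actual code: it assumes the keyspaces use NetworkTopologyStrategy, it ignores racks, it uses MD5 as a stand-in for the real partitioner, and every name in it (`build_ring`, `replicas`, `changed_dcs`, the node names) is hypothetical. It compares which existing data centers see their replica sets change when a whole new data center is added versus when a node is added to an existing one.

```python
import bisect
import hashlib


def token(text):
    """Hash a string to a 64-bit ring position (stand-in for Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def build_ring(nodes_by_dc, vnodes=8):
    """Return the sorted ring as (token, node, dc), with `vnodes` tokens per node."""
    return sorted(
        (token(f"{node}/{i}"), node, dc)
        for dc, nodes in nodes_by_dc.items()
        for node in nodes
        for i in range(vnodes)
    )


def replicas(ring, key_token, rf_by_dc):
    """Walk the ring clockwise from the key, taking distinct nodes per DC up to its RF."""
    start = bisect.bisect_left(ring, (key_token,))
    chosen = {dc: [] for dc in rf_by_dc}
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        picked = chosen.get(dc)
        if picked is not None and node not in picked and len(picked) < rf_by_dc[dc]:
            picked.append(node)
    return chosen


def changed_dcs(old_nodes, old_rf, new_nodes, new_rf, samples=1000):
    """Return the pre-existing DCs whose replica sets differ for any sampled key."""
    old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
    changed = set()
    for i in range(samples):
        k = token(f"key{i}")
        before = replicas(old_ring, k, old_rf)
        after = replicas(new_ring, k, new_rf)
        changed |= {dc for dc in old_rf if before[dc] != after[dc]}
    return changed


base_nodes = {"dc1": ["a1", "a2", "a3"], "dc2": ["b1", "b2", "b3"]}
base_rf = {"dc1": 2, "dc2": 2}

# Case 1: add a whole new data center (dc3) and replicate the keyspace to it.
with_dc3 = dict(base_nodes, dc3=["c1", "c2"])
new_dc_changes = changed_dcs(base_nodes, base_rf, with_dc3, dict(base_rf, dc3=2))
print("new DC added, existing DCs affected:", new_dc_changes)

# Case 2: add one node (a4) to an existing data center.
with_a4 = dict(base_nodes, dc1=["a1", "a2", "a3", "a4"])
new_node_changes = changed_dcs(base_nodes, base_rf, with_a4, base_rf)
print("node added to dc1, existing DCs affected:", new_node_changes)
```

In this model the first case should leave every existing data center untouched, because each data center's replicas are chosen only from that data center's own tokens, while the second case should affect dc1 alone. If real placement works this way, cleanup would only matter on the nodes of the data center that received the new node, but that is exactly the part I would like confirmed.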