First, I would change the endpoint_snitch
to GossipingPropertyFileSnitch
on one node and restart it. You need to make sure that approach works, first. Typically, you cannot (easily) change the logical datacenter or rack names on a running cluster. Which technically you're not doing that, but SimpleStrategy
may do some things under the hood to abstract datacenter/rack awareness, so it's a good idea to test it.
If it works, make the change and restart the other node, as well. If it doesn't work, you may need to add 6 new nodes (instead of 4) and decommission the existing 2 nodes.
Or should I change the system keyspaces strategy and replication factor too?
Yes, you should set the same keyspace replication definition on the following keyspaces: system_auth
, system_traces
, and system_distributed
.
Consider this situation: If one of your 2 nodes crashes, you won't be able to log in as the users assigned to that node via the system_auth
table. So it is very important to ensure that system_auth
is replicated appropriately.
I wrote a post on this some time ago (updated in 2018): Replication Factor to use for system_auth
Also, I recommend the same approach on system_traces
and system_distributed
, as future node adds/replacements/repairs may fail if valid token ranges for those keyspaces cannot be located. Basically, using the same approach on them prevents potential problems in the future.
Edit 20200527:
Do I need to launch the nodetool cleanup
on old cluster nodes after the snitch and keyspace topology changes? According to docs "yes," but only on old nodes?
You will need to run it on every node, except for the very last one added. The last node is the only node guaranteed to only have data which match its token range assignments.
"Why?" you may ask. Consider the total percentage ownership as the cluster incrementally grows from 2 nodes to 6. If you bump the RF from 1 to 2 (run a repair), and then from 2 to 3 and add the first node, you have a cluster with 3 nodes and 3 replicas. Each node then has 100% data ownership.
That ownership percentage gradually decreases as each node is added, down to 50% when the 6th and final node is added. But even though all nodes will have ownership of 50% of the token ranges:
- The first 3 nodes will still actually have 100% of the data set, accounting for an extra 50% of the data that they should.
- The fourth node will still have an extra 25% (3/4 minus 1/2 or 50%).
- The fifth node will still have an extra 10% (3/5 minus 1/2).
Therefore, the sixth and final node is the only one which will not contain any more data than it is responsible for.