3
votes

Starting from one host running Cassandra, I am trying to add a new node and form a cluster.

I updated the seeds list on both hosts, and after restarting both nodes, nodetool status shows both nodes forming a cluster. However, I am seeing data loss: not all of the data that I added to a column family before adding the new node is visible.

Steps to reproduce:

  1. Start a node with the following settings in cassandra.yaml

    • initial_token:
    • num_tokens: 256
    • seed_list: host1
  2. Create a keyspace and a column family and enter some data

  3. Start another node (host2) with the exact same settings as host1, with the following change applied on both nodes - seeds: host1, host2
  4. When I log in to cqlsh from host2, I do not see all of the data.
2
Did you update the replication factor on your keyspace? - Aaron
No, the replication factor is still set to 1. What is good practice? Should I set the replication factor to the number of nodes in my cluster? - Nitin Bhatt
Updating the replication factor does not help. Still seeing the issue. - Nitin Bhatt
The idea with the replication factor is that you could lose a machine and still have all of your data. With 2 servers and a replication factor of 1, losing a machine means losing half of your data. So with 2 servers it makes sense to go with a replication factor of 2. But if you went to 3 nodes, staying with an RF of 2 would still allow you to get 100% of your data if you lost a node. - Aaron
I am having this exact same problem. Did you determine a solution? - chrislovecnm
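The replication-factor arithmetic in Aaron's comment above can be checked with a small simulation. This is only a rough sketch, not Cassandra's actual partitioner: the helper names `replicas` and `readable_fraction` are hypothetical, keys are placed with a CRC32 hash instead of real tokens, and replicas are assumed to sit on consecutive nodes of the ring (in the spirit of SimpleStrategy).

```python
from zlib import crc32

def replicas(key, nodes, rf):
    """Hypothetical placement: the owning node plus the next rf-1 nodes on the ring."""
    start = crc32(key.encode()) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(rf, len(nodes)))]

def readable_fraction(keys, nodes, rf, down):
    """Fraction of keys that still have at least one replica on a live node."""
    alive = sum(
        1 for k in keys
        if any(n not in down for n in replicas(k, nodes, rf))
    )
    return alive / len(keys)

keys = [f"row-{i}" for i in range(1000)]

# In each scenario host1 is lost.
two_rf1 = readable_fraction(keys, ["host1", "host2"], rf=1, down={"host1"})
two_rf2 = readable_fraction(keys, ["host1", "host2"], rf=2, down={"host1"})
three_rf2 = readable_fraction(keys, ["host1", "host2", "host3"], rf=2, down={"host1"})

print(f"2 nodes, RF=1: {two_rf1:.0%} of rows readable")
print(f"2 nodes, RF=2: {two_rf2:.0%} of rows readable")
print(f"3 nodes, RF=2: {three_rf2:.0%} of rows readable")
```

Under these assumptions, roughly half of the rows stay readable with 2 nodes and RF=1, while both RF=2 cases keep every row readable after losing a single node.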

2 Answers

1
votes

Running:

nodetool cleanup
nodetool repair
nodetool rebuild

should solve the issue.

1
votes

I suggest running nodetool cleanup on both nodes so that the keys get distributed.