
Adding new single-token nodes to an existing DataStax cluster: data transfer is not working. The process I followed is described below. Please let me know if it is wrong. Thanks

We have 3 single-token DataStax nodes in our AWS EC2 datacenter, with both Search and Graph enabled. We are planning to add 3 more nodes to the datacenter. We currently use DseSimpleSnitch and SimpleStrategy replication for our keyspaces, with a replication factor of 2.

Node 1 : 10.10.1.36
Node 2 : 10.10.1.46
Node 3 : 10.10.1.56

 cat /etc/default/dse | grep -E 'GRAPH_ENABLED=|SOLR_ENABLED='
   GRAPH_ENABLED=1  
   SOLR_ENABLED=1  

Datacenter: SearchGraph

Address     Rack   Status  State   Load        Owns  Token
10.10.1.46  rack1  Up      Normal  760.14 MiB  ?     -9223372036854775808
10.10.1.36  rack1  Up      Normal  737.69 MiB  ?     -3074457345618258603
10.10.1.56  rack1  Up      Normal  752.25 MiB  ?     3074457345618258602

Step (1) Before adding the 3 new nodes to our datacenter, we first switched the snitch and the keyspace replication strategy to network-aware ones.

1) Changed the snitch.

cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch:
  endpoint_snitch: GossipingPropertyFileSnitch

cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
  dc=SearchGraph
  rack=rack1

2) (a) Shut down all the nodes, then restart them.

(b) Run a sequential repair and nodetool cleanup on each node.

3) Changed the keyspace replication strategy.

ALTER KEYSPACE tech_app1 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_app2 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_chat WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};

Reference : http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html , http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSwitchSnitch.html
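
The three statements above differ only in the keyspace name, and the key in the replication map ('SearchGraph') must match the dc= value in cassandra-rackdc.properties exactly; otherwise no replicas are placed in the real datacenter. A minimal Python sketch that reads the DC name from the properties text and prints the statements (for example, to save to a file and run with cqlsh -f). The helpers parse_rackdc and alter_keyspace_statements are hypothetical, and the sketch assumes plain lowercase keyspace names that need no quoting.

```python
# Hypothetical helpers: build the ALTER KEYSPACE statements from step 3 using the
# DC name taken from cassandra-rackdc.properties, so the two cannot drift apart.


def parse_rackdc(text):
    """Parse key=value lines from cassandra-rackdc.properties content."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def alter_keyspace_statements(keyspaces, dc, rf):
    """Return one NetworkTopologyStrategy ALTER KEYSPACE statement per keyspace."""
    return [
        f"ALTER KEYSPACE {ks} WITH REPLICATION = "
        f"{{'class' : 'NetworkTopologyStrategy', '{dc}' : {rf}}};"
        for ks in keyspaces
    ]


if __name__ == "__main__":
    rackdc_text = "dc=SearchGraph\nrack=rack1\n"  # contents shown in the question
    dc_name = parse_rackdc(rackdc_text)["dc"]
    for stmt in alter_keyspace_statements(["tech_app1", "tech_app2", "tech_chat"], dc_name, 2):
        print(stmt)
```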

Step (2) To recalculate the token ranges and set up the new Cassandra nodes, we followed the process below.

1) Recalculated the token ranges.

root@ip-10-10-1-36:~# token-generator

DC #1:

Node #1:  -9223372036854775808
Node #2:  -6148914691236517206
Node #3:  -3074457345618258604
Node #4:                    -2
Node #5:   3074457345618258600
Node #6:   6148914691236517202
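
The token-generator output above, and the three original tokens, are consistent with spacing Murmur3Partitioner tokens evenly as token_i = -2^63 + i * floor(2^64 / N). A small Python sketch of that calculation; the formula is inferred from the printed values, not taken from the tool's source, and murmur3_tokens is a hypothetical name.

```python
from typing import List

# Murmur3Partitioner tokens span [-2**63, 2**63 - 1], i.e. 2**64 values.
RING_SIZE = 2 ** 64
MIN_TOKEN = -(2 ** 63)


def murmur3_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens, one per node, for a single datacenter."""
    step = RING_SIZE // node_count  # integer division, as the printed values suggest
    return [MIN_TOKEN + i * step for i in range(node_count)]


if __name__ == "__main__":
    for n, token in enumerate(murmur3_tokens(6), start=1):
        print(f"Node #{n}: {token:>21}")
```

With node_count=3 the same function reproduces the original tokens of 10.10.1.46, 10.10.1.36 and 10.10.1.56.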

2) Installed the same DataStax Enterprise version on the new nodes.

3) Stopped the DSE service on the new nodes and cleared their data.

4) (a) Assigned tokens to the new nodes as follows:

Node 4: 10.10.2.96     Range: -2 
Node 5: 10.10.2.97     Range: 3074457345618258600
Node 6: 10.10.2.86     Range: 6148914691236517202
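
One way to sanity-check a token assignment before starting nodes is to compute how much of the ring each token owns: with one token per node, a node owns the range from the previous token (exclusive) up to its own token (inclusive). This Python sketch (hypothetical helper ownership, tokens assumed unique) applies that to the three existing tokens plus the three assigned above. It measures primary ranges only, not effective ownership with RF=2.

```python
# Hypothetical helper: primary-range ownership for a single-token-per-node ring
# using Murmur3Partitioner (tokens from -2**63 to 2**63 - 1).
RING_SIZE = 2 ** 64


def ownership(tokens):
    """Map each token to the number of tokens in the range (previous, token]."""
    ordered = sorted(tokens)
    sizes = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the largest token
        size = (tok - prev) % RING_SIZE
        sizes[tok] = size if size else RING_SIZE  # a one-node ring owns everything
    return sizes


if __name__ == "__main__":
    # Existing tokens plus the new tokens assigned in step 4(a).
    ring = {
        -9223372036854775808: "10.10.1.46",
        -3074457345618258603: "10.10.1.36",
        -2: "10.10.2.96",
        3074457345618258600: "10.10.2.97",
        3074457345618258602: "10.10.1.56",
        6148914691236517202: "10.10.2.86",
    }
    for tok, size in sorted(ownership(ring).items()):
        print(f"{ring[tok]:<12} {tok:>21} {100 * size / RING_SIZE:6.2f}%")
```

For this ring, 10.10.1.56 (3074457345618258602) owns only 2 tokens while 10.10.1.36 owns about a third of the ring, which is consistent with the answer's finding that this assignment was wrong.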

4) (b) Configured cassandra.yaml on each new node:

Node 4 :

cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: -2
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.96
rpc_address: 10.10.2.96
endpoint_snitch: GossipingPropertyFileSnitch

Node 5 :

cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: 3074457345618258600
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.97
rpc_address: 10.10.2.97
endpoint_snitch: GossipingPropertyFileSnitch

Node 6 :

cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: 6148914691236517202
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.86
rpc_address: 10.10.2.86
endpoint_snitch: GossipingPropertyFileSnitch

5) Changed the snitch.

cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch:
endpoint_snitch: GossipingPropertyFileSnitch

cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
dc=SearchGraph
rack=rack1

6) Started DataStax Enterprise on each new node at two-minute intervals, with consistent.rangemovement turned off:

JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

7) After the new nodes had fully bootstrapped, we used nodetool move to assign new tokens to the existing nodes, per the token recalculation in step 4(a). This was done on one node at a time.

On Node 1 (10.10.1.36): nodetool move -3074457345618258603
On Node 2 (10.10.1.46): nodetool move -9223372036854775808
On Node 3 (10.10.1.56): nodetool move 3074457345618258602

Datacenter: SearchGraph

Address     Rack   Status  State   Load        Owns  Token
10.10.1.46  rack1  Up      Normal  852.93 MiB  ?     -9223372036854775808
10.10.1.36  rack1  Up      Moving  900.12 MiB  ?     -3074457345618258603
10.10.2.96  rack1  Up      Normal  465.02 KiB  ?     -2
10.10.2.97  rack1  Up      Normal  109.16 MiB  ?     3074457345618258600
10.10.1.56  rack1  Up      Moving  594.49 MiB  ?     3074457345618258602
10.10.2.86  rack1  Up      Normal  663.94 MiB  ?     6148914691236517202

Post Updated:

However, we are getting the following errors while the new nodes are joining:

AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.chat_history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.business_units
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.knowledegmodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.vault_details
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.workgroup
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.knowledgemodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.organizations
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.vaults
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.workgroup

Node joining failed with the following error:

ERROR [main] 2017-08-10 04:22:08,449  DseDaemon.java:488 - Unable to start DSE server.
com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin com.datastax.bdp.plugin.SolrContainerPlugin


Caused by: java.lang.IllegalStateException: Cannot find secondary index for core ekamsearch.userdetails_2, did you create it? 
If yes, please consider increasing the value of the dse.yaml option load_max_time_per_core, current value in minutes is: 10

ERROR [main] 2017-08-10 04:22:08,450  CassandraDaemon.java:705 - Exception encountered during startup
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin

Has anyone encountered these errors or warnings before?

Any particular reason why you are manually assigning tokens, when you could set num_tokens: 1 in cassandra.yaml and let Cassandra handle it for you? – dilsingi
I had already configured num_tokens: 1, and also set initial_token according to the recalculation in Step (2) 1) above. We want to assign initial_token manually rather than let Cassandra handle it, because I think Solr on the current cluster will not work if we change the tokens and rebalance using OpsCenter. Please correct me if I am wrong. Are the steps we followed above correct for adding nodes? – Sreeraju V
I believe it's tedious to manage tokens manually as you scale the Cassandra nodes. Setting num_tokens: 1 by itself lets Cassandra manage that, and as data is rebalanced onto the new node, Solr will index it. As the data moves to the new node, the corresponding records are removed from the old node when you run nodetool cleanup, and their Solr index entries are removed along with them. From the Solr core you can see the number of records indexed, so you can validate the counts after adding the nodes. I would avoid manual distribution of tokens. – dilsingi
So we can start the 3 new nodes with num_tokens: 1, but what about the 3 existing nodes in the cluster, which already have initial_token set? Thanks – Sreeraju V
The easiest way is to decommission them one at a time, which moves their data to the newly joined nodes. You can then add them back without an initial token, using replace_address. – dilsingi

Answer:


Token assignment issue:

1) I had assigned the tokens wrongly in Step 4) (a). Assign new tokens that
   bisect (or trisect) the existing ranges, using the values generated by
   "token-generator":
         Node 4: 10.10.2.96     Range: -6148914691236517206
         Node 5: 10.10.2.97     Range: -2
         Node 6: 10.10.2.86     Range: 6148914691236517202

Note: We don't need to change the tokens of the existing nodes in the
      datacenter. There is no need to follow the procedure in Step 7 that
      I described above.
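
Instead of taking alternate values from token-generator, new tokens can also be placed at the exact midpoint of each existing range, which leaves the existing nodes untouched. A minimal Python sketch, assuming Murmur3Partitioner and one token per node; bisect_tokens is a hypothetical helper, not part of any DataStax tool.

```python
# Hypothetical helper: one new token at the midpoint of every existing range,
# for Murmur3Partitioner (tokens from -2**63 to 2**63 - 1).
RING_SIZE = 2 ** 64
MAX_TOKEN = 2 ** 63 - 1


def bisect_tokens(existing):
    """Return the midpoint token of each range between consecutive existing tokens."""
    ordered = sorted(existing)
    new = []
    for i, tok in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        if nxt <= tok:  # range wraps around the end of the ring
            nxt += RING_SIZE
        mid = (tok + nxt) // 2
        if mid > MAX_TOKEN:  # map back into the signed token space
            mid -= RING_SIZE
        new.append(mid)
    return sorted(new)


if __name__ == "__main__":
    original = [-9223372036854775808, -3074457345618258603, 3074457345618258602]
    for token in bisect_tokens(original):
        print(token)
```

For the three original tokens this yields -6148914691236517206, -1 and 6148914691236517205, within a few tokens of the values listed above, so the difference in ownership is negligible.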

Solr issue resolved: Cannot find core

I increased the load_max_time_per_core value in the dse.yaml configuration
file, but I still received the error. I finally solved the issue with the
following method:

     1) Started the new nodes as non-Solr nodes and waited for all Cassandra
        data to migrate to the joining nodes.
     2) Added the directive auto_bootstrap: false to the cassandra.yaml file.
     3) Restarted the same nodes with Solr enabled, by setting
        SOLR_ENABLED=1 in /etc/default/dse.
     4) Re-indexed on all newly joined nodes: I had to reload every required
        core with the reindex=true and distributed=false parameters on the
        newly joined nodes.
        Ref : http://docs.datastax.com/en/archived/datastax_enterprise/4.0/datastax_enterprise/srch/srchReldCore.html
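
The reloads in step 4) can be scripted. This Python sketch only builds core-admin RELOAD URLs with the reindex=true and distributed=false parameters mentioned above, for cores named in the earlier log output. The host (localhost) and the DSE Search HTTP port 8983 are assumptions, reload_core_url is a hypothetical helper, and each URL would still have to be sent as an HTTP POST on the node itself, for example with curl -X POST.

```python
from urllib.parse import urlencode

# Hypothetical helper: build a Solr core-admin RELOAD URL for one DSE Search core.
# Host and port are assumptions (DSE Search default HTTP port assumed to be 8983).


def reload_core_url(core, host="localhost", port=8983):
    params = {
        "action": "RELOAD",
        "name": core,            # core name is <keyspace>.<table>
        "reindex": "true",       # rebuild the index from the local Cassandra data
        "distributed": "false",  # reload only on this node
    }
    return f"http://{host}:{port}/solr/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # A few of the cores reported as missing in the question's log output.
    for core in ["chat.chat_history", "search.feeds", "cloud.vaults"]:
        print(reload_core_url(core))
```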