3
votes

I have setup SolrCloud with 4 shards. I added 8 nodes to the SolrCloud(4 Leaders and 4 Replicas). Each node is running in different machine. But later I identified that my data is growing more and more(daily 4 million files) so that my 4 shards are not sufficient. So, I want to add one more shard to this SolrCloud dynamically. When I add a new node that is created as replica,that is not what I want. When I search for this in Google, the answers I got is use Collection API SPLITSHARD. If I use SPLITSHARD that will split the already existed shard. But here my requirement is to add new shard to this SolrCloud. How to do this?

Any suggestion will be appreciated. Thanks in advance.

2

2 Answers

2
votes

The answer is buried in the SolrCloud docs. See https://cwiki.apache.org/confluence/display/solr/Nodes,+Cores,+Clusters+and+Leaders the section 'Resizing a Cluster'

Basically the process is

  1. Split a Shard - now you will have two shards on that one machine
  2. Setup a replica of this new shard on your new machine
  3. Remove the new shard from the original machine. ZooKeeper will promote the replica to the leader for that shard.
  4. Setup a replica for that new shard

Very kludgy and manual process. SolrCloud isn't very "Cloudy" i.e. elastic.

0
votes

When you create the collection at the first time you make a very important decision, which is the sharding technique. Solr provides two different ways, implicit, or compositeId.

if you set it to compositeId, this means you want solr to calculate the shard based on a field of your choice (or the id by default), Solr will calculate a 32-bit integer hash key based on that field, and allocate a range for each shard. You also need to specify the number of shards in advance. So, solr will allocate a range of the 32-bit integer values for each shard, and according to the hash value it will route the document to the proper shard. For example if you set it to 4 shards, and the hash key happens to be in the first quarter of the 32-bit range, then it goes to first shard, and so on...

With this way you cannot change the number of shards later on. Because that will break the whole structure, you can still split one range into two separate sub-ranges. But you cannot just extend existing structure.

Second way, which is implicit, you don't have to specify the number of shards in advance, but you do the sharding manually in your application, and provide a field that has the name of the shard so, solr can route the document directly without calculating any thing. In this way, you can create as many shards in the future without affecting existing shards, you will simply create a new shard by name, and your application will start populating future documents with the new name.

So, in your situation, if you already chose compositeId, you cannot add shards, you can only split existing ones. If you think your shards will change much in the future, I'd suggest you re-build your cloud using implicit sharding.

check out Solr collection Api for more details : https://cwiki.apache.org/confluence/display/solr/Collections+API