
I have used Solr for a while, but am new to SolrCloud. I am investigating whether it makes sense in my context to deploy SolrCloud or to have multiple Solr instances (with matching indexed content) sitting behind an ELB.

My deployment will be in AWS on EC2 instances. Our current troubleshooting strategy in AWS is to terminate misbehaving instances and allow them to be automatically recreated by an AutoScaling group (which configures new instances via scripts when they are created). In fact, we do not have access to log on to the instances once they are in production. Everything stored in Solr can be re-indexed, so there is not a concern for data loss.

When trying to understand the SolrCloud infrastructure, however, I had a few questions:

  • Is Zookeeper able to automatically add a new instance if I destroy one of them? Everything I have seen seems to have static IP addresses in the configurations, which would require the configs to be updated (and Zookeeper restarted) if an instance was terminated and replaced.
  • Is there a "master" Zookeeper instance that I should call, or can I call any of them? If I can call any of them, we would likely put an ELB in front of Zookeeper.
  • If we hit heavy usage and allow the AWS AutoScaling group to create additional servers that serve as SolrCloud shards, will SolrCloud gracefully add the instances and terminate them without problems? (This appears to be true, and the whole point of using SolrCloud.)

1 Answer

  • Is Zookeeper able to automatically add a new instance if I destroy one of them? Everything I have seen seems to have static IP addresses in the configurations, which would require the configs to be updated (and Zookeeper restarted) if an instance was terminated and replaced.

AN: Each ZooKeeper server's configuration only has to list the servers in the ensemble, so that the ZooKeeper servers are aware of each other. You don't need to change this config unless you plan to increase or decrease the number of ZooKeeper servers. Even if you have to, you can do it without disturbing the cluster by changing one server at a time. We also keep hostnames rather than IP addresses in the config, so a change of IP has no impact on it.
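As a rough illustration of the hostname-based config, here is a minimal sketch that renders the `server.N` lines of a `zoo.cfg`. The hostnames are hypothetical, and the ports 2888 and 3888 are only the conventional peer and leader-election ports, so adjust them to your setup. Each server's `myid` file has to contain the matching `N`.

```python
def render_zoo_servers(hostnames, peer_port=2888, election_port=3888):
    """Return the server.N lines of a zoo.cfg for the given ensemble hostnames."""
    lines = []
    # ZooKeeper server ids start at 1 and must match each host's myid file.
    for server_id, host in enumerate(hostnames, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines)


# Hypothetical DNS names; a replaced instance keeps its name and gets a new IP.
ensemble = [
    "zk1.internal.example",
    "zk2.internal.example",
    "zk3.internal.example",
]
print(render_zoo_servers(ensemble))
```

Because only names appear in the file, replacing an instance means repointing its DNS record, not editing the config on the other servers.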

  • Is there a "master" Zookeeper instance that I should call, or can I call any of them? If I can call any of them, we would likely put an ELB in front of Zookeeper.

AN: ZooKeeper has a leader and followers, but we don't need to bother about which is which, because our application doesn't communicate with the ZooKeeper servers directly.

  • If we hit heavy usage and allow the AWS AutoScaling group to create additional servers that serve as SolrCloud shards, will SolrCloud gracefully add the instances and terminate them without problems? (This appears to be true, and the whole point of using SolrCloud.)

AN: When you create a new Solr node, you have to start it as part of the same cluster (pass it the same ZooKeeper list). Once it has joined, you have to split a shard and move it to the new node to balance the cluster. This is not automated as of now.
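The manual rebalancing step can be sketched with the Collections API. The snippet below only builds the request URLs for `SPLITSHARD` and `ADDREPLICA`; it sends nothing. The base URL, the collection name `products`, and the node name are hypothetical, and the sub-shard name `shard1_0` assumes Solr's usual `_0`/`_1` naming after a split, so check the cluster state before relying on it.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{solr_base_url}/admin/collections?{query}"


# Hypothetical existing node that accepts the admin requests.
base = "http://solr-node-1.internal.example:8983/solr"

# Step 1: split an overloaded shard into two sub-shards.
split_url = collections_api_url(
    base, "SPLITSHARD", collection="products", shard="shard1"
)

# Step 2: place a replica of one sub-shard on the newly started node.
add_replica_url = collections_api_url(
    base,
    "ADDREPLICA",
    collection="products",
    shard="shard1_0",
    node="solr-node-4.internal.example:8983_solr",
)

print(split_url)
print(add_replica_url)
```

An AutoScaling launch script could issue these requests once the new node has registered, but that orchestration is something you would have to write yourself.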

The Solr nodes are what you have to add to your ELB.

When you start a Solr node, you pass it the list of ZooKeeper servers. From that list, the node learns which cluster it is part of and which other nodes are serving that cluster.
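To make the startup step concrete, here is a minimal sketch of how an instance launch script might assemble the ZooKeeper connection string and the `bin/solr` command line. The hostnames are hypothetical, and the `-c`, `-p` and `-z` options are the ones the `bin/solr` script has used for SolrCloud mode in the versions I know of, so verify them against your Solr release.

```python
def zk_connect_string(zk_hosts, client_port=2181, chroot=""):
    """Join ZooKeeper hostnames into a host:port,host:port[/chroot] string."""
    return ",".join(f"{host}:{client_port}" for host in zk_hosts) + chroot


def solr_start_command(zk_hosts, solr_port=8983):
    """Return the argument list that starts a Solr node in cloud mode."""
    return [
        "bin/solr", "start",
        "-c",                  # run in SolrCloud mode
        "-p", str(solr_port),  # HTTP port of this Solr node
        "-z", zk_connect_string(zk_hosts),  # same list on every node
    ]


# Hypothetical ZooKeeper hostnames shared by all nodes of the cluster.
zk_hosts = ["zk1.internal.example", "zk2.internal.example", "zk3.internal.example"]
print(" ".join(solr_start_command(zk_hosts)))
```

Every node started with the same connection string joins the same cluster, which is why the list belongs in the AutoScaling launch configuration rather than on individual instances.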