9
votes

I've been working on a system that uses SolrCloud, which requires a ZooKeeper ensemble that helps "manage the overall structure so that both indexing and search requests can be routed properly" (straight out of the Solr documentation).

What exactly is this "management"? What data or configuration do the machines running Solr read from and write to the ZooKeeper ensemble, and why? Is the data in ZooKeeper ever changed at runtime by Solr? Or do you configure the data once, so that at runtime the SolrCloud hosts only ever read it?

To put the question into perspective, this is my first contact with ZooKeeper, with Solr, and in many ways with distributed systems.


1 Answer

13
votes

A single-node Solr instance uses its own configuration files, usually in a conf folder containing files such as schema.xml and stopwords.txt. In SolrCloud, however, a collection is a logical index made up of a group of cores. These cores need centralised configuration: the same configuration is shared among all cores belonging to the same collection. ZooKeeper is a centralised service for maintaining configuration information in a distributed system.

You can upload, download, and edit configuration files in ZooKeeper, so that all cores belonging to the same collection get the same config set.
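As a rough illustration of the upload/download step, here is a minimal Python sketch that builds the `bin/solr zk upconfig` / `downconfig` command line. The helper names (`solr_zk_command`, `run_solr_zk`), the ZooKeeper address `localhost:2181`, the config name `my_config`, and the directory path are all made up for the example. The short flags `-z`, `-n`, and `-d` are the ones I know from the Solr `bin/solr zk` tool; newer Solr releases may prefer different option names, so check the reference guide for your version.

```python
import subprocess
from typing import List


def solr_zk_command(action: str, zk_host: str, config_name: str,
                    config_dir: str, solr_bin: str = "bin/solr") -> List[str]:
    """Build the argument list for uploading or downloading a config set.

    action      -- "upconfig" (local dir -> ZooKeeper) or
                   "downconfig" (ZooKeeper -> local dir)
    zk_host     -- ZooKeeper connection string (assumed value in the demo)
    config_name -- name of the config set as stored in ZooKeeper
    config_dir  -- local directory holding (or receiving) the config files
    solr_bin    -- path to the Solr control script (assumption: run from
                   the Solr install directory)
    """
    if action not in ("upconfig", "downconfig"):
        raise ValueError(f"unsupported action: {action!r}")
    return [solr_bin, "zk", action,
            "-z", zk_host,
            "-n", config_name,
            "-d", config_dir]


def run_solr_zk(action: str, zk_host: str, config_name: str,
                config_dir: str) -> None:
    """Actually run the command; needs Solr installed and ZooKeeper reachable."""
    subprocess.run(
        solr_zk_command(action, zk_host, config_name, config_dir),
        check=True,
    )


if __name__ == "__main__":
    # Only print the command here, so the sketch runs without a Solr install.
    cmd = solr_zk_command("upconfig", "localhost:2181",
                          "my_config", "/path/to/conf")
    print(" ".join(cmd))
```

Once a config set has been uploaded under a name, every core of a collection created with that config name reads the same files from ZooKeeper instead of from a local conf folder.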

You can read more about SolrCloud config management here