I want to understand the query flow and how load balancing works with LBHttpSolrServer. We have set up SolrCloud with one collection; the collection has 4 shards, and each shard has two nodes, i.e. one master and one replica.

I have configured LBHttpSolrServer as below.

SolrServer lbHttpSolrServer = new LBHttpSolrServer("http://shard1_master:8080/solr/", "http://shard2_master:8080/solr/", "http://shard3_master:8080/solr/", "http://shard4_master:8080/solr/", "http://shard1_replica:8080/solr/", "http://shard2_replica:8080/solr/", "http://shard3_replica:8080/solr/", "http://shard4_replica:8080/solr/");

From my understanding, Solr and SolrJ work as follows:

  1. LBHttpSolrServer keeps pinging the above list of servers and maintains a list of live servers.
  2. Every time a query arrives, it picks one server from the list (in round-robin fashion).
  3. It sends the query to the selected server.
  4. When the query arrives at a Solr node, the node internally distributes it to the remaining shards, collects, merges, and ranks the results, and sends the response back to the user.

My confusion is about point 4: is my understanding correct? If not, please correct me. Also, do I need to pass all 8 nodes to LBHttpSolrServer, or will just 4 be sufficient?

1 Answer

Yes, that is correct. But instead of using LBHttpSolrServer, you can use CloudSolrServer, which is cloud-aware.

CloudSolrServer will automatically load balance requests across the nodes that comprise the collection that is being queried. Newer versions of the client will also route updates directly to the leader of the correct shard, which reduces load on the servers and speeds up indexing.
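
As a rough illustration, a minimal SolrJ 4.x client using CloudSolrServer might look like the sketch below. The ZooKeeper address `zk1:2181,zk2:2181,zk3:2181` and the collection name `collection1` are placeholders I am assuming for the example, not values from your setup.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryExample {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper ensemble address; replace with your own.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        // Assumed collection name; replace with your own.
        server.setDefaultCollection("collection1");
        try {
            // The client reads the cluster state from ZooKeeper and picks a node itself.
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound = " + rsp.getResults().getNumFound());
        } finally {
            // Release the ZooKeeper connection and HTTP resources.
            server.shutdown();
        }
    }
}
```

Note that the client is given the ZooKeeper address rather than a list of Solr node URLs.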

Internally, CloudSolrServer uses an instance of LBHttpSolrServer, but the list of URLs is managed dynamically, so your program doesn't need to worry about it.
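
To make the load-balancing idea concrete, here is a small plain-Java sketch of round-robin selection over a list of live servers, with failed servers set aside until they recover. It is only an illustration of the technique; the class `RoundRobinSketch` and its methods are hypothetical and are not the SolrJ implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not the SolrJ source.
class RoundRobinSketch {
    private final List<String> alive = new ArrayList<>();   // servers believed to be up
    private final List<String> zombies = new ArrayList<>(); // servers that failed recently
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(String... urls) {
        for (String url : urls) {
            alive.add(url);
        }
    }

    // Pick the next live server in round-robin order.
    synchronized String next() {
        if (alive.isEmpty()) {
            throw new IllegalStateException("no live servers");
        }
        int index = Math.floorMod(counter.getAndIncrement(), alive.size());
        return alive.get(index);
    }

    // Called when a request to this server fails.
    synchronized void markDead(String url) {
        if (alive.remove(url)) {
            zombies.add(url);
        }
    }

    // Called when a periodic health check finds the server responding again.
    synchronized void markAlive(String url) {
        if (zombies.remove(url)) {
            alive.add(url);
        }
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch("node1", "node2", "node3");
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next());
        }
        lb.markDead("node2");
        System.out.println(lb.next()); // node2 is skipped while it is marked dead
    }
}
```

With a dynamically managed list, the calls to `markDead` and `markAlive` would be driven by cluster state rather than by your own code.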

http://lucene.apache.org/solr/4_9_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html