Using solrj and LBHttpSolrClient to access a single solrcloud instance

Question

Is using the LBHttpSolrClient within solrj to access a single solrcloud instance less robust than using the default solrj and zookeeper behavior? Can it load balance over a single solrcloud instance correctly?

The solrcloud instance that I have available has a collection with about 9 million documents, spread over three shards with about 3 million documents per shard. The solrcloud has three nodes (servers), and the collection has 3 shards, a replicationFactor of 2, and a maxShardsPerNode of 2. Three zookeeper nodes also run on these same three servers.

Note: The values listed in the following variable named solrUrls should be prefixed with "http://" instead of "http_url_". I am unable to post more than 2 URLs at this time so I must "encode" them. Sorry.

This is the basic code that I've been told to use:

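import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

String zkUrls = "solrd1:2181,solrd2:2181,solrd3:2181";
String[] solrUrls = {"http_url_solrd1:8983", "http_url_solrd2:8983", "http_url_solrd3:8983"};

// Load-balancing client builder seeded with a fixed list of node URLs
LBHttpSolrClient.Builder lbclient =
    new LBHttpSolrClient.Builder().withBaseSolrUrls(solrUrls);
// Cloud client that uses zookeeper plus the load-balancing builder above
CloudSolrClient solr = new CloudSolrClient.Builder()
    .withLBHttpSolrClientBuilder(lbclient)
    .withZkHost(zkUrls)
    .build();
// defaultCollection is defined elsewhere
solr.setDefaultCollection(defaultCollection);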

Is this LBHttpSolrClient able to properly use the provided solrUrls, given that each node listed in that variable is just a node within a single solrcloud? Does this load-balancing client automatically query all the other nodes to ensure the results are complete for the whole collection, instead of covering just the shards that exist on that node?

If using the LBHttpSolrClient is the correct way to access a single solrcloud instance (better than solrj and zookeeper), then is there a better way to let zookeeper provide the base solr urls? My impression is that the LBHttpSolrClient predates the whole solrcloud setup and was a way to load balance over multiple standalone instances of solr. If that's the case, is the LBHttpSolrClient obsolete compared to solrj and zookeeper?

References:

Is there any loss of functionality if I use load balancer which does not communicate with zookeeper in solrcloud?
- This link has a title that suggests it covers the same questions that I'm asking, but it has no answers.
Loadbalancer and Solrcloud
- This link discusses how solrj and zookeeper work together, but does not address my questions on whether the LBHttpSolrClient is less robust or whether it will work correctly on a single instance of a small solrcloud.
SolrCloud load-balancing
- Does not address whether solrj and zookeeper are better suited than the LBHttpSolrClient.

Persimmonium · Accepted Answer · 2017-05-17T07:19:42

I think you are overcomplicating things. You can skip the LBHttpSolrClient in your code entirely, and solrj will create the needed instance behind the scenes.

In short, CloudSolrClient uses LBHttpSolrClient to send requests to the right Solr instances. To get the most out of your solrcloud setup, use CloudSolrClient. If you use just an LBHttpSolrClient (without CloudSolrClient), you will not know that a Solr node has gone down, for instance, until you get failed requests.
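
As a minimal sketch of that simpler setup, assuming SolrJ 6.x (where CloudSolrClient.Builder still offers withZkHost) and the zookeeper hosts from the question, something like the following should be enough. The class name and the collection name "mycollection" are placeholders, not values from the question, and the code needs the solr-solrj jar on the classpath plus a reachable cluster.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientSketch {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Zookeeper ensemble from the question; no Solr node URLs are listed here.
        String zkUrls = "solrd1:2181,solrd2:2181,solrd3:2181";

        // No LBHttpSolrClient.Builder is passed; the cloud client sets up
        // its own load-balancing client behind the scenes.
        try (CloudSolrClient solr = new CloudSolrClient.Builder()
                .withZkHost(zkUrls)
                .build()) {
            // "mycollection" is a placeholder collection name (assumption).
            solr.setDefaultCollection("mycollection");

            // Match-all query sent to the default collection.
            QueryResponse response = solr.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + response.getResults().getNumFound());
        }
    }
}
```

Newer SolrJ releases changed how the builder receives the zookeeper hosts, so check the Javadoc for the version you run.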
