
I am planning on implementing Solr for our clients. We have a .NET/SQL Server based product. Our DBs have grown so large that we have decided to use Solr to improve query performance (autocomplete, pick lists, grid search, etc.). We have massive IIS boxes (it varies by client; a typical config includes 16 cores and 96 GB RAM), but a slow network. Our DBs are around 100 GB. So I am thinking about this configuration:

Hosting Solr alongside IIS: since we haven't really maxed out IIS and the network is slow, I want to host it on the same box.

Solr is going to be hosted with the default container (Jetty) and secured by accepting only localhost connections (from IIS). I want minimal administrative overhead for this piece.
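A minimal sketch of how I would lock Jetty down to the loopback interface, assuming the jetty.xml shipped with the Solr example exposes the jetty.host and jetty.port system properties (worth verifying against the actual container config):

    java -Djetty.host=127.0.0.1 -Djetty.port=8983 -jar start.jar

With Jetty bound to 127.0.0.1, only processes on the same box (e.g., the IIS worker process) can reach Solr.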

I'm going to have a dedicated master core for pure indexing and multiple replicated slave cores (maybe 10) for pure querying. All the index data can live on SSDs.
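For the master/slave variant, this is roughly the standard ReplicationHandler setup in solrconfig.xml (a sketch only; the master URL, core name, config file list, and poll interval below are placeholders I would tune):

    <!-- master core: publish the index after every commit -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- each slave core: poll the master for new index versions -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://localhost:8983/solr/master-core/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>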

Now my questions are:

  1. How should I handle load balancing? Does SolrCloud do it automatically? (The example I looked at (below) talks about "sharding", which I don't really need.) Is it OK to use the Alpha in production? (We have 100 different clients, each with their own network/hardware configurations out there.)

  2. Should I handle commits and replication from code, or let Solr handle them?

  3. When replication happens, how do I route requests to the other cores? (Covered as part of #1?)

  4. During replication, will the master core be locked against further updates? Should I handle this case from code?

  5. Is it possible to pull a "last updated" timestamp from the slave core? Ideally I would want to go with the near-real-time search feature, but if SolrCloud is a no-go, then I want to show this timestamp in the UI so that users get an idea of how up to date the data is. (A rough sketch of what I have in mind follows this list.)
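What I have in mind for #5 is roughly an indexing timestamp in schema.xml (untested sketch; the field name is arbitrary, and it assumes a "date" field type is already defined):

    <field name="indexed_at" type="date" indexed="true" stored="true" default="NOW" />

and then asking a slave core for the newest value with something like:

    /select?q=*:*&sort=indexed_at+desc&rows=1&fl=indexed_at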

http://wiki.apache.org/solr/SolrCloud/

Explicitly specify the addresses of shards you want to query, giving alternatives (delimited by |) used for load balancing and fail-over:

shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr

Any help is much appreciated.

Cheers!


1 Answer


I did some more research and figured out the following:

  1. Load balancing: SolrCloud does it automatically; just hit any node/core in the cluster and you are good to go. SolrCloud keeps the cluster state in ZooKeeper, so it knows where to route each request (example URL after this list).

  2. Commits: I am planning to configure soft commits every 1 second and hard commits every 10 minutes (configuration sketch after this list). Replication: no need to worry; when a new node/core comes online, SolrCloud automatically assigns it as a shard (if not all shards are online yet) or as a replica.

  3. Handled automatically (#1).

  4. N/A with SolrCloud. Writes and reads can go to any instance/core; it doesn't matter.

  5. I'm going with near-real-time search, so I'm not going to worry about this. I would still appreciate it if someone answered it.
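For #1, the entry point can be any node in the cluster; the host and collection name here are placeholders:

    http://any-solr-node:8983/solr/collection1/select?q=*:*

For #2, a sketch of what I expect the solrconfig.xml commit settings to look like in Solr 4.x (times are in milliseconds; tune to taste):

    <autoCommit>
      <maxTime>600000</maxTime>           <!-- hard commit every 10 minutes -->
      <openSearcher>false</openSearcher>  <!-- don't reopen searchers on hard commit -->
    </autoCommit>

    <autoSoftCommit>
      <maxTime>1000</maxTime>             <!-- soft commit every second; makes new docs searchable -->
    </autoSoftCommit>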

I hope my research will be useful to someone!