I am planning to implement Solr for our clients. We have a .NET/SQL Server based product. Our databases have grown so large that we have decided to use Solr to improve query performance (autocomplete, pick lists, grid search, etc.). We have massive IIS boxes (they vary by client; a typical configuration includes 16 cores, 96 GB RAM, etc.), but a slow network. Our databases are around 100 GB. So I am thinking about this configuration:
Hosting Solr alongside IIS: since we haven't really maxed out the IIS boxes and the network is slow, I want to host Solr on the same box.
Solr is going to be hosted with the default container (Jetty) and secured by accepting only localhost connections (from IIS). I want minimal administrative overhead for this piece.
I'm going to have a dedicated master core purely for indexing and multiple replicated slave cores (maybe 10) purely for querying. All the index data can be kept on SSD.
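On the IIS side, spreading queries across the replicated slave cores could be as simple as rotating through their URLs. Below is a minimal sketch in Python, assuming the cores are named `slave1` to `slave10` (hypothetical names) and served by a single Jetty instance on the default `localhost:8983`. The master core is deliberately left out of the rotation so it only handles indexing.

```python
import itertools
import threading
from urllib.parse import urlencode


class SlaveCoreRotation:
    """Round-robin over local query-only (slave) cores; hypothetical layout."""

    def __init__(self, base_url, core_names):
        cores = list(core_names)
        if not cores:
            raise ValueError("need at least one slave core")
        self._cores = itertools.cycle(cores)
        self._lock = threading.Lock()  # concurrent request threads share one rotation
        self.base_url = base_url.rstrip("/")

    def select_url(self, params):
        """Build a /select URL on the next slave core in the rotation."""
        with self._lock:
            core = next(self._cores)
        return f"{self.base_url}/{core}/select?{urlencode(params)}"


rotation = SlaveCoreRotation(
    "http://localhost:8983/solr",
    [f"slave{i}" for i in range(1, 11)],  # hypothetical core names
)
print(rotation.select_url({"q": "name:smi*", "wt": "json"}))
```

This rotation does not notice a core that is down or mid-replication; if that matters, a health check (for example against each core's ping handler) would have to be layered on top.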
Now my questions are:
1. How should I handle load balancing? Does SolrCloud do it automatically? (The example I looked at, quoted below, talks about "sharding", and I don't really need sharding.) Is it OK to use an alpha release in production? (We have 100 different clients, each with its own network and hardware configuration.)
2. Should I handle commits and replication from code, or let Solr handle them?
3. When a replication happens, how do I route requests to the other cores? (Is this covered as part of #1?)
4. During replication, will the master core be locked against further updates? Should I handle this case from code?
5. Is it possible to pull the "last updated" time from a slave core? Ideally I would use the near-real-time search feature, but if SolrCloud is a no-go, I want to show this timestamp in the UI so that users get an idea of how up to date the data is.
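For the "last updated" question, one option is to read the index's last-modified time from each slave core. The sketch below assumes the Luke request handler is enabled at `/admin/luke` and that its JSON response (`wt=json`) carries an ISO-8601 UTC value at `index.lastModified`; both are assumptions to verify against your Solr version. It only parses an already-fetched response, so the HTTP call itself stays on the .NET side, and the sample payload is made up.

```python
from datetime import datetime, timezone


def parse_solr_date(value):
    """Parse a Solr-style UTC timestamp such as 2013-03-12T10:15:30.5Z."""
    if not value.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {value!r}")
    main, _, fraction = value[:-1].partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S")
    # Solr may trim trailing zeros from the milliseconds, so pad to microseconds.
    micros = int((fraction or "0").ljust(6, "0")[:6])
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def index_last_modified(luke_response):
    """Pull index.lastModified out of a decoded Luke JSON response (assumed shape)."""
    return parse_solr_date(luke_response["index"]["lastModified"])


def freshness_label(last_modified, now=None):
    """Human-readable 'data as of' text for the UI, with age in whole minutes."""
    now = now or datetime.now(timezone.utc)
    minutes = int((now - last_modified).total_seconds() // 60)
    return f"Data as of {last_modified:%Y-%m-%d %H:%M} UTC ({minutes} min ago)"


# Made-up sample of the relevant part of a Luke response.
sample = {"index": {"numDocs": 1200, "lastModified": "2013-03-12T10:15:30.5Z"}}
ts = index_last_modified(sample)
print(freshness_label(ts, now=datetime(2013, 3, 12, 10, 45, tzinfo=timezone.utc)))
```

On a slave, the replication handler's `command=details` response also reports replication history, which may be closer to what users actually care about than the raw index modification time.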
http://wiki.apache.org/solr/SolrCloud/
Explicitly specify the addresses of shards you want to query, giving alternatives (delimited by |) used for load balancing and fail-over:
shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr
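The `shards` value quoted above is a comma-separated list of shards, each listing `|`-separated alternative replicas. The helper below (hypothetical, not part of any Solr client library) builds that string and reproduces the wiki example. For an unsharded index like the one described here, the same syntax would presumably reduce to a single shard whose alternatives are the local slave cores.

```python
def build_shards_param(shards):
    """Join shard groups with ',' and each group's replica alternatives with '|'."""
    groups = []
    for replicas in shards:
        replicas = [r.strip() for r in replicas if r.strip()]
        if not replicas:
            raise ValueError("every shard needs at least one replica address")
        if any("," in r or "|" in r for r in replicas):
            raise ValueError("replica addresses must not contain ',' or '|'")
        groups.append("|".join(replicas))
    return ",".join(groups)


# Reproduces the wiki example: two shards, each with a fail-over alternative.
wiki_example = build_shards_param([
    ["localhost:8983/solr", "localhost:8900/solr"],
    ["localhost:7574/solr", "localhost:7500/solr"],
])
print(wiki_example)

# One unsharded index with hypothetical local slave cores as alternatives.
print(build_shards_param([[f"localhost:8983/solr/slave{i}" for i in range(1, 4)]]))
```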
Any help is much appreciated.
Cheers!