Our development team is currently looking into migrating our search system to Apache Solr, and we would greatly appreciate some advice on setup. We are indexing approximately two hundred million database rows, and about a hundred thousand new rows arrive over the course of each day. Each new row must be searchable within two minutes of its receipt.
We don't want the indexing to bog down the searcher, so our thought is to have two Solr servers running on different machines in a replication setup. The first Solr instance will be the indexer. It will use the DataImportHandler to index the delta and have autocommit enabled to prevent overzealous commit rates. Index optimization will take place during scheduled periods. The second Solr instance (the slave) will be the primary searcher and will have its indexes stored on RAIDed solid state drives.
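To make the indexer side concrete, here is a minimal sketch of how a scheduled job might build the DataImportHandler delta-import request. It assumes the handler is registered at `/dataimport` in solrconfig.xml and that commits are left to autocommit, which is why `commit` defaults to false. The host name, core name, and function name are hypothetical.

```python
from urllib.parse import urlencode


def build_delta_import_url(core_url, handler="/dataimport", commit=False):
    """Build the URL that asks the DataImportHandler to index only the delta.

    core_url: base URL of the indexer core (hypothetical host and core name).
    handler:  path the handler is registered under (assumed "/dataimport").
    commit:   False leaves committing to autocommit on the indexer.
    """
    params = {
        "command": "delta-import",   # import only rows changed since the last run
        "clean": "false",            # keep the existing index contents
        "commit": "true" if commit else "false",
        "optimize": "false",         # optimization runs in scheduled periods instead
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical indexer address; a scheduler would issue a GET to this URL.
    print(build_delta_import_url("http://indexer.example:8983/solr/rows"))
```

With this approach the two-minute window has to cover the delta-import interval, the autocommit delay, and the replication poll interval on the searcher, so the three settings need to be chosen together.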
What we are concerned about is failover. Our searches are mission-critical. If the primary searcher goes down for whatever reason, our search service will automatically shunt queries over to the indexer node instead. Indexing is equally critical, though. If the indexer dies, we need to have a warm failover standing by. Is there a recommended way to automate master node failover in Solr replication? I've begun looking into ZooKeeper, but I wasn't sure if this was the best approach.
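For reference, the query-side failover we have in mind looks roughly like the sketch below. It covers only the searcher-to-indexer fallback described above, not master failover, which is the part we are asking about. The node URLs, function names, and the two-second timeout are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(url, timeout=2.0):
    """Fetch a Solr JSON response; connection errors surface as OSError."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def search_with_failover(nodes, query, fetch=http_fetch):
    """Try each node in order and return the first successful response.

    nodes: core base URLs, primary searcher first, indexer second.
    fetch: injectable so the failover logic can be exercised without a network.
    """
    params = urlencode({"q": query, "wt": "json"})
    last_error = None
    for base_url in nodes:
        url = f"{base_url.rstrip('/')}/select?{params}"
        try:
            return fetch(url)
        except OSError as exc:  # URLError, timeouts, and refused connections
            last_error = exc
    raise RuntimeError("all Solr nodes failed") from last_error


if __name__ == "__main__":
    # Hypothetical hosts: the searcher is tried first, then the indexer.
    nodes = [
        "http://searcher.example:8983/solr/rows",
        "http://indexer.example:8983/solr/rows",
    ]
    try:
        print(search_with_failover(nodes, "id:42"))
    except RuntimeError as exc:
        print(f"search unavailable: {exc}")
```

The ordering of the node list encodes the preference: the indexer only receives queries when the searcher is unreachable, so it is not loaded with search traffic in normal operation.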