
I have about 20 million docs in my Solr index, and I am using DIH (DataImportHandler) for delta imports. A delta import of the last hour's changes takes about 2-3 hours, and a full import takes about 5-6 hours to complete. Is there any way to speed up this process, other than DIH?

Is it the indexing that's taking time, or the retrieval of content from the DB? Bad or missing indexes can make any large import job slow if the backend has to scan the whole table multiple times. How many new documents are there in a delta-import? - MatsLindh
There will be fewer than 500 new docs, but the number of updates to existing docs will be large. I am also importing data from about 14-15 tables using joins. - Lijo Abraham
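One way to follow up on the missing-index question is to ask the database how it plans to run the delta query. The sketch below is only an illustration: it uses SQLite (bundled with Python) as a self-contained stand-in for whatever database actually backs the import, and the `items` table, `last_modified` column and `idx_items_last_modified` index are hypothetical names. A DIH `deltaQuery` typically selects changed primary keys with a `WHERE last_modified > ...` filter, which is the shape of query checked here.

```python
import sqlite3


def delta_query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan for a delta-style query on the hypothetical `items` table."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM items WHERE last_modified > ?",
        ("2024-01-01 00:00:00",),
    ).fetchall()
    # Each plan row is (id, parent, notused, detail); `detail` is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, last_modified TEXT)"
)

# No index on last_modified yet: the planner has to scan the whole table.
plan_before = delta_query_plan(conn)

# With an index, the planner can seek directly to the recently modified rows.
conn.execute("CREATE INDEX idx_items_last_modified ON items (last_modified)")
plan_after = delta_query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Other databases have their own `EXPLAIN` variants with different output, so treat this as a pattern rather than a recipe. With 14-15 joined tables, the same kind of check is likely worth running on the join columns as well, since each missing index there can add its own table scan.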

1 Answer


You could consider:

  1. distributing an index across multiple servers
  2. replicating an index on multiple servers

Distribute the index: divide the index into parts (shards), each of which runs on a separate machine. Solr then splits each search into sub-searches that run on the individual shards and merges their results. This way, queries against very large indexes run faster, and because each shard indexes only its own subset of the documents, the indexing work is also spread across the machines.
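The scatter-gather idea described above can be sketched as a toy in-memory model. This is not Solr's API or routing code: the `Shard` class, the term-count "score" and the MD5-based `shard_for` function are hypothetical stand-ins (Solr's actual router uses its own hashing scheme). It shows the two properties that matter: every document lands on exactly one shard, and merging each shard's local top hits yields the same global top hits as searching one big index.

```python
from __future__ import annotations

import hashlib
import heapq
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy stand-in for one shard: maps doc_id -> document text."""
    docs: dict[str, str] = field(default_factory=dict)

    def search(self, term: str, rows: int) -> list[tuple[int, str]]:
        # Hypothetical score: how many times the term appears in the text.
        hits = [(text.split().count(term), doc_id) for doc_id, text in self.docs.items()]
        # Keep only matching docs, then this shard's local top `rows`.
        return heapq.nlargest(rows, [h for h in hits if h[0] > 0])


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stable hash, so the same id is always routed to the same shard.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def index_doc(shards: list[Shard], doc_id: str, text: str) -> None:
    # Each document is stored on exactly one shard.
    shards[shard_for(doc_id, len(shards))].docs[doc_id] = text


def distributed_search(shards: list[Shard], term: str, rows: int) -> list[tuple[int, str]]:
    # Scatter: every shard returns its own top `rows` hits.
    partial = [hit for shard in shards for hit in shard.search(term, rows)]
    # Gather: merge the partial lists into the global top `rows`.
    return heapq.nlargest(rows, partial)


if __name__ == "__main__":
    shards = [Shard() for _ in range(3)]
    for i in range(10):
        index_doc(shards, f"doc{i}", " ".join(["solr"] * (i % 4) + ["other"]))
    print(distributed_search(shards, "solr", 3))
```

Asking each shard for the full `rows` (rather than `rows / num_shards`) is what makes the merge exact: any document in the global top `rows` must also be in its own shard's top `rows`.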

Here is a good read on scaling Solr:

http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-from-500000-volumes-5-million-volumes-and-beyond