
I have a Solr index that hosts 4 million documents and is 65 GB in size. When I browse the index through the web UI, everything is fast. But my real queries, which consist of about 2,000 terms (all from the same field), are far too slow.

To speed up my Solr queries, I first copied the index into RAM. That made things much faster, but I still need more speed.

I have also written a multi-threaded version of my query using Java 7's RecursiveTask: I recursively split the list of query terms in half until the number of terms in a chunk falls below a threshold, run a sub-query per chunk, and then aggregate the sub-query results to build the final response. This makes things faster, but it creates other kinds of problems.
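The splitting scheme described above can be sketched with a plain fork/join task. This is a minimal, hypothetical sketch, not the asker's code: `ChunkSearcher` stands in for "run a Lucene query over this chunk of terms and return docId to score", and merging by summing per-document scores is an assumption. Summed partial scores may not exactly reproduce the scores of one combined query, which could be one source of the "other problems". It avoids lambdas in the class itself so it stays Java 7 compatible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.RecursiveTask;

// Hypothetical stand-in for "run a Lucene query over this chunk of terms and
// return docId -> score". Not part of Lucene or Solr.
interface ChunkSearcher {
    Map<Integer, Float> search(List<String> chunk);
}

// Splits the term list in half until a chunk has at most `threshold` terms,
// searches each chunk, then merges partial results by summing scores per doc.
class SplitTermSearch extends RecursiveTask<Map<Integer, Float>> {
    private final List<String> terms;
    private final int threshold;
    private final ChunkSearcher searcher;

    SplitTermSearch(List<String> terms, int threshold, ChunkSearcher searcher) {
        if (threshold < 1) {
            // a threshold of 0 would recurse forever on a single term
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.terms = terms;
        this.threshold = threshold;
        this.searcher = searcher;
    }

    @Override
    protected Map<Integer, Float> compute() {
        if (terms.size() <= threshold) {
            return searcher.search(terms);            // base case: one sub-query
        }
        int mid = terms.size() / 2;
        SplitTermSearch left = new SplitTermSearch(terms.subList(0, mid), threshold, searcher);
        SplitTermSearch right = new SplitTermSearch(terms.subList(mid, terms.size()), threshold, searcher);
        left.fork();                                  // left half runs asynchronously
        Map<Integer, Float> merged = new HashMap<Integer, Float>(right.compute()); // right half in this thread
        for (Map.Entry<Integer, Float> e : left.join().entrySet()) {
            Float prev = merged.get(e.getKey());
            merged.put(e.getKey(), prev == null ? e.getValue() : prev + e.getValue());
        }
        return merged;
    }
}
```

It would be started with something like `new ForkJoinPool().invoke(new SplitTermSearch(terms, 250, mySearcher))`, where the threshold of 250 is an arbitrary example value to tune against the real index.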

Here is the code I use for the multi-term query:

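// A single position holding all ~2,000 terms: matches documents that contain any of them
MultiPhraseQuery query = new MultiPhraseQuery();
query.add(queryTerms); // where queryTerms is an array of Term

// Collect the top `rows` hits
TopDocs tops = searcher.search(query, rows);
ScoreDoc[] scoreDoc = tops.scoreDocs;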

Does anyone have suggestions for improving query performance? Thank you.

2,000 terms are a lot. It would be faster if you could reduce the number of terms by grouping them. But to help with that, I would need to know your problem domain, which is hard to judge remotely. (cheffe)
I am not sure I understand what you mean by "problem field". If you mean the title: all my query terms belong to the same Solr field. If you mean the application domain: I am indexing music. I have so many terms because each music track is described by one descriptor (a term) every 200 ms. (lizzie)
I meant the domain; I am not a native English speaker ;) (cheffe)
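The grouping suggestion could take several forms. At one descriptor every 200 ms, 2,000 terms cover 2,000 × 0.2 s = 400 s (about 6 min 40 s) of audio. Assuming, which the thread does not confirm, that many of a track's descriptor values repeat, one possible form is to collapse duplicates into one entry per distinct term together with its occurrence count. The hypothetical helper below does only that: the map size is the number of distinct terms the query actually needs, and the count could be used as a query-time weight instead of repeating the term.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapses repeated descriptor terms into one entry per
// distinct term, keeping first-seen order and how often each term occurred.
class TermGrouper {
    static Map<String, Integer> groupTerms(List<String> descriptors) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String d : descriptors) {
            Integer c = counts.get(d);
            counts.put(d, c == null ? 1 : c + 1);
        }
        return counts;
    }
}
```

Whether this helps depends entirely on how many distinct descriptor values the vocabulary has; with mostly unique descriptors the term count barely changes.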

1 Answer


I believe that 2,000 terms are too many for a single query. You may have to refactor your design.

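Another way to scale is SolrCloud: adding replicas lets more queries be served concurrently, and splitting the index into several shards lets each query run in parallel over smaller parts of the index, which can reduce its response time.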
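In a sharded SolrCloud collection, each shard searches only its part of the documents and a coordinating node merges the per-shard results. The sketch below is a conceptual illustration of that final merge step only, using a hypothetical `Hit` type; it is not SolrCloud's actual implementation, which also handles paging and other details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch (not SolrCloud code): each shard returns its own top-k
// hits; the coordinator pools them and keeps the global top-k by score.
class ShardMerge {
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        List<Hit> all = new ArrayList<Hit>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        // Sort by descending score
        Collections.sort(all, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return new ArrayList<Hit>(all.subList(0, Math.min(k, all.size())));
    }
}
```

A full sort keeps the sketch short; a bounded priority queue would avoid sorting every pooled hit when `k` is small.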

Also, don't forget the stored="false" option in the field definition, which can make the index much smaller.