Since Solr computes facets on in-memory data structures, facet computation is likely to be CPU-bound. The code that computes facets is already highly optimised (see the getCounts method of UnInvertedField for multi-valued fields).
One idea would be to parallelize the computation. Perhaps the easiest way to do this is to split your collection into several shards, as described in Do multiple Solr shards on a single machine improve performance?.
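To make the sharding idea concrete, here is a minimal sketch of how a distributed facet request is addressed: Solr merges facet counts across the cores listed in the shards parameter. The hosts, core names, and field name below are placeholders, not taken from your setup:

```python
from urllib.parse import urlencode

# Hypothetical cores running on the same machine; adjust to your deployment.
shards = [
    "localhost:8983/solr/core0",
    "localhost:8984/solr/core1",
]

params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",   # placeholder field name
    "shards": ",".join(shards),  # Solr fans the query out and merges facet counts
}

# Any one of the shards can act as the coordinator for the request.
url = "http://localhost:8983/solr/core0/select?" + urlencode(params)
print(url)
```

Each shard computes its facet counts on a smaller in-memory structure (and on a separate CPU core), which is where the speedup comes from.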
Otherwise, if your term dictionary is small enough and if queries can take a limited number of forms, you could set up a different system that maintains the count matrix for every (term, query) pair. For example, if you only allow term queries, this means you would maintain the counts for every pair of terms. Beware that this can require a lot of disk space depending on the total number of terms and queries. If you don't require the counts to be exact, the easiest approach is probably to compute these counts in a batch process. Otherwise, it might be possible, but a little tricky, to keep the counts in sync with Solr.
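A minimal sketch of the batch process for the term-query case: count, for every pair of terms, how many documents contain both, so that the facet count of term a under the term query b is a direct lookup. The toy corpus and term names here are illustrative only:

```python
from collections import Counter
from itertools import combinations

# Toy corpus; in a real batch job these term sets would be read from the index.
docs = [
    {"solr", "facet", "shard"},
    {"solr", "facet"},
    {"solr", "cache"},
]

# pair_counts[(a, b)] = number of documents containing both a and b,
# with each pair stored in sorted order so lookups are canonical.
pair_counts = Counter()
for terms in docs:
    for a, b in combinations(sorted(terms), 2):
        pair_counts[(a, b)] += 1

# Facet count of "facet" restricted to the term query "solr":
print(pair_counts[("facet", "solr")])  # -> 2
```

Note the quadratic blow-up: the matrix has one cell per pair of terms, which is why disk space becomes a concern as the term dictionary grows.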
facet.limit? I've noticed that such queries can take a long time even with 100,000+ records if facet.limit is not set (in your case, to whatever n might be). – David Faber