
I'm using Solr and I want to facet over a field "group".

Since "group" is created by users, potentially there can be a huge number of values for "group".

  • Would Solr be able to handle a use case like this? Or is Solr not really appropriate for facet fields with a large number of values?

  • I understand that I can set facet.limit to restrict the number of values returned for a facet field. Would this help in my case? Say there are 100,000 matching values for "group" in a search and I set facet.limit to 50: would that speed up the query, or would it still be slow because Solr has to process and sort all the facet values before returning the top 50?

  • Any tips on how to tune Solr for a large number of facet values?

Thanks.


2 Answers


Since 1.4, Solr handles facets with a large number of values pretty well, because by default it counts facet values directly from an in-memory field cache (facet.method is 'fc' by default).

Prior to 1.4, Solr used a filter-based faceting method (enum), which is definitely faster when faceting on a field with a small number of distinct values. This method requires one filter per facet value.
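A toy Python model may make the difference between the two methods more concrete. It is only a simplification under stated assumptions: real Solr works on Lucene term ordinals and bitsets, not Python dicts, and the field values and documents below are made up. The 'fc'-style function does one pass over the matching documents, while the 'enum'-style function builds one filter per distinct value and intersects each with the matches, so its cost grows with the number of distinct values.

```python
from collections import Counter

# Hypothetical toy index: doc id -> value of a single-valued "group" field.
DOCS = {0: "red", 1: "blue", 2: "red", 3: "green", 4: "red", 5: "blue"}


def facet_fc(matching_docs, docs):
    """'fc'-style (simplified): one pass over the matching docs, one counter bump each."""
    return Counter(docs[d] for d in matching_docs)


def facet_enum(matching_docs, docs):
    """'enum'-style (simplified): one filter (doc set) per distinct value, intersected with the matches."""
    filters = {}
    for doc_id, value in docs.items():
        filters.setdefault(value, set()).add(doc_id)  # one filter per facet value
    counts = {value: len(ids & matching_docs) for value, ids in filters.items()}
    # Drop zero counts so both functions return the same shape.
    return Counter({value: n for value, n in counts.items() if n})


matching = {0, 1, 2, 5}
print(facet_fc(matching, DOCS), facet_enum(matching, DOCS))
```

With thousands of user-created groups, the enum loop would build and intersect thousands of filters per query, which is the intuition behind preferring 'fc' for this field.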

As for facet.limit, think of it as a way to navigate through the facet values (together with facet.offset), just as you navigate through search results with rows/start. So a value between 10 and 50 is sensible.
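Here is a minimal sketch of building such a "page" request with Python's standard library. The parameters facet, facet.field, facet.method, facet.limit and facet.offset are standard Solr request parameters; the host, port and core name (`mycore`) are assumptions to adjust for your setup, and `facet_page_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; change host, port and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def facet_page_url(query, field, page, page_size=50):
    """Build a select URL that returns one 'page' of facet values for `field`."""
    params = {
        "q": query,
        "rows": 0,                          # no documents, only facet counts
        "facet": "true",
        "facet.field": field,
        "facet.method": "fc",               # the default, stated explicitly
        "facet.limit": page_size,           # facet values per page
        "facet.offset": page * page_size,   # facet values to skip
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Third page (page index 2) of the "group" facet.
print(facet_page_url("*:*", "group", page=2))
```

Setting rows=0 is a design choice for a facet-only request; drop it if you also want the matching documents in the same response.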

As with deep paging through search results, expect facet.limit/facet.offset to get slower as the offset grows, because Solr has to collect and rank offset + limit values to return a single page. It should be perfectly fine as long as you stay within reasonable bounds.

By default, Solr returns the most frequent facet values first.
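To illustrate both points (frequency ordering and the cost of a large offset), here is a simplified Python model. It assumes every value's count has already been computed and keeps only the best offset + limit entries with a bounded selection; I believe Solr's count-sorted faceting works along these lines, but treat that as an assumption. The tie-break by value is also an assumption made for this sketch.

```python
import heapq


def facet_page(counts, offset, limit):
    """Return one page of (value, count) pairs, most frequent first.

    Every entry in `counts` is still visited, but only the best
    offset + limit are kept; the first `offset` are then dropped.
    Ties are broken by value (an assumption for this sketch).
    """
    top = heapq.nsmallest(offset + limit, counts.items(),
                          key=lambda kv: (-kv[1], kv[0]))
    return top[offset:]


counts = {"a": 5, "b": 9, "c": 1, "d": 9, "e": 3}
print(facet_page(counts, offset=0, limit=2))
print(facet_page(counts, offset=2, limit=2))
```

This is why facet.limit keeps the response small but does not avoid counting all values, and why the work grows with the offset.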

To sum up:

  • Use Solr 1.4

  • Make sure facet.method is 'fc' (well, that's the default anyway).

  • Navigate through your facet space with facet.limit/facet.offset.
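When paging through the facet space from a client, note that with wt=json Solr returns facet_fields by default as one flat list alternating value and count. A minimal sketch of turning that into pairs follows; `facet_pairs` is a hypothetical helper and the sample response is hand-written, not real Solr output.

```python
import json


def facet_pairs(response_text, field):
    """Turn Solr's flat facet list ["v1", n1, "v2", n2, ...] into (value, count) pairs."""
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a wt=json facet response (not real output).
SAMPLE = '{"facet_counts": {"facet_fields": {"group": ["admins", 120, "editors", 45]}}}'
print(facet_pairs(SAMPLE, "group"))  # [('admins', 120), ('editors', 45)]
```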


Don't forget to configure the caches used for faceting (experiment with different cache sizes to choose values that suit your system):

   <filterCache class="solr.FastLRUCache" size="4096" initialSize="4096" autowarmCount="4096"/>
   <queryResultCache class="solr.LRUCache" size="5000" initialSize="5000" autowarmCount="5000"/>