0
votes

I'm working on a project that uses a Solr index with a few million documents, and we've recently hit a memory problem. Faceting has become unusable on a couple of our fields - Solr runs out of heap memory - because of the number of documents containing those fields.

What options do we have besides increasing the memory? We see a memory increase as only a temporary solution, because the index grows by a few hundred thousand documents per day.

I'm looking into SolrCloud at the moment, but I'm not sure it's the right solution.

Any suggestions?

Thanks!

The facet.method is a good hint; which one are you using? – cheffe
The default one, whichever it is. I'll do a search for facet.method to see which is the right one :) – Tudor S.
Each request allocates memory temporarily for faceting. How many concurrent requests do you have? – Toke Eskildsen
When I encountered the problem it was only my request, because I was using the dev server. – Tudor S.

2 Answers

2
votes

FacetFields allow for facet counts based on distinct values in a field. There are two methods for FacetFields: one that performs well with few distinct values in a field, and another for when a field contains many distinct values (generally, thousands and up – you should test what works best for you).

The first method, facet.method=enum, works by issuing a FacetQuery for every unique value in the field. As mentioned, this is an excellent method when the number of distinct values in a field is small. It requires excessive memory though, and breaks down when the number of distinct values gets large. When using this method, be careful to ensure that your FilterCache is large enough to contain at least one filter for every distinct value you plan on faceting on.
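As a minimal sketch of requesting enum faceting (assuming recent SolrJ, a local Solr core at http://localhost:8983/solr/mycore, and a hypothetical low-cardinality field named "category"):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class EnumFacetExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URL - adjust to your setup
            HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build();

            SolrQuery query = new SolrQuery("*:*");
            query.setRows(0);                    // we only want the facet counts
            query.addFacetField("category");     // hypothetical low-cardinality field
            query.set("facet.method", "enum");   // one filter query per distinct value

            QueryResponse response = client.query(query);
            response.getFacetField("category").getValues().forEach(
                    count -> System.out.println(count.getName() + ": " + count.getCount()));
            client.close();
        }
    }

The same parameters can also be passed directly on the request URL (facet=true&facet.field=category&facet.method=enum) if you are not using SolrJ.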

The second method uses the Lucene FieldCache (a future version of Solr will actually use a different non-inverted structure, the UnInvertedField). This method is actually slower and more memory intensive for fields with a low number of unique values, but if you have a lot of unique values, this is the way to go. This method uses the FieldCache to look up the values for the given field for each document, and every time a document with a given value is found, the value has its count incremented.
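Switching to the FieldCache-based method is just a parameter change. A sketch for a hypothetical high-cardinality field named "tags", reusing the client and imports from the previous example:

    // Same setup as the enum sketch, but with the FieldCache-based method
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(0);
    query.addFacetField("tags");       // hypothetical high-cardinality field
    query.set("facet.method", "fc");   // per-document lookup via the FieldCache
    query.set("facet.limit", "100");   // cap the number of returned buckets
    QueryResponse response = client.query(query);

Note that fc is the default facet.method in recent Solr versions, so if you haven't set it explicitly (as in the question), this is most likely the method you are already running.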

Please check the allotted memory for each cache and see if you can tweak the FieldCache to handle the situation (as you have mentioned, type3 and type4 occur in a large number of documents).
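To inspect current cache usage (size, hit ratio, evictions) before tuning, one option is the MBeans admin handler. A sketch, assuming recent SolrJ and the same hypothetical core URL as above:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class CacheStats {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build();
            // Ask the MBeans handler for cache statistics only
            GenericSolrRequest req = new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/mbeans",
                    new MapSolrParams(Map.of("cat", "CACHE", "stats", "true")));
            NamedList<Object> stats = client.request(req);
            System.out.println(stats);  // look for filterCache / fieldValueCache entries
            client.close();
        }
    }

The same data is available in a browser at /solr/mycore/admin/mbeans?cat=CACHE&stats=true.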

The source for the above information is "Scaling Lucene and Solr". I found one more article that talks about Solr faceting: "You are faceting it wrong".

0
votes

Before SolrCloud, you can think of Solr's multiple-core setup.

On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple SolrCores.

With SolrCloud, a single index can span multiple Solr instances.

This means that a single index can be made up of multiple SolrCore's on different machines.

The SolrCores that make up one logical index are called a collection.

A collection is essentially a single index that spans many SolrCores, both for index scaling as well as for redundancy.

If you wanted to move your two-core Solr setup to SolrCloud, you would have two collections, each made up of multiple individual SolrCores.

SolrCloud adds distributed capabilities to Solr. With this enabled, you can have a highly available, fault-tolerant cluster of Solr servers.

Use SolrCloud when you want highly scalable, fault-tolerant, distributed indexing and search capabilities.
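For illustration, a client connects to a collection through ZooKeeper rather than a single core URL. A sketch, assuming recent SolrJ, a hypothetical ZooKeeper ensemble at zk1:2181, and a hypothetical collection named "mycollection":

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class CloudQueryExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper host; Optional.empty() means no chroot
            CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk1:2181"), Optional.empty()).build();
            client.setDefaultCollection("mycollection");

            // The client routes the query across all shards of the collection
            System.out.println(
                    client.query(new SolrQuery("*:*")).getResults().getNumFound());
            client.close();
        }
    }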

You can get more info about SolrCloud here: https://cwiki.apache.org/confluence/display/solr/SolrCloud