0
votes

I'm working on a project that uses a Solr index with a few million documents, and we've recently hit a memory problem. Faceting has become unusable on a couple of our fields - Solr runs out of heap memory - because of the number of documents containing those fields.

What options do we have besides increasing the memory? We see a memory increase as only a temporary solution, because the index grows by a few hundred thousand documents per day.

I'm looking into SolrCloud at the moment, but I'm not sure it's the right solution.

Any suggestions?

Thanks!

The facet.method is a good hint; which one are you using? – cheffe
The default one, whichever it is. I'll do a search for facet.method to see which is the right one :) – Tudor S.
Each request allocates memory temporarily for faceting. How many concurrent requests do you have? – Toke Eskildsen
When I encountered the problem it was only my request, because I was using the dev server. – Tudor S.

2 Answers

2
votes

FacetFields allow for facet counts based on distinct values in a field. There are two methods for FacetFields: one that performs well with few distinct values in a field, and another for when a field contains many distinct values (generally, thousands and up – you should test what works best for you).

The first method, facet.method=enum, works by issuing a FacetQuery for every unique value in the field. As mentioned, this is an excellent method when the number of distinct values in a field is small. It requires excessive memory though, and breaks down when the number of distinct values gets large. When using this method, be careful to ensure that your FilterCache is large enough to contain at least one filter for every distinct value you plan on faceting on.
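As a minimal sketch of requesting enum faceting (assuming recent SolrJ, a local Solr core at http://localhost:8983/solr/mycore, and a hypothetical low-cardinality field named "category"):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class EnumFacetExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URL - adjust to your setup
            HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build();

            SolrQuery query = new SolrQuery("*:*");
            query.setRows(0);                    // we only want the facet counts
            query.addFacetField("category");     // hypothetical low-cardinality field
            query.set("facet.method", "enum");   // one filter query per distinct value

            QueryResponse response = client.query(query);
            response.getFacetField("category").getValues().forEach(
                    count -> System.out.println(count.getName() + ": " + count.getCount()));
            client.close();
        }
    }

The same parameters can also be passed directly on the request URL (facet=true&facet.field=category&facet.method=enum) if you are not using SolrJ.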

The second method uses the Lucene FieldCache (a future version of Solr will actually use a different non-inverted structure, the UnInvertedField). This method is actually slower and more memory intensive for fields with a low number of unique values, but if you have a lot of unique values, this is the way to go. This method uses the FieldCache to look up the values for the given field for each document, and every time a document with a given value is found, the value has its count incremented.
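Switching to the FieldCache-based method is just a parameter change. A sketch for a hypothetical high-cardinality field named "tags", reusing the client and imports from the previous example:

    // Same setup as the enum sketch, but with the FieldCache-based method
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(0);
    query.addFacetField("tags");       // hypothetical high-cardinality field
    query.set("facet.method", "fc");   // per-document lookup via the FieldCache
    query.set("facet.limit", "100");   // cap the number of returned buckets
    QueryResponse response = client.query(query);

Note that fc is the default facet.method in recent Solr versions, so if you haven't set it explicitly (as in the question), this is most likely the method you are already running.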

Please check the allotted memory for each cache and see if you can tweak the FieldCache to handle the situation (as you have mentioned, type3 and type4 occur in a large number of documents).
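To inspect current cache usage (size, hit ratio, evictions) before tuning, one option is the MBeans admin handler. A sketch, assuming recent SolrJ and the same hypothetical core URL as above:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class CacheStats {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build();
            // Ask the MBeans handler for cache statistics only
            GenericSolrRequest req = new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/mbeans",
                    new MapSolrParams(Map.of("cat", "CACHE", "stats", "true")));
            NamedList<Object> stats = client.request(req);
            System.out.println(stats);  // look for filterCache / fieldValueCache entries
            client.close();
        }
    }

The same data is available in a browser at /solr/mycore/admin/mbeans?cat=CACHE&stats=true.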

The source for the above information is "Scaling Lucene and Solr". I found one more article that talks about Solr faceting: "You are faceting it wrong".

0
votes

Before SolrCloud, you can think of Solr's multiple-core setup.

On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple SolrCores.

With SolrCloud, a single index can span multiple Solr instances.

This means that a single index can be made up of multiple SolrCore's on different machines.

The SolrCores that make up one logical index are called a collection.

A collection is essentially a single index that spans many SolrCores, both for index scaling as well as for redundancy.

If you wanted to move your two-core Solr setup to SolrCloud, you would have two collections, each made up of multiple individual SolrCores.

SolrCloud adds distributed capabilities to Solr. With this enabled, you can have a highly available, fault-tolerant cluster of Solr servers.

Use SolrCloud when you want highly scalable, fault-tolerant, distributed indexing and search capabilities.
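For illustration, a client connects to a collection through ZooKeeper rather than a single core URL. A sketch, assuming recent SolrJ, a hypothetical ZooKeeper ensemble at zk1:2181, and a hypothetical collection named "mycollection":

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class CloudQueryExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper host; Optional.empty() means no chroot
            CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk1:2181"), Optional.empty()).build();
            client.setDefaultCollection("mycollection");

            // The client routes the query across all shards of the collection
            System.out.println(
                    client.query(new SolrQuery("*:*")).getResults().getNumFound());
            client.close();
        }
    }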

You can get more info about SolrCloud here: https://cwiki.apache.org/confluence/display/solr/SolrCloud