
I'm using Solr with Tomcat as the servlet container. I've set up Solr with a single core and defined a DIH (DataImportHandler) to import documents row by row from MySQL tables. Everything works fine: the documents get indexed correctly and I can search among them.

The problem is that when I try to use the suggester module, it fails while building whatever it needs to build the first time, using a URL like this:

http://user:pass@localhost:port/solr/corename/suggest?q=whatever&spellcheck.build=true
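For reference, a suggester behind a URL like that is typically wired up in solrconfig.xml roughly as follows. This is only a sketch: the component name, handler path, and field name (`suggest_field`) are placeholders, not taken from your setup.

```xml
<!-- Illustrative sketch of a suggester configured as a SpellCheckComponent.
     All names here are assumptions; adjust to your schema. -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest_field</str>
    <!-- buildOnCommit=true would rebuild the dictionary on every commit;
         with millions of documents you usually want false, plus an
         explicit spellcheck.build=true request like the URL above. -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```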

I left out one important piece of information: the data being imported is 4.7 million records right now.

At first it couldn't even build the spellcheck dictionary (if that is what it's building) for 1 million documents, because the JVM would run out of heap memory with the following message:

java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)

So I gradually increased the heap memory; right now it's at about 2 GB, which I'd assume is a lot.

Of course the obvious solution is to increase the Java heap yet again, but I'm wondering whether there's any way to divide and conquer the dictionary-building process, or any other solution for that matter.
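For completeness, the heap increases mentioned above are usually applied through Tomcat's setenv script rather than by editing catalina.sh directly. A minimal sketch (the exact sizes are illustrative):

```shell
# Illustrative: Tomcat picks up CATALINA_OPTS at startup. Putting this in
# $CATALINA_HOME/bin/setenv.sh (created if absent) sets the JVM heap bounds.
CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx2048m"
export CATALINA_OPTS
```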

Thanks a lot

2 Answers


1) A parameter that can have a big impact on the spellcheck index size is thresholdTokenFrequency. Adding the following parameter to your SpellCheckComponent configuration may be a remedy:

<float name="thresholdTokenFrequency">.01</float>
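In context, this parameter goes inside the spellchecker's `<lst>` in solrconfig.xml. A sketch of the placement (the component, dictionary, and field names are placeholders):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_field</str> <!-- placeholder field name -->
    <!-- Only include terms that occur in at least 1% of documents;
         on a multi-million-document index this can shrink the
         dictionary, and the memory needed to build it, dramatically. -->
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>
```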

2) If the data in your spellcheck field is copied from several other fields, you could try setting up separate SpellCheckComponents, each operating on its own field.

I haven't tried this, and I'm afraid that merging the results from different SpellCheckComponents may be quite tricky.
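A sketch of what that split might look like: one SpellCheckComponent holding two dictionaries, each built from its own (placeholder) field, with the client selecting one per request via `spellcheck.dictionary`. As noted, combining suggestions from both dictionaries would still be up to you.

```xml
<!-- Illustrative: two dictionaries in one component, each on its own field.
     Field and dictionary names are assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">title</str>
    <str name="field">title_spell</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">body</str>
    <str name="field">body_spell</str>
  </lst>
</searchComponent>
```

A request would then pick a dictionary with, e.g., `...&spellcheck.dictionary=title&spellcheck.build=true`, so each dictionary can be built separately instead of in one large pass.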


Solr needs a lot of memory while building indexes such as the spellcheck index.

Of course, simply adding more and more memory to the machine is not the way to go.

I had a similar problem and found out that increasing the virtual memory limit solved it. You can use ulimit -v to show the current limit. In my case it was 14 GB for a 5 GB index (10 million docs), which was not enough.

So I put ulimit -v unlimited at the beginning of the Tomcat start script. That solved the problem for me.
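A minimal sketch of that change (the script path depends on your installation, e.g. bin/catalina.sh or an init script):

```shell
# Illustrative: raise the virtual-memory limit for the Tomcat process.
# Placed near the top of the start script so the JVM inherits it.
ulimit -v unlimited
```

Note that ulimit only affects the current shell and its child processes, so it has to run in the same shell that launches the JVM; setting it in a separate terminal has no effect on Tomcat.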