
I'm using Solr with Tomcat as the servlet container. I've set up Solr with a single core and defined a DIH (DataImportHandler) to import documents row by row from MySQL tables. Everything works fine: the documents get indexed correctly and I can search among them.

The problem is that when I try to use the suggester module, it fails while building whatever it needs to build the first time, using a URL like this:

http://user:pass@localhost:port/solr/corename/suggest?q=whatever&spellcheck.build=true
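For reference, a suggester behind a URL like that is typically wired up in solrconfig.xml roughly as follows. This is only a sketch: the component name, handler path, and field name (`suggest_field`) are placeholders, not taken from your setup.

```xml
<!-- Illustrative sketch of a suggester configured as a SpellCheckComponent.
     All names here are assumptions; adjust to your schema. -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest_field</str>
    <!-- buildOnCommit=true would rebuild the dictionary on every commit;
         with millions of documents you usually want false, plus an
         explicit spellcheck.build=true request like the URL above. -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```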

I left out one important piece of information: the data being imported is 4.7 million records right now.

At first it couldn't even build the spellcheck dictionary (if that is what it's building) for 1 million documents, because the JVM would run out of heap memory with the following message:

java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)

So I gradually increased the heap memory; right now it's at about 2 GB, which I'd assume is a lot.

Of course the obvious solution is to increase the Java heap yet again, but I'm wondering whether there's any way to divide and conquer the dictionary-building process, or any other solution for that matter.
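For completeness, the heap increases mentioned above are usually applied through Tomcat's setenv script rather than by editing catalina.sh directly. A minimal sketch (the exact sizes are illustrative):

```shell
# Illustrative: Tomcat picks up CATALINA_OPTS at startup. Putting this in
# $CATALINA_HOME/bin/setenv.sh (created if absent) sets the JVM heap bounds.
CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx2048m"
export CATALINA_OPTS
```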

Thanks a lot

2 Answers


1) A parameter that can have a big impact on the spellcheck index size is thresholdTokenFrequency. Adding the following parameter to your SpellCheckComponent configuration may be a remedy:

<float name="thresholdTokenFrequency">.01</float>
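In context, this parameter goes inside the spellchecker's `<lst>` in solrconfig.xml. A sketch of the placement (the component, dictionary, and field names are placeholders):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_field</str> <!-- placeholder field name -->
    <!-- Only include terms that occur in at least 1% of documents;
         on a multi-million-document index this can shrink the
         dictionary, and the memory needed to build it, dramatically. -->
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>
```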

2) If the data in your spellcheck field is copied from several other fields, you could try setting up separate SpellCheckComponents, each operating on its own field.

I haven't tried this, and I'm afraid that merging the results from different SpellCheckComponents may be quite tricky.
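A sketch of what that split might look like: one SpellCheckComponent holding two dictionaries, each built from its own (placeholder) field, with the client selecting one per request via `spellcheck.dictionary`. As noted, combining suggestions from both dictionaries would still be up to you.

```xml
<!-- Illustrative: two dictionaries in one component, each on its own field.
     Field and dictionary names are assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">title</str>
    <str name="field">title_spell</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">body</str>
    <str name="field">body_spell</str>
  </lst>
</searchComponent>
```

A request would then pick a dictionary with, e.g., `...&spellcheck.dictionary=title&spellcheck.build=true`, so each dictionary can be built separately instead of in one large pass.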


Solr needs a lot of memory while building indexes such as the spellcheck index.

Of course, simply adding more and more memory to the machine is not the way to go.

I had a similar problem and found out that increasing the virtual memory limit solved it. You can use ulimit -v to show the current limit. In my case it was 14 GB for a 5 GB index (10 million docs), which was not enough.

So I put ulimit -v unlimited at the beginning of the Tomcat start script. That solved the problem for me.
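A minimal sketch of that change (the script path depends on your installation, e.g. bin/catalina.sh or an init script):

```shell
# Illustrative: raise the virtual-memory limit for the Tomcat process.
# Placed near the top of the start script so the JVM inherits it.
ulimit -v unlimited
```

Note that ulimit only affects the current shell and its child processes, so it has to run in the same shell that launches the JVM; setting it in a separate terminal has no effect on Tomcat.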