1 vote

I have been tasked with dealing with OutOfMemoryError problems on a Solr installation. I have finally managed to get it to stay up for more than a few minutes by using the AggressiveHeap JVM option.

I have never worked with Solr, so am feeling my way a bit.

These are the steps we take:

  1. Start Tomcat
  2. Kick off a delta-import

After the delta-import is started, the heap consumption rises inexorably. We tried with Xmx set to 4 GB, which caused OutOfMemoryErrors or made the system unresponsive, so we tried the AggressiveHeap option, which caused the JVM to take about 5.5 GB of RAM. As you can see in the screenshot, this time the GC was able to free memory, the memory consumption grows more slowly, and towards the right of the image there is another GC which actually works, and it keeps going like that.
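For reference, the two heap configurations we tried can be expressed as Tomcat JVM options - a sketch assuming a `setenv.sh`-style setup; the flags are real HotSpot options, but the values are just the ones from our experiments, not recommendations:

```shell
# Hypothetical Tomcat JVM settings (e.g. in bin/setenv.sh).
# First attempt: fixed 4 GB heap, plus GC logging so heap behaviour can be analysed:
CATALINA_OPTS="-Xmx4g -Xms4g -verbose:gc -XX:+PrintGCDetails"
export CATALINA_OPTS

# Alternative that kept the instance up: let the JVM size the heap itself
# on a large machine (ended up at roughly 5.5 GB here):
# CATALINA_OPTS="-XX:+AggressiveHeap"

echo "$CATALINA_OPTS"
```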

(VisualVM screenshot)

What is this initial allocation of memory? Is it the index being loaded into RAM? Is there a way to reduce this?

I have tried tweaking ramBufferSizeMB, maxBufferedDocs, and mergeFactor, and have also uncommented the StandardIndexReaderFactory declaration so I could set termIndexDivisor to 12, but it is hard to tell whether these changes have made any difference (yes: more analysis is needed).
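For the record, all of these settings live in solrconfig.xml; roughly like this (the numbers are illustrative values, not recommendations, and the indexReaderFactory declaration is the one that ships commented out in the default config):

```xml
<!-- solrconfig.xml sketch; values shown are illustrative, not the "right" ones -->
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <mergeFactor>10</mergeFactor>
</indexDefaults>

<!-- Uncommented so that termIndexDivisor can be set -->
<indexReaderFactory name="IndexReaderFactory" class="solr.StandardIndexReaderFactory">
  <int name="setTermIndexDivisor">12</int>
</indexReaderFactory>
```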

The index has been created over a number of failed indexing sessions, and the termIndexDivisor parameter was added more recently - does the fact that the index files already exist stop this parameter from having any effect?

(The machine is physical, with 12 GB of RAM and 16 cores. It is shared with another large Tomcat instance. We are running Oracle JDK 1.6.0_21.)


2 Answers

2 votes

There are various things at play. One is mergeFactor, because it controls the number of segments generated, and you will have a segment reader per segment. However, changing this option will not result in an immediate change to memory usage. The other options you mention mainly control the RAM usage of the indexing process, not the RAM usage at startup or during search.

The second thing is searcher warming. Usually some queries are run during startup to warm the searchers, and the results of those queries are cached. There are also options which control the cache sizes. See also: http://wiki.apache.org/solr/SolrCaching
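The caches are configured in solrconfig.xml; a sketch of the relevant section (the cache names and attributes are standard Solr configuration, but the sizes here are placeholders - shrink them if warming and caching are eating your heap):

```xml
<!-- solrconfig.xml sketch: illustrative sizes only, tune for your heap budget -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<documentCache    class="solr.LRUCache" size="512" initialSize="512"/>
```

Setting the autowarmCount values to 0 disables autowarming for that cache, which trades slower first queries after a commit for less work (and less memory churn) at searcher startup.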

Setting the termIndexDivisor to 12 is obviously not a good thing if you run into memory problems. As far as I know, in 4.x the term index divisor is 256 or 128, and at least in 1.x it is set to 32. This option controls how many entries of your terms are loaded into RAM - every 12th term in your case. The termIndexDivisor should take effect even if the index already exists.

Whether your index is loaded into RAM is controlled by the directoryFactory configuration option.
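That option also lives in solrconfig.xml; a sketch of the default declaration (the class names are real Solr factories, though which ones are available depends on your version):

```xml
<!-- solrconfig.xml sketch: the default factory keeps the index on disk.
     Swapping in solr.RAMDirectoryFactory (where available) would instead load
     the whole index into the Java heap. -->
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
```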

If you work on Solr trunk, it is possible that you were caught out by a change whereby StandardDirectoryFactory resolves, under certain circumstances, to MMapDirectory, which leads to intense RAM usage if you have a large index. This change happened sometime between April this year and now. I'm not even sure how it made it through code review, but that is the current state of trunk.

0 votes

I ended up doing some digging with the debugger, as even with @fyr's recommendations the memory consumption didn't decrease much.

It turned out that the deltaQuery and deltaImportQuery were both carbon copies of the main query. This meant that instead of returning only the PKs of the entries that had changed since the last import, the delta query was returning every row, and Solr was trying to hold them all in memory. :(
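For anyone hitting the same thing, this is the shape the DataImportHandler delta queries should have in data-config.xml - a sketch, where "item", "id" and "last_modified" are made-up table and column names, but the ${dataimporter...} variables are the standard DIH ones:

```xml
<!-- data-config.xml sketch: deltaQuery must return ONLY the changed primary
     keys; deltaImportQuery then fetches the full row for each returned key. -->
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>
</entity>
```

In our broken config, deltaQuery was effectively the full `SELECT * FROM item`, so every delta-import behaved like a full import held in memory.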