3 votes

I have a Solr 4.6 SolrCloud cluster installed on Tomcat 7, set up with 4 shards (two replicas each) and a ZooKeeper ensemble of 3 servers.

Every Solr server has 8 cores and 30 GB of RAM; I allocate 15 GB to the Solr/Tomcat JVM and leave the rest for the OS.

We have a single collection of about 25M documents; the index size is ~15 GB. While Solr can cope with many requests per minute, after a large indexing run I get an

Exception in thread "http-bio-8080-Acceptor-0" java.lang.OutOfMemoryError: unable to create new native thread

thrown by Tomcat.

The indexing is done using SolrJ's CloudSolrServer, with several indexing threads (sharing the CloudSolrServer instance) that perform bulk indexing of 5000 documents at a time. The process includes both adding and deleting documents, and we perform only one commit at the end of the whole run.
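For reference, here is a minimal sketch of the indexing pattern described above (SolrJ 4.6); it is not our actual code, and the ZooKeeper hosts, collection name, field names and document IDs are placeholders:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    private static final int BATCH_SIZE = 5000;

    public static void main(String[] args) throws Exception {
        // A single CloudSolrServer instance shared by all indexing threads
        final CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("mycollection");

        Runnable indexJob = new Runnable() {
            @Override
            public void run() {
                try {
                    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
                    for (int i = 0; i < BATCH_SIZE; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", Thread.currentThread().getName() + "-" + i);
                        batch.add(doc);
                    }
                    server.add(batch);                      // bulk add of 5000 docs, no commit here
                    server.deleteById("some-obsolete-id");  // deletes are mixed into the same run
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        Thread t1 = new Thread(indexJob);
        Thread t2 = new Thread(indexJob);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        server.commit();    // single commit at the end of the whole indexing run
        server.shutdown();
    }
}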

This exception starts on one server, but soon all the others follow, and the cloud cannot serve any requests (in the logs I can see the servers trying to query each other without success).

Note that Tomcat itself does not hang, only the Solr process. To get the SolrCloud cluster back, I have to restart all 8 servers (killall ... service start ...).

Here are some of the configurations that I use:

solrconfig.xml:

<autoCommit>
   <maxDocs>50000</maxDocs>
   <maxTime>${solr.autoCommit.maxTime:100000}</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:10000}</maxTime>
</autoSoftCommit>

<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>4</maxWarmingSearchers>

JVM:

JAVA_OPTS="$JAVA_OPTS -Xmx15380m -Xms15380m -DzkHost=... -XX:NewRatio=1 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=80 -XX:MaxTenuringThreshold=15 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+AggressiveOpts -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:MaxPermSize=512M -XX:PermSize=128M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat7/heapDump.hprof"

I didn't see any significant CPU use by the GC; most of the collection work is done by ParNew, so there is no "stop-the-world" issue here.

  1. Can it be that the documents from the indexing process are choking the native memory?
  2. Is there anything specific I should look at to understand the cause of this error (number of open connections/files, Tomcat parameters, etc.)?
  3. Can I identify such cases in advance, and perhaps add more servers (assuming that would solve the issue)?
Please provide the results of running the following checklist: ulimit -a, free -m, ps uH p <PID_OF_U_PROCESS> | wc -l - Ivan Mamontov

1 Answer

0 votes

Try lowering -Xmx15380m to 2-4 GB, that is, -Xmx2048m.
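The idea behind this suggestion is that "unable to create new native thread" is raised when the JVM cannot get a new thread from the OS, typically because native (non-heap) memory or the OS thread limit is exhausted; a smaller heap leaves more native memory available for thread stacks. For illustration only, assuming JAVA_OPTS is set the same way as in the question, the change would look like:

JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -DzkHost=... (remaining GC flags unchanged)"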