1 vote

I am relatively new to the Java world and started using Solr recently. I am running Solr 5.2.1 on an Amazon t2.small instance (single core, 2 GB RAM) running Ubuntu, with Solr configured for a 1 GB heap. The Solr core currently holds 8M documents with 15 fields; 14 of them are plain string IDs and the remaining one is a DateRangeField.

The search queries are typically long, in the range of 15,000-20,000 characters, because the filter queries combine hundreds or thousands of values for a single field. For example,

/select?fq=field1:("value-1"+OR+"value-2"+...+OR+"value-n"), where n ranges from 1000 to 2000

I raised Jetty's MaxURLLength to 65535, which allows requests of that length.

Earlier, when the index held fewer than 2M documents, Solr ran smoothly. But once it reached 8M documents, Solr started crashing with a heap-space OutOfMemoryError. The exception is:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:115)
    at org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter$1.start(IntersectsPrefixTreeFilter.java:62)
    at org.apache.lucene.spatial.prefix.AbstractVisitingPrefixTreeFilter$VisitorTemplate.getDocIdSet(AbstractVisitingPrefixTreeFilter.java:130)
    at org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter.getDocIdSet(IntersectsPrefixTreeFilter.java:57)
    at org.apache.lucene.search.Filter$1.scorer(Filter.java:95)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:768)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:485)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1243)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:926)
    at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1088)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1609)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1485)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:561)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  1. Is the above exception due to a lack of memory?
  2. Is it due to the query being too long, which in turn affects the search?
1. Yes. If you read the word "OutOfMemoryError", you can see how that's kinda obvious. 2. Probably, but you'll know for sure if you profile your application. – Kayaman
You mean profile Solr? – sravan_kumar
What is the -Xmx setting for your JVM? Have you tried to increase it? – rudolfv
@rudolfv -Xmx was set to 1 GB. – sravan_kumar
@sravan_kumar No, I mean profile your application. You're not developing Solr (I hope). – Kayaman

2 Answers

3 votes

This is probably due to the number of filters: each cached filter uses 1 bit per document in your index, so with 8M documents each filter takes about 1 MB.

If the filterCache section in your solrconfig.xml is taken from the example config, its size is 512, which means it will over time come to hold 512 entries of about 1 MB each, roughly 512 MB for your index. With a 1 GB heap it is quite plausible that Solr runs out of memory.

The easy solution is to lower the number of entries in the filter cache. That might hurt your search speed, or it might not affect it at all if your filters are unique between calls; you will have to test that.
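A minimal sketch of the relevant solrconfig.xml entry, keeping the class and autowarmCount from the stock example; size="64" is just an illustrative lower value you would tune for your workload:

    <filterCache class="solr.FastLRUCache"
                 size="64"
                 initialSize="64"
                 autowarmCount="0"/>

With size="64" the worst case is roughly 64 * 1 MB of cached bitsets instead of 512 MB.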

See https://wiki.apache.org/solr/SolrCaching#filterCache

0 votes

If you're filtering on your date field, then a single date-range filter (in place of a Boolean OR over hundreds of values) saves Solr the I/O, CPU, and memory overhead of scanning your collection hundreds of times per query.
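For example, a single range filter replaces the long OR list (the field name and bounds here are placeholders, not from the question):

    /select?q=*:*&fq=datefield:[2015-01-01T00:00:00Z TO 2015-06-30T23:59:59Z]

One fq, one pass over the index, one cached bitset.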

Solr's TrieDateField type is indexed as a trie, so finding the documents whose date values fall within a range is a cheap operation compared with iterating the entire collection.
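A minimal schema.xml sketch of such a field (the names and precisionStep are illustrative, not from the question's schema):

    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
    <field name="datefield" type="tdate" indexed="true" stored="true"/>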

If you're instead querying for documents with dates "at the same time of day" over the past 1000-2000 days, consider encoding the time of day separately in its own field (perhaps as an int, to save space) so you can filter first on time of day before eliminating documents more than 2000 days old.
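A sketch of that two-step filter, assuming a hypothetical int field minute_of_day holding minutes since midnight (0-1439):

    /select?q=*:*&fq=minute_of_day:[540 TO 600]&fq=datefield:[NOW/DAY-2000DAYS TO NOW]

Each fq is cached independently, so the recurring minute_of_day filter can be reused across queries.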