2
votes

So I have a SOLR query with the following fq filter param:

(field_name:(1 OR 2 OR 25 OR 33 OR 333 OR 32 OR ...... and 2000 other ORS))

So Solr has this maximum boolean limit:

<maxBooleanClauses>1024</maxBooleanClauses>

Hence I have no choice but to split this query and try to combine the results from the split queries. Moreover, I'm paginating the results, so I'm only interested in the first 10 matching documents as well as a total count of all the results.

The problem is that the search object has a one-to-many relation with the field_name attribute, so a Solr document can have multiple field_name values. In the original query this is all nicely resolved by the Solr OR statements. However, if I separate the ORs and execute 3 separate queries, then because of this one-to-many relation certain documents get returned by more than one query. Hence I can't just add up the numResult of each query to get the actual aggregate numResult, and the same documents would often show up in several of the result sets.
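To make the overcounting concrete, here is a small illustration with made-up document IDs (the IDs and the three result sets are hypothetical, not real Solr output). Summing the per-query counts counts shared documents more than once, whereas the size of the union is what the unsplit query would report:

```python
# Hypothetical document IDs returned by three split queries.
# A document with several field_name values can match more than one query.
results_a = {"doc1", "doc2", "doc3"}
results_b = {"doc3", "doc4"}
results_c = {"doc2", "doc5"}

# Adding up the per-query hit counts counts doc2 and doc3 twice.
naive_total = len(results_a) + len(results_b) + len(results_c)

# The unsplit OR query would match each document once: the set union.
true_total = len(results_a | results_b | results_c)

print(naive_total, true_total)
```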

How can I resolve this dilemma? How can I split this into 3 separate queries and still get the same result (same documents returned, with the same total numResult) as the original unsplit query?

1
Why not increase maxBooleanClauses? Also use a POST request so the data gets through to the server. - arun

1 Answer

3
votes

One simple solution is to edit solrconfig.xml and increase maxBooleanClauses:

<maxBooleanClauses>4096</maxBooleanClauses>

If for some reason you don't want to increase maxBooleanClauses, you can join groups of terms into nested clauses, which gives you a single query with fewer top-level clauses.

For example, let's assume your maxBooleanClauses equals 4. Also assume that you have the following query:

1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9

First, you can remove the ORs, because Solr uses OR as the default operator anyway (unless q.op has been changed to AND). Second, combine each group of three consecutive terms into a single parenthesized clause, so that your query becomes:

(1 2 3) (4 5 6) (7 8 9)

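The original filter query is equivalent to the modified one. To transform a query with x terms into an equivalent one with at most y top-level clauses, join terms in groups of ceil(x/y). Each group is itself a boolean query subject to the same limit, so a single level of grouping only works while ceil(x/y) is itself at most y.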
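The grouping rule above could be sketched in Python roughly as follows. This is only an illustration: group_terms and build_fq are hypothetical helper names, not part of Solr or any client library, and the sketch assumes the default operator is OR and that one level of nesting is enough.

```python
import math


def group_terms(terms, max_clauses):
    """Join terms into at most max_clauses parenthesized groups.

    Hypothetical helper; relies on OR being the default operator.
    """
    if not terms:
        return ""
    # Group size from the rule above: ceil(x / y).
    size = math.ceil(len(terms) / max_clauses)
    if size > max_clauses:
        # Each group would itself exceed the limit; one level is not enough.
        raise ValueError("too many terms for a single level of grouping")
    groups = [terms[i:i + size] for i in range(0, len(terms), size)]
    return " ".join("(" + " ".join(str(t) for t in g) + ")" for g in groups)


def build_fq(field, terms, max_clauses):
    """Wrap the grouped terms in a field-scoped filter query string."""
    return f"{field}:({group_terms(terms, max_clauses)})"


example = build_fq("field_name", list(range(1, 10)), 4)
print(example)
```

With the nine terms and a limit of 4 from the example, this yields three groups of three. For roughly 2000 terms and the default limit of 1024, the group size would be 2, giving about 1000 top-level clauses.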