So I read this: http://wiki.apache.org/solr/SolrCaching#filterCache

and specifically

The filter cache stores the results of any filter queries ("fq" parameters) that Solr is explicitly asked to execute. (Each filter is executed and cached separately. When it's time to use them to limit the number of results returned by a query, this is done using set intersections.)

So my question is this. Let's say my app filters on a set of different format IDs. Say the format IDs are numeric (1, 2, 3, 4, 5), and many permutations of them are sent in queries as fq parameters.

If I wrote a warming query like this...

...
<str name="fq">format:(1)+OR+format:(2)+OR+format:(3)+OR+format:(4)+OR+format:(5)</str>
...

Would that warm things up and help all my queries that filter by various permutations of those formats, or would it only help queries that use that exact permutation?

Should I instead create 5 separate warming queries (1 for each format) to take advantage of "set intersection"?

Or will that query create the sets for each format?

Example queries

...fq=format:(1)+OR+format:(2)...
...fq=format:(1)+OR+format:(3)...
...fq=format:(2)+OR+format:(3)...
...fq=format:(2)+OR+format:(5)...
etc...

So I believe none of those will use the filter cache entry created by the warming query listed above.

It looks like this is saying that results are cached per fq parameter, and therefore with this "OR" logic I would have to generate the permutations and run a warming query for each one, or the result set will not be cached and cannot be reused. - John Sobolewski

1 Answer

See https://wiki.apache.org/solr/CommonQueryParameters#fq. It says:

The document sets from each filter query are cached independently. Thus, concerning the previous examples: use a single fq containing two mandatory clauses if those clauses appear together often, and use two separate fq params if they are relatively independent.

It is one cache entry per fq param specified in your query.
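
To make the "one entry per fq" point concrete, here is a toy model in Python. It is only a sketch, not Solr's implementation: the `ToyFilterCache` class and the `fake_doc_set` helper are made up for illustration, and the cache is keyed on the raw fq text as a simplification (as far as I know, Solr keys on the parsed query rather than the raw string).

```python
# Toy model of a filter cache: one entry per distinct fq.
# NOT Solr's implementation; it only illustrates the keying behaviour.

class ToyFilterCache:
    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, fq, compute):
        """Return the cached doc set for fq, computing it on a miss."""
        if fq in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[fq] = compute(fq)
        return self.entries[fq]


def fake_doc_set(fq):
    # Hypothetical stand-in for running the filter against the index.
    return frozenset()


cache = ToyFilterCache()

# Warming query: a single fq with all five formats ORed together.
cache.lookup(
    "format:(1) OR format:(2) OR format:(3) OR format:(4) OR format:(5)",
    fake_doc_set,
)

# Live queries with other combinations are different keys, so each one misses.
for fq in [
    "format:(1) OR format:(2)",
    "format:(1) OR format:(3)",
    "format:(2) OR format:(5)",
]:
    cache.lookup(fq, fake_doc_set)

print(cache.hits, cache.misses)  # 0 4
```

In this model the five-way OR warming query produces exactly one entry, and none of the two-format queries can reuse it.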

You are not doing set intersection with OR; you are doing set union. But if you were doing set intersection like:

fq=format:(1 AND 2 AND 3 AND 4 AND 5)

(assuming format is a multi-valued field here) and you also had queries for different subsets of those 5 values, like

fq=format:(1 AND 2)
fq=format:(3 AND 4 AND 5)

then issuing separate filter queries like:

fq=format:1&fq=format:2&fq=format:3&fq=format:4&fq=format:5

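will help all the subset queries, as long as the subset queries are also sent as separate fq params (for example fq=format:1&fq=format:2). Here you will have 5 entries in the filter cache, and the relevant ones are intersected for each subset.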
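
A rough Python sketch of that intersection case follows. The `docs` data and the helper names (`doc_set`, `filter_all`) are made up for illustration, and a plain dict stands in for the filter cache, so treat it as a model of the idea rather than of Solr itself.

```python
# Hypothetical documents with a multi-valued "format" field.
docs = {
    "a": {1, 2},
    "b": {1, 2, 3, 4, 5},
    "c": {3, 4, 5},
    "d": {2, 5},
}

# Stand-in for the filter cache: one entry per single-value fq.
filter_cache = {}


def doc_set(fmt):
    """Doc set for fq=format:<fmt>, computed once and then reused."""
    key = f"format:{fmt}"
    if key not in filter_cache:
        filter_cache[key] = frozenset(
            doc_id for doc_id, formats in docs.items() if fmt in formats
        )
    return filter_cache[key]


def filter_all(formats):
    """Docs matching every requested format: intersection of cached sets."""
    result = None
    for fmt in formats:
        current = doc_set(fmt)
        result = current if result is None else result & current
    return result


# Warm the cache with one filter per format value.
for fmt in (1, 2, 3, 4, 5):
    doc_set(fmt)

print(sorted(filter_all([1, 2])))     # ['a', 'b']
print(sorted(filter_all([3, 4, 5])))  # ['b', 'c']
print(len(filter_cache))              # 5
```

Both subset lookups are answered from the five warmed entries, and no new entries are created.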

Regarding permutations, i.e. the order in which the values appear in the filter query: I believe the cache key is derived by hashing the fq param, so you are better off sorting the values first and then forming your filter query.
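
A minimal sketch of that normalization step, in Python. The `canonical_fq` helper is hypothetical and would live in the client application; whether clause order really changes the cache key may depend on the Solr version, so this is a defensive measure rather than a confirmed requirement.

```python
def canonical_fq(field, values, op="OR"):
    """Build an fq string with de-duplicated, sorted values so that
    the same set of values always yields the same text."""
    ordered = sorted(set(values))
    joined = f" {op} ".join(str(v) for v in ordered)
    return f"{field}:({joined})"


# The same values in a different order produce an identical fq string.
print(canonical_fq("format", [3, 1]))  # format:(1 OR 3)
print(canonical_fq("format", [1, 3]))  # format:(1 OR 3)
```

The resulting string still needs URL-encoding before it is placed in a request.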