13
votes

I read all the answers to the same question and am still not any clearer on which one I should use for my use case, and why. Both return the same result. I understand that "FilterQuery would be cached making the overall query time faster", as someone correctly answered.

I also understand that "filtering also allows tagging of facets, so you can tag facets to include all facets that are returned for your query", as someone else also correctly answered.
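For reference, the tagging mentioned there relies on Solr's `{!tag=...}` and `{!ex=...}` local parameters. Below is a minimal Python sketch of how the parameters for such a request might be put together. The function name and the tag name `clientTag` are made up for illustration, and the code only builds a query string; nothing is sent to Solr.

```python
from urllib.parse import urlencode, parse_qs


def tagged_facet_query(field, value, tag):
    """Build a query string whose filter is tagged and excluded from its own facet.

    Hypothetical helper: the name and signature are illustrative only.
    """
    params = [
        ("q", "*:*"),
        # Tag the filter so it can be referred to by name later.
        ("fq", f"{{!tag={tag}}}{field}:{value}"),
        ("facet", "true"),
        # Exclude the tagged filter when counting this facet, so the other
        # values of the same field are still counted.
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]
    return urlencode(params)


query_string = tagged_facet_query("client", "Joe", "clientTag")
decoded = parse_qs(query_string)
print(decoded["fq"])
print(decoded["facet.field"])
```

The filter still narrows the result list, but the client facet is counted as if that filter were not applied.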

What I don't understand, reading this, is why anyone would use Q at all, since FQ seems to be so much better, based on all the answers and books I've seen.

Except, I'm sure there's probably a reason that both exist.

What I would like is to figure out what's best for my use case - the documentation is sorely lacking in useful examples.

  • My documents have: date, client, report, and some other fields
  • 1 business date = 3.5 million documents.
  • The total count of documents is 250 million and counting (60 dates * 8000 clients * 1000s of reports).
  • I facet on date, client, report, and I do use tagging of facets.
  • The UI overall looks like any e-commerce site (Amazon, for example), with facets on the left.
  • Scoring is not used.

Business rule #1: date must always be present in every query.

Business rule #2: 99% of queries are going to use the LATEST date, but RANDOM client and random report.

A Fact: We determined that it’s faceting that is slow, not searching.

QUESTIONS:

Given these search criteria, and these ways to write a query:

A) q=date:20130214 AND client:Joe & facet.field=date & facet.field=client...

B) q=date:20130214 & fq=client:Joe & facet.field=date & facet.field=client...

C) q=client:Joe & fq=date:20130214 & facet.field=date & facet.field=client...

D) q=*:* & fq=date:20130214 & fq=client:Joe & facet.field=date & facet.field=client...

  • Which of the above do you think would be best, and why? Remember, most queries are going to run against 20130214.
  • Is FQ filtering done first and then the Q condition applied, or the other way around?

Today, D) is used in all cases, but I suspect this is wrong and is causing OOMs in Solr (version 3.6).

Thank you for your help!

2 Answers

24
votes

q is the main query of the request.
It is the one that allows you to actually search over multiple fields.
q decides what score each document gets, and hence takes part in the relevancy calculation.

q=*:* will just return all the documents with the same score.

fq is the filter query used to filter the documents and is not related to search.
So if you have any fixed value which you want to filter on you should use filters to limit your results.
fq does not affect the scoring of the results.
While filtering, Solr uses the filter cache to improve performance for subsequent requests that use the same filter query.

So ideally, you should check what the requirement demands. If you want to search, you should always use q, and if you want to filter/limit results you should use fq.
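As a rough illustration of that split, here is a small Python sketch that puts the search part in q and each fixed restriction in its own fq parameter. The helper name `build_select_query` is hypothetical, the field names are taken from the question, and the code only builds the query string rather than calling Solr.

```python
from urllib.parse import urlencode, parse_qs


def build_select_query(search, filters, facet_fields=()):
    """Return a query string with the search in q and each filter in its own fq.

    Hypothetical helper for illustration; not part of any Solr client library.
    """
    # Fall back to match-all when there is nothing to search for.
    params = [("q", search or "*:*")]
    # One fq per restriction: each is a separate filter query.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


# Variant D from the question: no real search, two filters, two facets.
variant_d = build_select_query(None, ["date:20130214", "client:Joe"], ["date", "client"])
print(parse_qs(variant_d))
```

Passing the filters as a list keeps each restriction in its own fq instead of one combined clause.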

Facets are just an add-on to the results and do not affect your results.

3
votes

To answer your questions:

  • Based on your business rules, I would suggest that you put the date in the fq value, since you are always limiting (filtering) results by a date value and it sounds like the date filter could be reused by Solr across queries. The q can then contain the search for random client and report values as necessary.

  • When a user first comes to the UI, since you are only showing facets, I would suggest you use q=<id field>:* where <id field> is your document id in the index, and also set rows=0. Use the date restriction in the fq value again. Specifying rows=0 will produce a facet-only query; see "Solr - Getting facet counts without returning results".
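The two suggestions above could be sketched in Python roughly as follows. Both function names are hypothetical, `id` is only an assumed name for the document id field, and `Summary` is a made-up report value. The code builds query strings and does not contact Solr.

```python
from urllib.parse import urlencode, parse_qs

FACET_FIELDS = ["date", "client", "report"]  # facet fields named in the question


def search_query(date, client, report):
    """Date goes in fq; the varying client/report search goes in q."""
    params = [
        ("q", f"client:{client} AND report:{report}"),
        ("fq", f"date:{date}"),
        ("facet", "true"),
    ]
    params += [("facet.field", name) for name in FACET_FIELDS]
    return urlencode(params)


def landing_page_query(date, id_field="id"):
    """Facet-only request: match on the id field, filter by date, return no rows."""
    params = [
        ("q", f"{id_field}:*"),  # id_field is an assumption about the schema
        ("fq", f"date:{date}"),
        ("rows", "0"),           # no documents returned, only facet counts
        ("facet", "true"),
    ]
    params += [("facet.field", name) for name in FACET_FIELDS]
    return urlencode(params)


print(parse_qs(search_query("20130214", "Joe", "Summary")))
print(parse_qs(landing_page_query("20130214")))
```

Both requests send the identical fq=date:20130214 string, so the same date filter is used for each.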