
I have a Solr (6.2) DisMax Select Query which uses pf (phrase fields) and ps (phrase slop).

pf = text^2.2 title^2.2, ps = 2;

I want my query to return results following this algorithm:

  1. If there are exact matches for the queried phrase, return them first, sorted by date
  2. If there are documents that have at least one of the words of the queried phrase, return them second, sorted by date

Example data: text (last_modified timestamp in parentheses)

  • stuff about important people (2018)
  • important people: the article (2019)
  • some people find that important (2020)
  • important news (2015)
  • people of the decade (2020)

The desired result:

phrases with acceptable slop first

  1. some people find that important (2020)
  2. important people: the article (2019)
  3. stuff about important people (2018)

then at least one of the words

  1. people of the decade (2020)
  2. important news (2015)
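The desired ordering can be reproduced outside Solr as a rough sanity check. The sketch below is only an illustration: the `tier` and `rank` helpers are hypothetical, the span measure is a simplified stand-in for Lucene's sloppy phrase matching (it looks only at the first occurrence of each term), and a slop of 4 is assumed so that "some people find that important" still counts as a phrase match.

```python
import re

# Example data from the question: (text, last_modified year).
DOCS = [
    ("stuff about important people", 2018),
    ("important people: the article", 2019),
    ("some people find that important", 2020),
    ("important news", 2015),
    ("people of the decade", 2020),
]

def tokens(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())

def tier(text, query, slop):
    """0 = sloppy phrase match, 1 = at least one word, 2 = no match."""
    words = tokens(text)
    terms = tokens(query)
    present = [t for t in terms if t in words]
    if len(present) == len(terms):
        # How far each term sits from where an exact phrase would put it
        # (first occurrence only, which is a simplification).
        shifts = [words.index(t) - i for i, t in enumerate(terms)]
        if max(shifts) - min(shifts) <= slop:
            return 0
    return 1 if present else 2

def rank(docs, query, slop):
    """Order by tier first, then newest first; drop non-matching documents."""
    scored = [(tier(text, query, slop), -year, text) for text, year in docs]
    return [text for t, _, text in sorted(scored) if t < 2]

for line in rank(DOCS, "important people", slop=4):
    print(line)
```

With these assumptions the five documents come out in the order listed above: the three phrase matches by date, then the two single-word matches by date.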

What I've tried:

  • wrapping the query in double quotes and using qs (query phrase slop); this works as desired, but ignores the "at least one of the words" part;
  • using a bq (boost query) like last_modified:[NOW/DAY-3MONTH TO NOW/DAY]^20.0;
  • using a bf (boost function) like recip(ms(NOW,last_modified),3.16e-11,1,1);
  • an explicit last_modified desc sort, but it ignores the score completely;
  • using multiple sorts, score desc, last_modified desc, but the second sort applies only when there is a tie on the first one (and there is almost never a tie).

1 Answer

I've managed to get the (almost) desired result by using:

  • Boost Functions (bf) = recip(ms(NOW,last_modified),3.16e-11,1,1)^1500 (had to use a huge boost number to bubble up the most recent results);

  • Query Fields qf = 'text^4 title^2';

  • Phrase Fields pf = 'text^5 title^2';

  • Phrase Slop ps = 4;

  • Query Phrase Slop qs = 2;

  • Minimum Should Match mm = len(split('\s', query)) + 1 (pseudocode)
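To see why the boost function needs such a large multiplier, it helps to evaluate it by hand. Solr documents recip(x,m,a,b) as a / (m*x + b); with m = 3.16e-11 and x in milliseconds, m*x is roughly 1 for a one-year-old document. The sketch below just redoes that arithmetic in Python; the `recency` helper and the year length used are illustrative assumptions.

```python
# Approximate number of milliseconds in a year (assumption: 365.25 days).
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)

def recency(age_years):
    """Value of recip(ms(NOW,last_modified),3.16e-11,1,1) for a given age."""
    return recip(age_years * MS_PER_YEAR, 3.16e-11, 1, 1)

for age in (0, 1, 2, 5):
    print(age, round(recency(age), 3))
```

The value starts at 1 for a brand-new document and drops to about 0.5 after one year, so the raw function never contributes more than 1 to the score. That is small next to typical relevance scores, which is presumably why a multiplier like ^1500 was needed before recency had a visible effect.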

Split the query by whitespace, then join the exact phrase and each separate word with OR, and set the Minimum Should Match parameter (mm) to len(split)+1. For example, the query "apple dog" becomes "apple dog" OR apple OR dog. The double quotes are necessary for the qs parameter to take effect and to force results with the exact phrase to bubble up.
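A minimal sketch of that transformation, assuming the parameters listed above are sent as ordinary request parameters. The `build_params` helper is hypothetical, `defType=dismax` is taken from the question's mention of DisMax, and whether OR is treated as an operator depends on the query parser in use.

```python
from urllib.parse import urlencode

def build_params(user_query):
    """Turn a raw user query into the request parameters described above."""
    words = user_query.split()               # split by whitespace
    phrase = '"' + " ".join(words) + '"'     # quoted phrase so qs applies
    clauses = [phrase] + words               # the phrase plus each word
    return {
        "defType": "dismax",                 # assumption, from the question
        "q": " OR ".join(clauses),
        "qf": "text^4 title^2",
        "pf": "text^5 title^2",
        "ps": "4",
        "qs": "2",
        "mm": str(len(words) + 1),           # len(split) + 1, as in the answer
        "bf": "recip(ms(NOW,last_modified),3.16e-11,1,1)^1500",
    }

params = build_params("apple dog")
print(params["q"])
print(urlencode(params))
```

The encoded string can be appended to the select handler URL of whichever core is being queried; no URL is shown here because it depends on the installation.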

There may be ways to tweak the method I'm using; any comments are appreciated.