
I have a requirement for how results should come back sorted from Solr. At a high level, they should look like this:

  • Exact matches on subset1 fields sorted by date
  • Exact matches on subset2 fields sorted by date
  • Partial matches on subset1 fields sorted by date
  • Partial matches on subset2 fields sorted by date
  • Fuzzy matches on subset1 fields sorted by date
  • Fuzzy matches on subset2 fields sorted by date

Currently I am sorting on Solr score and then date. When I query Solr, I use a boost function that gives an inverse boost to older documents, so they move down and newer documents 'float' to the top. I am also boosting the appropriate fields so that I get exact, partial, and fuzzy matches in the correct order. This has gotten me most of the way there.

Now for the tricky part. The requirement states that if I search for something like 'red ford truck', all documents that contain 'red ford truck' should be scored the same, regardless of the frequency of the terms. The boost that moves newer docs to the top doesn't affect the score enough to push documents with a higher term frequency down far enough.

For example, let's say I have 2 documents:

doc 1:

  • Field1:"The red ford truck was really red and it was a fast truck"
  • Date: 1/1/2010

doc 2:

  • Field1:"The red ford truck was parked on the street"
  • Date:1/10/2012

When I search for 'red ford truck', I want document 2 to appear first because it is newer and has all the queried terms. Currently document 1 appears first because it has more matches in Field1 and the inverse boost doesn't do enough to push it down.

So now for my question: is there a configuration point in Solr to tell it to count a match on each queried term only once per document? Kind of like an EXISTS in T-SQL.

If any other information would be helpful, let me know. Thank you in advance for your time.


1 Answer


Those scores differ because of both the term frequency and the length of the field.
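As a rough illustration of why the two example documents score differently, here is a minimal Python sketch. It assumes the classic Lucene TF-IDF factors (tf = sqrt(freq), lengthNorm = 1/sqrt(number of terms in the field)) and ignores idf, query normalization, boosts, and the lossy encoding of norms. The tokenization is plain whitespace splitting with no stop-word removal, and `tfidf_like_score` is a made-up name, not a Solr API.

```python
import math

def tfidf_like_score(field_text, query):
    """Simplified score: sum of sqrt(term frequency), scaled by a length norm."""
    tokens = field_text.lower().split()
    # Longer fields get a smaller norm, so each match counts for less.
    length_norm = 1.0 / math.sqrt(len(tokens))
    # Repeated terms raise the score, but only by the square root of the count.
    tf_sum = sum(math.sqrt(tokens.count(term)) for term in query.lower().split())
    return tf_sum * length_norm

doc1 = "The red ford truck was really red and it was a fast truck"
doc2 = "The red ford truck was parked on the street"

score1 = tfidf_like_score(doc1, "red ford truck")
score2 = tfidf_like_score(doc2, "red ford truck")
print(score1, score2)
```

Under these assumptions doc 1 still comes out ahead: the repeated 'red' and 'truck' outweigh the penalty for its longer field, which matches the ordering described in the question.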

omitNorms seems to be what you're looking for regarding the length of the field. Have a look at this previous answer, and remember that index-time boosting will be disabled for that field too:

If true, omits the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory).

omitTermFreqAndPositions seems to be what you're looking for regarding the term frequency:

If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position that are issued on a field with this option will silently fail to find documents. This property defaults to true for all fields that are not text fields.
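To see the intended effect of dropping both factors, here is a small Python sketch that models it outside Solr. It assumes each matching query term contributes a flat 1.0 (no term frequency, no length norm), reads the dates from the question as month/day/year, and uses a hypothetical `flat_score` helper; it is a model of the ranking, not Solr's actual scoring code.

```python
from datetime import date

def flat_score(field_text, query):
    """Each query term counts once if present, no matter how often it occurs."""
    tokens = set(field_text.lower().split())
    return sum(1.0 for term in query.lower().split() if term in tokens)

docs = [
    {"id": "doc 1",
     "text": "The red ford truck was really red and it was a fast truck",
     "date": date(2010, 1, 1)},
    {"id": "doc 2",
     "text": "The red ford truck was parked on the street",
     "date": date(2012, 1, 10)},
]

query = "red ford truck"

# Sort by score descending, then by date descending as the tie-breaker.
ranked = sorted(docs, key=lambda d: (flat_score(d["text"], query), d["date"]),
                reverse=True)
order = [d["id"] for d in ranked]
print(order)
```

With the scores tied, the date decides and doc 2 comes first. In Solr itself these attributes are set on the field (or field type) in the schema, and the documents need to be reindexed afterwards. Keep in mind the warning quoted above: with positions omitted, queries that rely on position (such as phrase queries) will not match on that field.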