I am experimenting with boosting in Solr and am confused about how my document scores are being affected.

I have a collection of technical documents that contain fields like Title, Symptoms, Resolution, Classification, Tags, etc. All the fields listed are required except Tags, which is optional. All fields are copied to _text_, which is the default search field.

When I run a default query:

http://search:8983/solr/articles-experimental/select?defType=edismax&fl=id,%20tags,%20score&q=virtualization&qf=_text_

The top article (Article 42014) comes back with a score of 4.182179. This document has 6 instances of the word virtualization in multiple fields -- Title, Symptoms, Resolution, and Classification. This particular article does not have any Tags value.

I now want to experiment with boosting so that articles whose Tags values match the search terms appear closer to the top of the results. To do this, I send the following query:

http://search:8983/solr/articles-experimental/select?defType=edismax&fl=id,tags,score&q=virtualization&qf=tags^2%20_text_

which keeps the same Article 42014 at the top of the list but now with a score of 4.269944. However, results 2 through 65 now all have the same score of 4.255975. In the non-boosted query the scores range from 4.056591 down to 2.7029662.

In addition, the set of document ids coming back is not quite the same as before. I certainly expect some differences, but not to the extent that I am seeing, considering that the vast majority of the articles coming back have the search term as a tag.

Ultimately, I am having trouble finding out exactly how boosting changes the score and what an "appropriate" boost value is. Understanding that it is probably subjective, what criteria should I be considering?

Append debugQuery=true to your query, and it will show you exactly how the score is calculated, including which values are multiplied or added together. explain.solr.pl is useful for visualizing these values. (comment by MatsLindh)
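
As a minimal sketch of that suggestion, the boosted query from the question can be rebuilt with debugQuery=true appended. The host, collection, and parameters are copied from the question; the helper name build_debug_url is made up for illustration. The snippet only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Host and collection as given in the question; adjust for your own setup.
BASE = "http://search:8983/solr/articles-experimental/select"

def build_debug_url(q, qf):
    """Return the edismax select URL with debugQuery=true added (hypothetical helper)."""
    params = {
        "defType": "edismax",
        "fl": "id,tags,score",
        "q": q,
        "qf": qf,
        "debugQuery": "true",  # asks Solr to include the score explanation
    }
    # urlencode percent-encodes '^' and turns the space in qf into '+'
    return BASE + "?" + urlencode(params)

print(build_debug_url("virtualization", "tags^2 _text_"))
```

The explanation for each document should then appear in the debug section of the response, which is the text explain.solr.pl is meant to visualize.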

1 Answer


Well, given the parameters you set for edismax (plus the default values for all the ones you don't set), Solr simply runs its scoring algorithm (BM25 nowadays) and calculates every score from that.
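
To make that a bit more concrete, here is a rough sketch of how (e)dismax combines per-field scores for a single term: each field's score is multiplied by its qf boost, and the document gets the highest boosted score plus tie times the remaining ones (tie defaults to 0.0). The per-field numbers below are invented for illustration, not taken from the collection in the question, and the function is a simplification rather than Solr's actual code.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does (simplified).

    field_scores: field name -> raw score of the term in that field (made-up values)
    boosts:       field name -> qf boost; fields not listed default to 1.0
    """
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    top = max(boosted)
    # Only the best field counts unless tie > 0 lets the others contribute
    return top + tie * (sum(boosted) - top)

boosts = {"tags": 2.0}  # corresponds to qf=tags^2 _text_

# Two hypothetical documents: same tags score, different _text_ scores
doc_a = {"tags": 2.0, "_text_": 3.5}
doc_b = {"tags": 2.0, "_text_": 2.8}

print(dismax_score(doc_a, boosts))  # 4.0, the boosted tags score wins
print(dismax_score(doc_b, boosts))  # 4.0, the _text_ difference no longer matters
```

If something like this is going on in your collection, it could explain why many documents end up with an identical score once tags is boosted: the boosted tags score becomes the maximum, and a short field like tags tends to score similarly across documents. The debugQuery output would confirm or rule this out. Setting a non-zero tie would let the _text_ score contribute again.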

The specific boost values you should use for your query are impossible to guess; you have to try and retry. It is a known pain. I even built vifun, a tool to help me visualize how different parameters affect the score with edismax.
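
For the try-and-retry part, a small sketch like the following can generate one query URL per candidate boost so the result lists can be compared side by side. The base URL and parameters come from the question; the helper name sweep_urls and the list of boosts are arbitrary choices for illustration. It only builds URLs and does not send any requests.

```python
from urllib.parse import urlencode

# Host and collection as given in the question; adjust for your own setup.
BASE = "http://search:8983/solr/articles-experimental/select"

def sweep_urls(q, boosts):
    """Map each candidate tags boost to the edismax URL that uses it (hypothetical helper)."""
    urls = {}
    for boost in boosts:
        params = {
            "defType": "edismax",
            "fl": "id,tags,score",
            "q": q,
            "qf": f"tags^{boost} _text_",
        }
        urls[boost] = BASE + "?" + urlencode(params)
    return urls

# Arbitrary candidate boosts; pick whatever range makes sense for your data
for boost, url in sweep_urls("virtualization", [0.5, 1, 2, 5]).items():
    print(boost, url)
```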