8
votes

I have a Solr index with many entries. A query returns some subset of them, each with a score. Once the results are returned, I want to keep only those above some score (i.e. results of a certain quality only). Is it possible to do this when the returned subset could be anything?

I ask because on some queries a score of, say, 0.008 is a decent match, whereas on other queries a higher score is a poor match.

Ideally I'm just looking for a method to take the top x entries as long as they are of at least a certain quality.


2 Answers

4
votes

I think you should not do this. With the TF-IDF scoring model, there is no score threshold above which all results are relevant and below which none are. And even if you manage to find one, it will very likely stop being valid after a few updates to your index (because document frequencies will change).

If you still want to do this, I think it is achievable using function queries: Solr has an if function (in trunk) and a query function. Just filter your results so that you only keep entries whose score is higher than a given threshold.
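As a rough sketch of that filter, one option is a filter query that applies Solr's frange parser to the query function, so only documents whose score for the main query reaches a lower bound are kept. The helper name, the example query `title:solr`, and the threshold 0.5 are made up for illustration; the snippet only builds the request parameters and does not contact a server.

```python
from urllib.parse import urlencode


def build_thresholded_query(user_query, min_score, rows=10):
    """Build Solr request parameters that keep only docs scoring >= min_score.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    return {
        "q": user_query,
        # frange keeps documents whose function value is at least l;
        # query($q) evaluates to each document's score for the main query.
        "fq": "{!frange l=%s}query($q)" % min_score,
        # Ask Solr to return the score alongside the stored fields.
        "fl": "*,score",
        # "Top x" is simply the rows limit applied after the filter.
        "rows": rows,
    }


params = build_thresholded_query("title:solr", 0.5)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the select URL of your own core. The threshold is still subject to the caveat above: it is an absolute score, so it will not mean the same thing for every query.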

2
votes

You may also want to read ScoresAsPercentages first.

Solr does not normalize scores, since that is easily done on the client side.
You can use the maxScore value provided in the results and divide every score by it.
The first record will then have a score of one, and the rest will have scores of one or less.
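A minimal client-side sketch of that normalization, assuming the response was requested as JSON with the score included in the field list, so that `response.maxScore` and a `score` on every document are present. The function name, the `min_ratio` and `top_x` parameters, and the sample documents are invented for illustration.

```python
def normalize_scores(solr_response, min_ratio=0.0, top_x=None):
    """Divide each score by maxScore and keep docs at or above min_ratio.

    Hypothetical helper; expects the parsed JSON body of a Solr response.
    """
    body = solr_response["response"]
    max_score = body.get("maxScore")
    if not max_score:
        # No maxScore (or zero): nothing sensible to normalize against.
        return []
    kept = []
    for doc in body["docs"]:
        ratio = doc["score"] / max_score
        if ratio >= min_ratio:
            # Copy the doc so the original response is left untouched.
            kept.append({**doc, "normalized_score": ratio})
    return kept if top_x is None else kept[:top_x]


# Made-up response in the shape Solr returns for wt=json.
sample = {
    "response": {
        "numFound": 3,
        "start": 0,
        "maxScore": 0.8,
        "docs": [
            {"id": "a", "score": 0.8},
            {"id": "b", "score": 0.4},
            {"id": "c", "score": 0.1},
        ],
    }
}

best = normalize_scores(sample, min_ratio=0.5, top_x=10)
print([doc["id"] for doc in best])
```

Note that the normalized value is relative to the best hit of the same query, so a ratio of 1.0 only means "the top result", not "a good result".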