
I am using Solr v6.2.1. We are not getting accurate results using "sort=score desc".

Let's assume we have the following documents in our index:

[{ "id": "1", "content": ["java developer"] },

{ "id": "2", "content": ["Java is object oriented.Java robust language.Core java "] },

{ "id": "3", "content": ["java is platform independent. Java language."] }]

content is defined as a multivalued field in the schema:

<field name="content" type="text_general" multiValued="true" indexed="true" stored="true"/>

When I search for java using the query below:

curl 'http://localhost:8983/solr/test/select?fl=score,id&q=(java)&wt=json&sort=score%20desc'

I expect the document with id 2 to come first, as it contains the most matches for java. But Solr is giving inconsistent results.

Please suggest why I am not getting the desired results.

The number of matches is not the only factor used to calculate the score (the length of the field is also used, where shorter fields are deemed more important). Append debugQuery=true to your query URL to see exactly how each score is calculated. You don't have to sort by score explicitly either; that's done by default. You should also provide a field name when searching, such as content:java, so you're sure you're searching the field you think you're searching. - MatsLindh
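To make the comment's suggestion concrete, here is a minimal sketch that builds such a request URL with a field-qualified query and debugQuery=true. The helper name build_select_url is hypothetical, and the base URL is the one from the question; the sketch only constructs the URL and does not contact Solr.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, debug=True):
    """Build a Solr select URL; the helper name is illustrative only."""
    params = {"q": query, "fl": "score,id", "wt": "json"}
    if debug:
        # debugQuery=true asks Solr to include a per-document score explanation.
        params["debugQuery"] = "true"
    # urlencode percent-encodes reserved characters such as ':' and ','.
    return base_url + "?" + urlencode(params)


# Base URL taken from the question; adjust host and core name as needed.
url = build_select_url("http://localhost:8983/solr/test/select", "content:java")
print(url)
```

Opening the printed URL in a browser, or passing it to curl inside quotes, should return the usual results plus a debug section explaining each score.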

2 Answers

0
votes

You need to add defType as edismax in your query. Please find the updated query below.

 http://localhost:8983/solr/test/select?fl=score,id&q=(java)&wt=json&sort=score desc&defType=edismax

Once you pass edismax as defType, sorting on score starts working as expected.

0
votes

First, as suggested by Rahul, you should specify df, the default query field, so that the query runs explicitly against the field you intend.

Secondly, your assumption that the document with the most occurrences of a term will show up as the first result is not correct. What you are referring to is called term frequency, or tf for short. The ranking function used by Solr to calculate the relevance score combines tf with idf, the inverse document frequency. You can read more about it here: Okapi_BM25.

Roughly, the score translates into tf * idf, where idf is the logarithm of an inverse document frequency ratio. BM25 additionally dampens repeated occurrences of a term and normalizes by field length.

This helps ensure that the most relevant documents for a particular query are retrieved. Intuitively, this means that, since 'java' is present in the other documents as well, the terms that differentiate doc 2 are probably 'object oriented' and 'robust'.
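As a rough illustration of the scoring described above, the following sketch applies the textbook BM25 formula to the three documents from the question. Several things here are assumptions: k1 = 1.2 and b = 0.75 are the commonly cited defaults, the regex tokenizer is a naive stand-in for Solr's text_general analysis chain (which may split text differently, for instance around a period with no following space), and real Solr scores will differ further because field lengths are stored approximately. Use debugQuery=true for the actual numbers.

```python
import math
import re

K1 = 1.2   # assumed term-frequency saturation parameter
B = 0.75   # assumed length-normalization parameter

# The three documents from the question, keyed by id.
DOCS = {
    "1": "java developer",
    "2": "Java is object oriented.Java robust language.Core java ",
    "3": "java is platform independent. Java language.",
}


def tokenize(text):
    # Naive tokenizer: lowercase, then split on anything non-alphanumeric.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(term, docs):
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    doc_freq = sum(1 for t in tokens.values() if term in t)
    # idf shrinks as more documents contain the term.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = toks.count(term)
        # Longer-than-average fields are penalized, shorter ones boosted.
        length_norm = K1 * (1 - B + B * len(toks) / avg_len)
        scores[doc_id] = idf * tf * (K1 + 1) / (tf + length_norm)
    return scores


scores = bm25_scores("java", DOCS)
for doc_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 4))
```

Under these assumptions the three scores end up very close together, and document 1, with a single match in a two-word field, edges out document 3, which has two matches in a longer field. That is the length-normalization effect at work, and it shows why small differences in tokenization or stored field length can reorder the results.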