I am using Solr for searching institutions... My Solr DB has around 400k documents each of which has multiple fields like ("name","id","city",...)...
A document in my DB looks like this:
"docs":
{
"id": "91348",
"p_code": "71637",
"name": "University of Toronto - Mississauga",
"ext_name": "",
"city": "Mississauga",
"country": "CA",
"state": "ON",
"type": "academic/campus",
"alt_name": "",
"ext_city": "",
"zip": "L5L 1C6",
"alt_ext_city": "",
}
I write a query like {name: (university of toronto)}...
Top two matches are:
"docs":
{
"id": "91348",
"p_code": "71637",
"name": "University of Toronto - Mississauga",
"ext_name": "",
"city": "Mississauga",
"country": "CA",
"state": "ON",
"type": "academic/campus",
"alt_name": "",
"ext_city": "",
"zip": "L5L 1C6",
"alt_ext_city": "",
"_version_": 1473710223400108000,
"score": 1.499069
},
{
"id": "10624",
"p_code": "7938",
"name": "University of Toronto",
"ext_name": "",
"city": "Toronto",
"country": "CA",
"state": "ON",
"type": "academic",
"alt_name": "Saint George Downtown Campus",
"ext_city": "",
"zip": "M5S 1A1",
"alt_ext_city": "",
"_version_": 1473710220148473900,
"score": 1.4967358
}
I am really surprised to see that "University of Toronto - Mississauga" returns a higher score than "university of Toronto". Intuitively, the field containing "University of Toronto - Mississauga" should get a lower score since it is longer than the other one.
I was also very surprised to see that Solr gives different values for querynorm as follows: (0.03198291 = queryNorm) for the top document and (0.03203078 = queryNorm) for the second ranked document. I presumed that the query norm should be exactly the same for the all documents as it is only a function of the query.
I am not sure if I got something wrong about how Solr works or there is something wrong in indexing or configuration? Has anybody faced the same problem?
omitNorms=true
. Favoring shorter fields when scoring, as you've mentioned, relies on having norms stored. – femtoRgon