
I am using Solr to search for institutions. My Solr index has around 400k documents, each of which has multiple fields such as "name", "id", "city", ...

A document in my DB looks like this:

"docs": 
{
    "id": "91348",
    "p_code": "71637",
    "name": "University of Toronto - Mississauga",
    "ext_name": "",
    "city": "Mississauga",
    "country": "CA",
    "state": "ON",
    "type": "academic/campus",
    "alt_name": "",
    "ext_city": "",
    "zip": "L5L 1C6",
    "alt_ext_city": ""
}

I run a query like {name: (university of toronto)}... The top two matches are:

"docs": 
{
    "id": "91348",
    "p_code": "71637",
    "name": "University of Toronto - Mississauga",
    "ext_name": "",
    "city": "Mississauga",
    "country": "CA",
    "state": "ON",
    "type": "academic/campus",
    "alt_name": "",
    "ext_city": "",
    "zip": "L5L 1C6",
    "alt_ext_city": "",
    "_version_": 1473710223400108000,
    "score": 1.499069
},

{
    "id": "10624",
    "p_code": "7938",
    "name": "University of Toronto",
    "ext_name": "",
    "city": "Toronto",
    "country": "CA",
    "state": "ON",
    "type": "academic",
    "alt_name": "Saint George Downtown Campus",
    "ext_city": "",
    "zip": "M5S 1A1",
    "alt_ext_city": "",
    "_version_": 1473710220148473900,
    "score": 1.4967358
}

I am really surprised to see that "University of Toronto - Mississauga" gets a higher score than "University of Toronto". Intuitively, the document whose name is "University of Toronto - Mississauga" should get a lower score, since its name field is longer and the query terms make up a smaller share of it.

I was also very surprised to see that Solr reports different queryNorm values in the explain output: (0.03198291 = queryNorm) for the top document and (0.03203078 = queryNorm) for the second-ranked document. I presumed that the query norm would be exactly the same for all documents, as it should depend only on the query.

I am not sure whether I have misunderstood how Solr scoring works or whether something is wrong with my indexing or configuration. Has anybody faced the same problem?

What does your complete query string look like? And is this a single server, or is sharding or SolrCloud involved? - MatsLindh
As for why the shorter field doesn't get a boost in score, my best guess is that you have omitNorms=true. Favoring shorter fields when scoring, as you've described, relies on having norms stored. - femtoRgon

1 Answer


Make sure that omitNorms is set to false for that field and that your collection is using the latest version of the schema. Then re-index all of your documents for the change to the field to take effect.
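To illustrate why norms matter here, below is a minimal sketch of the length norm used by Lucene's classic TF-IDF similarity, which is 1 / sqrt(number of terms in the field). The token lists are assumptions: they presume the analyzer drops the lone "-" and removes no stop words, which may not match your field type. Lucene versions of that era also store the norm in a single byte, so nearby field lengths can collapse to the same stored value; the sketch ignores that rounding.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic TF-IDF length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


# Assumed tokenization of the two "name" values (hypothetical analyzer output).
short_name = ["university", "of", "toronto"]
long_name = ["university", "of", "toronto", "mississauga"]

norm_short = length_norm(len(short_name))
norm_long = length_norm(len(long_name))

# With norms enabled, the shorter field gets the larger multiplier.
# With omitNorms=true, this factor is not stored and plays no part in scoring.
print(f"short field norm: {norm_short:.4f}")
print(f"long field norm:  {norm_long:.4f}")
```

If the explain output for both documents shows the same fieldNorm, field length is not helping to separate them, either because norms are omitted or because of the coarse byte encoding.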

I've found that some schema modifications are best handled by completely wiping the index before indexing new content. I am not sure, but I believe this may be one of them. For most changes you can just re-index all of your content and overwrite the old documents.
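On the differing queryNorm values in the question: in the classic TF-IDF similarity, queryNorm is 1 / sqrt(sum of squared term weights), and each weight includes the term's idf, which is computed from index statistics (docFreq and the document count). If the two documents were scored against different index statistics, for example on different shards as the first comment asks about, the query norm could differ slightly. Whether that applies to your setup is an assumption; the sketch below only shows the mechanism, and all document frequencies and counts in it are made up.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic TF-IDF inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(term_doc_freqs: dict, num_docs: int) -> float:
    """1 / sqrt(sum of squared term weights), with every query boost taken as 1."""
    sum_sq = sum(idf(df, num_docs) ** 2 for df in term_doc_freqs.values())
    return 1.0 / math.sqrt(sum_sq)


# Hypothetical per-shard document frequencies for the three query terms.
shard_a = {"university": 60000, "of": 90000, "toronto": 40}
shard_b = {"university": 60500, "of": 89000, "toronto": 35}

norm_a = query_norm(shard_a, num_docs=200000)
norm_b = query_norm(shard_b, num_docs=200000)

# Same query, different index statistics, slightly different query norms.
print(norm_a, norm_b)
```

Comparing the idf and docFreq lines in the explain output for both documents would show whether they were scored against the same statistics.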