
From https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html we have the following function for calculating the score.

$$\text{score}(q,d) = \text{queryNorm}(q) \cdot \text{coord}(q,d) \cdot \sum_{t \in q} \Big( \text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot \text{t.getBoost}() \cdot \text{norm}(t,d) \Big)$$
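To make the formula concrete, here is a minimal Python sketch of the practical scoring function for a single field. It assumes the classic TF/IDF definitions described in the same guide (tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / √numTerms, queryNorm = 1 / √(sum of squared query-term weights)). It ignores index-time boosts and the lossy one-byte encoding Lucene applies to the stored norm, and the function and parameter names are hypothetical.

```python
import math

def tf(freq: int) -> float:
    # Term frequency: square root of the number of occurrences in the field.
    return math.sqrt(freq)

def idf(doc_freq: int, num_docs: int) -> float:
    # Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms: int) -> float:
    # Field-length norm: 1 / sqrt(number of terms in the field), unencoded.
    return 1.0 / math.sqrt(num_terms)

def score(query_terms, doc_terms, doc_freqs, num_docs, boosts=None):
    """Score one document (a list of tokens) against a list of query terms.

    doc_freqs maps each term to the number of documents containing it.
    """
    boosts = boosts or {}
    # queryNorm uses the weights of *all* query terms, matched or not.
    sum_sq = sum((idf(doc_freqs.get(t, 0), num_docs) * boosts.get(t, 1.0)) ** 2
                 for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    # coord: fraction of query terms found in the document.
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)
    # Sum over matched terms of tf * idf^2 * boost * norm.
    total = 0.0
    for t in matched:
        i = idf(doc_freqs[t], num_docs)
        total += (tf(doc_terms.count(t)) * i ** 2 * boosts.get(t, 1.0)
                  * length_norm(len(doc_terms)))
    return query_norm * coord * total
```

For the single-document example below, `score(["a"], ["a", "b", "c"], {"a": 1}, 1)` comes out at about 0.177 rather than 0.153, because this sketch uses 1/√3 ≈ 0.577 for the norm, while the explanation reports the stored fieldNorm as 0.5.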

However, the explanation in the example below seems inconsistent with this formula:

1) The explanation shows only idf, not idf².

2) Where is the coordination factor?

3) From the explanation, the score seems to be calculated as: (tf * idf * fieldNorm) + (number of clauses * boost * queryNorm)

Indexed Doc:

PUT test/type/1
{
  "text": "a b c"
}

Query:

GET test/type/_search
{
  "explain":"true",
  "query": {
    "match": {
      "text": "a"
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_shard": 3,
        "_node": "5QvbXVlRSku-p_g81ZXpjQ",
        "_index": "test",
        "_type": "type",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text": "a b c"
        },
        "_explanation": {
          "value": 0.15342641,
          "description": "sum of:",
          "details": [
            {
              "value": 0.15342641,
              "description": "weight(text:a in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.15342641,
                  "description": "fieldWeight in 0, product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "tf(freq=1.0), with freq of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "termFreq=1.0",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "idf(docFreq=1, maxDocs=1)",
                      "details": []
                    },
                    {
                      "value": 0.5,
                      "description": "fieldNorm(doc=0)",
                      "details": []
                    }
                  ]
                }
              ]
            },
            {
              "value": 0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 3.2588913,
                  "description": "_type:type, product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "boost",
                      "details": []
                    },
                    {
                      "value": 3.2588913,
                      "description": "queryNorm",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
The coordination factor is the fraction of the query's terms that appear in the returned document. For question 2, since the search is for only one term, the coordination factor is 1 and hence not displayed. If the query were for "a b x", the coordination factor would be 2/3. – user6811487
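A minimal Python sketch of the coordination factor as the comment describes it (overlap / maxOverlap). The helper name `coord` is hypothetical, and it assumes each query term is a separate optional clause of equal weight.

```python
def coord(query_terms, doc_terms):
    # Fraction of query terms that appear in the document (overlap / maxOverlap).
    doc = set(doc_terms)
    overlap = sum(1 for t in query_terms if t in doc)
    return overlap / len(query_terms)

print(coord(["a"], ["a", "b", "c"]))            # 1.0
print(coord(["a", "b", "x"], ["a", "b", "c"]))  # 0.6666666666666666
```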

  • You are missing one occurrence of idf because your query has only one clause. The second idf factor comes from the query weight (queryWeight = idf · boost · queryNorm), which you won't see in such a simple query because it is cancelled out by queryNorm. Simplifying a bit, queryNorm is 1 / √(∑ idf²); with a single term it becomes 1 / idf, so the query weight becomes idf / idf = 1. All of this is implicit: with only one clause there is nothing to weigh the term against, so the query weight is 1 and the explanation doesn't show it.

  • There is only one term in this query, so there is no coord to consider: coord = overlap / maxOverlap = 1/1 = 1.

  • No idea where this is coming from. I believe you are being thrown a bit by a _type query: it appears to be a required clause added to restrict the search to the given Elasticsearch type. Note that the score for this clause is zeroed (its "# clause" factor is 0), so every match has to be of the specified _type, but the clause shouldn't affect the score at all.
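The queryNorm cancellation in the first point can be checked numerically. This Python sketch assumes the classic Lucene definitions queryWeight = idf · boost · queryNorm and queryNorm = 1 / √(∑ (idf · boost)²). With one term and a boost of 1, queryNorm is just 1 / idf, which is also the queryNorm value (3.2588913) printed under the _type:type clause in the explanation.

```python
import math

idf_a = 0.30685282  # idf(docFreq=1, maxDocs=1) from the explanation
boost = 1.0

# queryNorm = 1 / sqrt(sum of squared query-term weights); only one term here.
query_norm = 1.0 / math.sqrt((idf_a * boost) ** 2)
query_weight = idf_a * boost * query_norm

print(round(query_norm, 5))     # 3.25889
print(round(query_weight, 6))   # 1.0
```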

If you want to see everything at work in the scoring algorithm, you would need a test data set and query that are a bit closer to realistic conditions. This test has a single simple document and a single simple query. In this case, yes, the algorithm looks as simple as:

score = tf * idf * fieldNorm = 1 * 0.30685282 * 0.5 = 0.15342641
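The three factors can be recomputed from the raw statistics of the example. This Python sketch assumes the classic definitions tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). It approximates Lucene's lossy one-byte norm storage by truncating 1/√numTerms to 3 significant binary digits; the `encode_norm` helper is a hypothetical approximation, not Lucene's actual code, but it shows why the explanation reports fieldNorm as 0.5 rather than 1/√3 ≈ 0.577.

```python
import math

def encode_norm(x: float) -> float:
    # Hypothetical approximation of Lucene's lossy one-byte norm storage:
    # keep only 3 significant bits of the mantissa, truncating the rest.
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    return math.ldexp(math.floor(m * 8) / 8, e)

freq, doc_freq, max_docs, num_terms = 1, 1, 1, 3  # doc "a b c", query "a"

tf = math.sqrt(freq)                                # 1.0
idf = 1 + math.log(max_docs / (doc_freq + 1))       # about 0.30685282
field_norm = encode_norm(1 / math.sqrt(num_terms))  # 0.577... truncated to 0.5

print(tf * idf * field_norm)  # about 0.15342641
```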

But you aren't seeing the coord, queryNorm, or overall query weight calculation because your query is too simple. You aren't seeing a particularly meaningful idf (or tf) because there is only one document and one match. You aren't seeing the summation because you have one hit against one term, so there is nothing to sum. The algorithm is primarily designed to produce meaningful scores over larger data sets.