We have a function_score query with about 50 functions. Each function has a filter and a script_score, and score_mode is set to sum.

Mappings:

 "keywords": {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "id": {
            "type": "string",
            "index_name": "id",
            "analyzer": "standard"
          },
          "name": {
            "type": "string",
            "index_name": "name"
          },
          "score": {
            "type": "double",
            "index_name": "keywordScore"
          }
        }
      }

Example Query:

 {
  "query": {
    "bool": {
      "should": {
        "nested": {
          "query": {
            "function_score": {
              "functions": [
                {
                  "filter": {
                    "term": {
                      "keywords.id": "np14y9393"
                    }
                  },
                  "script_score": {
                    "script": {
                      "inline": "(doc['keyword.score'].value*log(0.138317))+100"
                    }
                  }
                },
                {
                  "filter": {
                    "term": {
                      "keywords.id": "ny6579591"
                    }
                  },
                  "script_score": {
                    "script": {
                      "inline": "(doc['keyword.score'].value*log(0.0631535))+100"
                    }
                  }
                }
              ],
              "score_mode": "sum",
              "boost_mode": "sum"
            }
          },
          "path": "keywords"
        }
      }
    }
  }
}

Issues:

  1. The formula in each script_score works with probabilities between 0 and 1, so its output is always less than 1 (for example, 0.00456). In that case Elasticsearch appears to ignore the score coming from script_score. When I add 100 in the script so that it returns 100.00456, the scores do show up in the final score. Maybe Elasticsearch has a precision cutoff that causes this behavior.

  2. Even though sum is specified as the score mode, Elasticsearch appears to be averaging the score internally. As mentioned above, the query has 50 functions. If 10 keywords match, the score should be around 1000, but the resulting score is around 80. How is the score mode actually applied? How can I tell Elasticsearch not to normalize the score and to use the value I specified?

  3. The Explain API is not much help here. It does not show the score for each function or how the scores are combined.

1 Answer

Suppose your index holds 5 documents. When you run the query, it is scored against each document in turn. Let's dry-run the query on the first indexed document.

The final _score for the first doc will be:

_score = es_score ([0-1]) + function_score;

es_score lies between 0 and 1, inclusive.

All 50 of your functions filter on keywords.id and use almost the same script_score. Assuming x of the function filters match:

_score = es_score + function_score(func1) + .... + function_score(funcx);
_score = es_score + [(doc['keyword.score'].value*log(0.138317))+100] + .... + [(doc['keyword.score'].value*log(0.138317))+100];

_score = es_score + [-value1 + 100] + .... + [-valueX + 100];

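So the document's _score depends on the values of the computed logs. Each probability is less than 1, so every log is negative (and generally not a whole number). Assuming doc['keyword.score'].value is positive, each matched function therefore contributes 100 minus some positive amount.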
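
To make the arithmetic above concrete, here is a minimal Python sketch of the same sum. It is not Elasticsearch code. The helper name `expected_score`, the keyword scores 0.5 and 0.25, and the es_score of 1.0 are all assumptions chosen for illustration. The two probabilities come from the example query, and the script's `log` is assumed to be the natural logarithm.

```python
import math

def expected_score(matched, es_score=1.0, offset=100.0):
    """Hypothetical helper: es_score plus the sum of the matched functions.

    `matched` is a list of (keyword_score, probability) pairs, one pair per
    function whose filter matched the document.
    """
    total = es_score
    for keyword_score, probability in matched:
        # Mirrors the script: (doc['keyword.score'].value * log(p)) + 100
        total += keyword_score * math.log(probability) + offset
    return total

# Two matched functions; the keyword scores are made-up illustration values.
matched = [(0.5, 0.138317), (0.25, 0.0631535)]
result = expected_score(matched)
print(result)
```

With two matched functions the result lands a little under 201 (es_score plus two offsets of 100), because both log terms are negative. With 10 matches the same sum would be close to 1000, which is the figure the question expects.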