4
votes

I'm trying to solve a performance issue we have when querying Elasticsearch for several thousand results. We do some post-query processing and only show the top X results (a query may have ~100,000 results, while we only need the top 100 according to our scoring mechanics).

The scoring works as follows: the Elasticsearch score is normalized to the range 0..1 (score / max(score)), we add our own ranking score (also normalized to 0..1), and divide the sum by 2.
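Written out as a formula, with symbols that are introduced here only for illustration: let $s_i$ be the Elasticsearch score of document $i$, $r_i \in [0, 1]$ its ranking score, and $M$ the set of all documents matched by the query. The final score $f_i$ is then:

```latex
$$ f_i = \frac{1}{2}\left(\frac{s_i}{\max_{j \in M} s_j} + r_i\right) $$
```

The denominator depends on every matching document, not only on document $i$, which is what makes the formula awkward to evaluate inside a per-document script.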

What I'd like to do is move this logic into Elasticsearch using custom scoring (or anything else that works): https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-script-score

The problem I'm facing is that with score scripts / score functions I can't find a way to do something like max(_score) to normalize the score between 0 and 1.

"script_score" : {
    "script" : "(_score / max(_score) + doc['some_normalized_field'].value)/2"
}

Any ideas are welcome.

3
Please explain your logic here. Perhaps this can be done in pure Elasticsearch without any scripting. - Evaldas Buinauskas
Did you get an answer to this problem? I have been stuck on the same thing for some time. - Ramandeep Singh
Hello! Did you get an answer? I am trying to figure out what to do about this as well. - christouandr7

3 Answers

2
votes

You cannot get max_score before the _score has been generated for all the matching documents. A script_score query first generates the _score for every matching document, and only then does Elasticsearch report max_score.

As far as I understand your problem, you want to preserve the max_score that was generated by the original query, before "script_score" was applied. You can get the required result by doing some of the computation at the front end: apply your formula there and then sort the results.
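A minimal sketch of that front-end step in Python, assuming the search response has already been parsed into a dict with the usual `hits.max_score` and `hits.hits[]._score` layout, and that the ranking value is stored in `_source` under `some_normalized_field` (the field name from the question). The function name `rerank` and the `combined_score` key are made up for this illustration.

```python
from typing import Any


def rerank(
    response: dict[str, Any],
    rank_field: str = "some_normalized_field",  # field name taken from the question
    top_n: int = 100,
) -> list[dict[str, Any]]:
    """Apply (normalized _score + ranking score) / 2 and keep the best top_n hits."""
    hits = response["hits"]["hits"]
    # max_score can be null/None (for example when sorting on a field), so guard it.
    max_score = response["hits"].get("max_score") or 0.0

    rescored = []
    for hit in hits:
        es_norm = hit["_score"] / max_score if max_score > 0 else 0.0
        rank = float(hit["_source"][rank_field])
        # Copy the hit and attach the combined score under a hypothetical key.
        rescored.append({**hit, "combined_score": (es_norm + rank) / 2})

    rescored.sort(key=lambda h: h["combined_score"], reverse=True)
    return rescored[:top_n]
```

Note that this still requires fetching every matching hit from Elasticsearch, so it reproduces the formula but does not by itself remove the cost of transferring all the results.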

You can save your factor in the results using script_fields:

{
  "explain": true, 
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "total_goals": {
      "script": {
        "lang": "painless",
        "source": """
          int total = 0;
          for (int i = 0; i < doc['goals'].length; ++i) {
            total += doc['goals'][i];
          }
          return total;
        """,
        "params": {
          "last": "any parameters required"
        }
      }
    }
  }
}

0
votes

I am not sure I understand your question. Do you want to limit the number of results?

Have you tried this?

{
    "from" : 0, "size" : 10,
    "query" : {
        "term" : { "name" : "dennis" }
    }
}

You can use sort to define the sort order; by default, results are sorted by the score of the main query.

You can also use aggregations (with or without function_score):

{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "date": {
              "scale": "3d",
              "offset": "7d",
              "decay": 0.1
            }
          }
        },
        {
          "gauss": {
            "priority": {
              "origin": "0",
              "scale": "100"
            }
          }
        }
      ],
      "query": {
        "match" : { "body" : "dennis" }
      }
    }
  },
  "aggs": {
    "hits": {
      "top_hits": {
        "size": 10
      }
    }
  }
}

0
votes

Based on this GitHub ticket, it is simply not possible to normalize the score inside Elasticsearch; the suggested workaround is to use boolean similarity.
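A rough sketch of how that workaround might look, written as Python helpers that only build request bodies (nothing here talks to a cluster, and it has not been verified against one). It assumes the text field is mapped with `"similarity": "boolean"`, so that each matching term contributes its query boost (1 by default) and a `match` query on N terms scores at most N. The helper names, the `term_count` parameter, and the choice of `boost_mode: replace` are assumptions for illustration; `term_count` must be supplied by the caller as the number of analyzed query terms.

```python
from typing import Any


def boolean_similarity_mapping(text_field: str) -> dict[str, Any]:
    """Index body that scores text_field with the built-in boolean similarity."""
    return {
        "mappings": {
            "properties": {
                text_field: {"type": "text", "similarity": "boolean"}
            }
        }
    }


def combined_score_query(
    text_field: str,
    query_text: str,
    term_count: int,  # assumption: number of analyzed terms in query_text
    rank_field: str = "some_normalized_field",  # field name from the question
    size: int = 100,
) -> dict[str, Any]:
    """Search body computing (_score / term_count + rank_field) / 2 per document."""
    # With boolean similarity the upper bound of _score is known up front
    # (term_count), so it can stand in for max(_score) in the formula.
    source = f"(_score / params.term_count + doc['{rank_field}'].value) / 2"
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match": {text_field: query_text}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {"term_count": float(term_count)},
                    }
                },
                # Use the script result as the final score instead of
                # multiplying it with the original query score.
                "boost_mode": "replace",
            }
        },
    }
```

This is not the same normalization as score / max(score): it divides by the theoretical maximum rather than the observed one, and it gives up BM25-style relevance in favor of simple term matching.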