Elasticsearch query: change the order of results according to the scoring

The current query returns the values of the title field in the following order:

  1. Quick 123
  2. Foxes Quick
  3. Quick
  4. Foxes Quick Quick
  5. Quick Foxes

Shouldn't result 3, "Quick", come first instead?

Also, "Foxes Quick Quick" has two occurrences of "Quick", so it should get some preference in the results, but it comes in 4th position.

Index settings:

 {
 "fundraisers": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "fundraisers",
        "creation_date": "1546515635025",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "my_tokenizer"
            },
            "search_analyzer_search": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "search_tokenizer_search"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "3",
              "type": "edge_ngram",
              "max_gram": "50"
            },
            "search_tokenizer_search": {
              "token_chars": [
                "letter",
                "digit",
                "whitespace"
              ],
              "min_gram": "3",
              "type": "ngram",
              "max_gram": "50"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "mVweO4_sT3Ww00MzdLyavw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}

Query:

GET fundraisers/_search?explain=true
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "qui",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Mapping:

{
  "fundraisers": {
    "mappings": {
      "fundraisers": {
        "properties": {
          "status": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          },
          "title": {
            "type": "text",
            "analyzer": "my_analyzer"
          },
          "twitterUrl": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "videoLinks": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "zipCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Am I overcomplicating this by using match_phrase, a search analyzer, and ngrams, or is there a simpler way to achieve the expected result?

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-match-query.html

Comments:

To understand the scoring, add "explain": true to the query DSL. - Nishant
Thanks @NishantSaini. I already added that to the query, and it shows the explain response. How do we alter the order of the results? - Shiva MSK
@ShivaMSK, could you provide the output of the explain API? I can't see it in your question. - user156327
@AmitKhandelwal The explain API computes a score explanation for a query and a specific document. elastic.co/guide/en/elasticsearch/reference/current/… - Shiva MSK
@ShivaMSK I know that. I want to know the output it produces in your case. - user156327

1 Answer

Ok, first let's create a minimal and reproducible setup:

PUT test
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "my_tokenizer"
          },
          "search_analyzer_search": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "search_tokenizer_search"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "token_chars": [
              "letter",
              "digit"
            ],
            "min_gram": "3",
            "type": "edge_ngram",
            "max_gram": "50"
          },
          "search_tokenizer_search": {
            "token_chars": [
              "letter",
              "digit",
              "whitespace"
            ],
            "min_gram": "3",
            "type": "ngram",
            "max_gram": "50"
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "title": "Quick 123"
}
PUT test/_doc/2
{
  "title": "Foxes Quick"
}
PUT test/_doc/3
{
  "title": "Quick"
}
PUT test/_doc/4
{
  "title": "Foxes Quick Quick"
}
PUT test/_doc/5
{
  "title": "Quick Foxes"
}

Then let's try the simplest query:

GET test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "qui"
      }
    }
  }
}

With this query, the order is:

  1. Quick
  2. Foxes Quick Quick
  3. Quick 123
  4. Foxes Quick
  5. Quick Foxes
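
To make the order above less mysterious, here is a rough Python sketch of the scoring. It is not Elasticsearch code. It assumes the default BM25 similarity with k1 = 1.2 and b = 0.75, a single shard (as in the test index), and that every edge n-gram counts as one token towards the field length. The helper names `edge_ngrams` and `bm25_rank` are made up for this illustration.

```python
import re

K1, B = 1.2, 0.75  # assumed BM25 defaults


def edge_ngrams(text, min_gram=3, max_gram=50):
    """Approximate my_analyzer: split on anything that is not a letter or
    digit, lowercase, then emit every prefix of length min_gram..max_gram."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


def bm25_rank(titles, term):
    """Rank titles for a single query term by the BM25 tf/length part.
    The idf factor is omitted because it is identical for every document."""
    docs = [edge_ngrams(t) for t in titles]
    avg_len = sum(len(d) for d in docs) / len(docs)
    scored = []
    for title, tokens in zip(titles, docs):
        tf = tokens.count(term)
        if tf == 0:
            continue  # the term does not occur, so the document does not match
        norm = K1 * (1 - B + B * len(tokens) / avg_len)
        scored.append((tf * (K1 + 1) / (tf + norm), title))
    # sorted() is stable, so equal scores keep their input order
    return [title for _, title in sorted(scored, key=lambda pair: -pair[0])]


titles = ["Quick 123", "Foxes Quick", "Quick", "Foxes Quick Quick", "Quick Foxes"]
for position, title in enumerate(bm25_rank(titles, "qui"), start=1):
    print(position, title)
```

Under these assumptions, "Quick" wins because it is the shortest field (3 tokens), and "Foxes Quick Quick" comes second because the doubled term frequency outweighs its length penalty. With five shards, term statistics are computed per shard by default, which may be one reason the original index ranks the same documents differently.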

That order is pretty much what you were expecting, right? There might be other use cases that this query doesn't cover. For those, IMO you'll have to use multi_match and search on fields with different analyzers, because I'm not sure a match_phrase query on an edge n-gram field makes much sense.
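
As a starting point for the multi_match idea, here is a minimal sketch in Python that only builds the request bodies as dictionaries. Everything beyond the original `title` field is an assumption for illustration: the `title.exact` sub-field, its `standard` analyzer, the `most_fields` type, and the boost of 2 would all need tuning against real data.

```python
import json

# Hypothetical mapping: "title.exact" is an illustrative sub-field name,
# not something that exists in the index from the question.
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_analyzer",  # edge n-grams, as in the question
            "fields": {
                "exact": {"type": "text", "analyzer": "standard"}
            },
        }
    }
}


def build_query(text):
    """Build a multi_match body that searches both the n-gram field and the
    whole-word sub-field, boosting whole-word hits (boost value assumed)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": ["title", "title.exact^2"],
            }
        }
    }


print(json.dumps(build_query("quick"), indent=2))
```

The printed JSON can be pasted as the body of a `GET test/_search` request once the index has been created with such a mapping. A prefix like "qui" would then match only through the n-gram field, while a complete word like "quick" would also score on the sub-field.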