Elasticsearch NGram Analyser - Change the Order of the results of Query

Question

How can I change the order in which an Elasticsearch query returns results, based on the scoring?

The current query returns the values of the title field in the following order:

1. Quick 123
2. Foxes Quick
3. Quick
4. Foxes Quick Quick
5. Quick Foxes

Shouldn't 3. Quick come first instead?

Also, Foxes Quick Quick has two occurrences of Quick, so it should get some preference in the results. But it comes in 4th position.

Index settings:

{
  "fundraisers": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "fundraisers",
        "creation_date": "1546515635025",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "my_tokenizer"
            },
            "search_analyzer_search": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "search_tokenizer_search"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "3",
              "type": "edge_ngram",
              "max_gram": "50"
            },
            "search_tokenizer_search": {
              "token_chars": [
                "letter",
                "digit",
                "whitespace"
              ],
              "min_gram": "3",
              "type": "ngram",
              "max_gram": "50"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "mVweO4_sT3Ww00MzdLyavw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}

Query:

GET fundraisers/_search?explain=true
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "qui",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Mapping:

{
  "fundraisers": {
    "mappings": {
      "fundraisers": {
        "properties": {
          "status": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          },
          "title": {
            "type": "text",
            "analyzer": "my_analyzer"
          },
          "twitterUrl": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "videoLinks": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "zipCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Am I overcomplicating this by using match_phrase, a search analyzer, and ngrams, or is there a simpler way to achieve the expected result?

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-match-query.html

To understand the scoring, add "explain": true to the query DSL. - Nishant
Thanks @NishantSaini. I already added that to the query, and it is showing the explain response. How do we alter the order of the results? - Shiva MSK
@ShivaMSK, could you provide the output of the explain API? I can't see it in your question. - user156327
@AmitKhandelwal The explain API computes a score explanation for a query and a specific document. elastic.co/guide/en/elasticsearch/reference/current/… - Shiva MSK
@ShivaMSK I know that. I want to know the output it produces in your case. - user156327

xeraa · Accepted Answer · 2019-01-12T19:23:43

Ok, first let's create a minimal and reproducible setup:

PUT test
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "my_tokenizer"
          },
          "search_analyzer_search": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "search_tokenizer_search"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "token_chars": [
              "letter",
              "digit"
            ],
            "min_gram": "3",
            "type": "edge_ngram",
            "max_gram": "50"
          },
          "search_tokenizer_search": {
            "token_chars": [
              "letter",
              "digit",
              "whitespace"
            ],
            "min_gram": "3",
            "type": "ngram",
            "max_gram": "50"
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "title": "Quick 123"
}
PUT test/_doc/2
{
  "title": "Foxes Quick"
}
PUT test/_doc/3
{
  "title": "Quick"
}
PUT test/_doc/4
{
  "title": "Foxes Quick Quick"
}
PUT test/_doc/5
{
  "title": "Quick Foxes"
}
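
To see which terms end up in the index for these five titles, here is a rough Python imitation of my_analyzer (an edge_ngram tokenizer with min_gram 3, max_gram 50, token_chars letter and digit, followed by lowercase). This is only a sketch for illustration; the function name edge_ngrams is made up, and the authoritative check is to run the text through the index's _analyze API.

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=50):
    """Approximate my_analyzer: split on anything that is not a letter or
    digit, emit the prefixes of each word from min_gram up to max_gram
    characters, and lowercase them."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        # Words shorter than min_gram produce no tokens at all.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n].lower())
    return tokens

titles = ["Quick 123", "Foxes Quick", "Quick", "Foxes Quick Quick", "Quick Foxes"]
for title in titles:
    print(title, "->", edge_ngrams(title))
```

Every title contains the term qui at least once, "Foxes Quick Quick" contains it twice, and the titles differ in how many tokens they produce in total. Both facts matter for the scoring.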

Then let's try the simplest query:

GET test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "qui"
      }
    }
  }
}

And now your order is:

1. Quick
2. Foxes Quick Quick
3. Quick 123
4. Foxes Quick
5. Quick Foxes

That's pretty much what you were expecting, right? There might be other use cases that this query doesn't cover, but in my opinion you'll then have to use multi_match and search on fields with different analyzers, because I'm not sure a phrase search (match_phrase) on an edge n-gram field makes much sense.
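
The order returned by the match query can be reproduced by hand. The following Python sketch assumes the default BM25 similarity with k1 = 1.2 and b = 0.75, a single shard (as in the test index), and token counts that follow from the edge n-gram settings (for example, "Quick" is indexed as qui, quic, quick, so its length is 3). The function name bm25_tf_factor is made up for illustration. The IDF factor is left out because it is the same for every document that matches the term qui.

```python
def bm25_tf_factor(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25. A higher tf raises the score,
    a longer field (relative to the average) lowers it."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# title -> (occurrences of the term "qui", number of indexed tokens)
docs = {
    "Quick 123": (1, 4),
    "Foxes Quick": (1, 6),
    "Quick": (1, 3),
    "Foxes Quick Quick": (2, 9),
    "Quick Foxes": (1, 6),
}

avg_len = sum(doc_len for _, doc_len in docs.values()) / len(docs)
scores = {
    title: bm25_tf_factor(tf, doc_len, avg_len)
    for title, (tf, doc_len) in docs.items()
}

# Highest score first; ties keep their insertion order.
ranking = sorted(scores, key=scores.get, reverse=True)
for title in ranking:
    print(f"{scores[title]:.4f}  {title}")
```

"Quick" wins because it is the shortest field, and "Foxes Quick Quick" comes second because its two occurrences outweigh its length. With five shards, as in the original index, these statistics are computed per shard by default, which may be one reason the order in the question looks different.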
