Elasticsearch search bool + must query

Question

Can somebody please tell me why this Elasticsearch query returns the results below? The query has a bool + must part, which should match only if there is an exact match for the string "softo" in the field nn. The query looks like this:

"query":{
        "bool":{
            "must":[
                {"match":{"nn":"softo"}}
            ],
            "should":[
                {"match":{"nn":"sro"}},
                {"match":{"nn":"as"}},
                {"match":{"nn":"no"}},
                {"match":{"nn":"vos"}},
                {"match":{"nn":"ks"}}
            ]
        }
    }

and it returns results that have no "softo" in the nn field, such as:

            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "17053188",
                "_score": 129.76167,
                "_source": {
                    "nn": "zo soz kovo zts nova as zts elektronika as",
                    "nazov": "ZO SOZ KOVO,ZŤS NOVA a.s.,ZTS ELEKTRONIKA a.s.",
                }
            },
            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "45732078",
                "_score": 126.953285,
                "_source": {
                    "nn": "agentura socialnych sluzieb   ass no",
                    "nazov": "Agentúra sociálnych služieb - ASS n.o.",
                }
            }

I don't understand it. Why does it return a result like "zo soz kovo zts nova as zts elektronika as", which does not contain the string "softo"? The mapping for the nn field looks like this:

{
    "search_2": {
        "aliases": {
            "search": {}
        },
        "mappings": {
            "doc": {
                "dynamic": "strict",
                "properties": { 
                    "nn": {
                        "type": "text",
                        "boost": 10,
                        "analyzer": "autocomplete"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "refresh_interval": "-1",
                "number_of_shards": "4",
                "provided_name": "search_2",
                "creation_date": "1539693645683",
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "ignore_case": "true",
                            "type": "synonym",
                            "synonyms_path": "synonyms/sk_SK.txt"
                        },
                        "lemmagen_filter_sk": {
                            "type": "lemmagen",
                            "lexicon": "sk"
                        },
                        "stopwords_SK": {
                            "ignore_case": "true",
                            "type": "stop",
                            "stopwords_path": "stopwords/slovak.txt"
                        },
                        "remove_duplicities": {
                            "type": "unique",
                            "only_on_same_position": "true"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "2",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "lower_ascii": {
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "suggestion": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "eyxXza0pQxWeQCpXih8ngg",
                "version": {
                    "created": "6020399"
                }
            }
        }
    }
}

Worst case, you can get Elasticsearch to explain what it's matching – apokryfos

Nishant Nishant · Accepted Answer · 2018-12-03T17:03:09

You are getting those results because of the autocomplete analyzer applied to the nn field. I'll explain using the following field value:

"nn": "zo soz kovo zts nova as zts elektronika as"

The tokens generated for the above will be:

zo, so, soz, ko, kov, kovo, zt, zts, no, nov, nova, as, zt, zts, el, ele, elek, elekt, elektr, elektro, elektron, elektroni, elektronik, elektronika, as
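
To make the token list easier to check, here is a minimal Python sketch that imitates the edge_ngram filter from the mapping (min_gram 2, max_gram 20). It is only an approximation: it splits on whitespace instead of using the standard tokenizer, and it skips the stopwords_SK filter because the contents of stopwords/slovak.txt are not shown in the question. The function names edge_ngrams and analyze are made up for this illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Rough stand-in for the "autocomplete" analyzer.

    Assumptions: whitespace split instead of the standard tokenizer,
    and no stop word removal (the stop word file is unknown).
    """
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


doc_tokens = analyze("zo soz kovo zts nova as zts elektronika as")
query_tokens = edge_ngrams("softo")

print(doc_tokens)
print(query_tokens)
```

Against a real index, the _analyze API with the autocomplete analyzer shows the actual tokens, including the effect of the stop word filter.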

The match query by default applies the same analyzer at search time, and the default operator between tokens is OR. So {"match":{"nn":"softo"}} actually behaves like:

{
  "match": {
    "nn": "so OR sof OR soft OR softo"
  }
}

As you can see, one of the tokens generated for the nn field was so, and hence the document matches.
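
The OR behaviour can be sketched in Python as a set intersection between the query tokens and the indexed tokens. This is a simplified model under stated assumptions (whitespace tokenization, no stop word removal, hypothetical helper names), not how Elasticsearch scores documents. It also shows what would happen if the query were kept as the single term softo instead of being split into n-grams.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the set of prefixes of `token`, lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}


def index_tokens(text):
    """Approximate the tokens stored for a field value (assumption:
    whitespace split, lowercase, edge n-grams, no stop words removed)."""
    tokens = set()
    for word in text.lower().split():
        tokens |= edge_ngrams(word)
    return tokens


indexed = index_tokens("zo soz kovo zts nova as zts elektronika as")

# Default behaviour: the query is analyzed with the same edge n-gram
# analyzer, and one shared token is enough because the operator is OR.
query_ngrams = edge_ngrams("softo")
shared = query_ngrams & indexed
matches_with_or = bool(shared)

# Hypothetical alternative: the query stays a single whole term,
# so it only matches if "softo" itself was indexed.
matches_whole_term = "softo" in indexed

print(sorted(shared))
print(matches_with_or, matches_whole_term)
```

In Elasticsearch, one common way to keep the query as a whole term is to give the field a separate search_analyzer without the edge_ngram filter. That is a suggestion beyond what the answer above states, so treat it as something to verify against your own mapping.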
