1
votes

Can somebody please tell me why this Elasticsearch query returns the results below? The query has a bool with a must clause, which I expected to match only documents whose nn field contains the exact string "softo". The query looks like this:

"query":{
        "bool":{
            "must":[
                {"match":{"nn":"softo"}}
            ],
            "should":[
                {"match":{"nn":"sro"}},
                {"match":{"nn":"as"}},
                {"match":{"nn":"no"}},
                {"match":{"nn":"vos"}},
                {"match":{"nn":"ks"}}
            ]
        }
    }

and it returns results that have no "softo" in the nn field, such as:

            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "17053188",
                "_score": 129.76167,
                "_source": {
                    "nn": "zo soz kovo zts nova as zts elektronika as",
                    "nazov": "ZO SOZ KOVO,ZŤS NOVA a.s.,ZTS ELEKTRONIKA a.s."
                }
            },
            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "45732078",
                "_score": 126.953285,
                "_source": {
                    "nn": "agentura socialnych sluzieb   ass no",
                    "nazov": "Agentúra sociálnych služieb - ASS n.o."
                }
            }

I don't understand it. Why does it return a result like "zo soz kovo zts nova as zts elektronika as", which does not contain the string "softo"? The mapping for the nn field looks like this:

{
    "search_2": {
        "aliases": {
            "search": {}
        },
        "mappings": {
            "doc": {
                "dynamic": "strict",
                "properties": { 
                    "nn": {
                        "type": "text",
                        "boost": 10,
                        "analyzer": "autocomplete"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "refresh_interval": "-1",
                "number_of_shards": "4",
                "provided_name": "search_2",
                "creation_date": "1539693645683",
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "ignore_case": "true",
                            "type": "synonym",
                            "synonyms_path": "synonyms/sk_SK.txt"
                        },
                        "lemmagen_filter_sk": {
                            "type": "lemmagen",
                            "lexicon": "sk"
                        },
                        "stopwords_SK": {
                            "ignore_case": "true",
                            "type": "stop",
                            "stopwords_path": "stopwords/slovak.txt"
                        },
                        "remove_duplicities": {
                            "type": "unique",
                            "only_on_same_position": "true"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "2",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "lower_ascii": {
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "suggestion": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "eyxXza0pQxWeQCpXih8ngg",
                "version": {
                    "created": "6020399"
                }
            }
        }
    }
}
2
Worst case, you can get Elasticsearch to explain what it's matching - apokryfos

2 Answers

4
votes

You are getting those results because of the autocomplete analyzer applied to the nn field. I'll explain using the following field value:

"nn": "zo soz kovo zts nova as zts elektronika as"

The tokens generated for the above will be:

zo, so, soz, ko, kov, kovo, zt, zts, no, nov, nova, as, zt, zts, el, ele, elek, elekt, elektr, elektro, elektron, elektroni, elektronik, elektronika, as

By default, a match query applies the same analyzer to the search text, and the default operator between the resulting tokens is OR. So {"match":{"nn":"softo"}} actually behaves like

{
  "match": {
    "nn": "so OR sof OR soft OR softo"
  }
}

As you can see, one of the tokens generated for the nn field was so, and hence the document matched.
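
The token overlap described above can be reproduced outside Elasticsearch. The following is only a rough Python sketch, not the real analyzer: the helper names `edge_ngrams` and `analyze` are made up for illustration, whitespace splitting stands in for the standard tokenizer, and the stop-word filter is left out because the stopword file is not shown in the question.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Prefixes of `token` between min_gram and max_gram characters long.

    Mirrors the min_gram/max_gram values of autocomplete_filter in the mapping.
    Tokens shorter than min_gram yield nothing.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Simplified stand-in for the 'autocomplete' analyzer (assumption:
    lowercase + whitespace split + edge n-grams, no stop-word removal)."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


indexed = analyze("zo soz kovo zts nova as zts elektronika as")
searched = analyze("softo")

# A match query with the default OR operator hits if any token is shared.
overlap = sorted(set(indexed) & set(searched))

print(searched)  # ['so', 'sof', 'soft', 'softo']
print(overlap)   # ['so']
```

The only shared token is `so`, which comes from the word "soz" in the document and from the two-character prefix of "softo" in the query.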

1
votes
  1. You can change "match" to "term" in your must query.

    A "match" query analyzes the search string and scores each document, so it answers the question "how well does this string match?".

    A "term" query does not analyze the search string; it looks up the exact token in the index, so it answers a simple yes-or-no question (the document matches or it does not).


  2. If you really need full-text search, you can keep "match" in your "must" query and boost its score.

    For example, if you want to boost it by 5, it would look like this:

    "must":[
        {"match": {"nn": {"boost": 5, "query": "softo"}}}
    ]
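
To illustrate the difference between the two query types in option 1, here is a small Python sketch. It is a simulation under simplifying assumptions, not Elasticsearch itself: `index_tokens`, `match_hits` and `term_hits` are hypothetical helpers, the analyzer is reduced to lowercase, whitespace split and edge n-grams, and "softonic sro" is an invented document value.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Prefixes of `token` between min_gram and max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_tokens(text):
    """Set of tokens stored for a field value (simplified analyzer)."""
    return {gram for word in text.lower().split() for gram in edge_ngrams(word)}


def match_hits(query, text):
    """match: the query string is analyzed too; any shared token is a hit (OR)."""
    return bool(index_tokens(query) & index_tokens(text))


def term_hits(term, text):
    """term: the query string is looked up as-is, without analysis."""
    return term in index_tokens(text)


doc = "zo soz kovo zts nova as zts elektronika as"

print(match_hits("softo", doc))             # True  (shared token "so")
print(term_hits("softo", doc))              # False (no token "softo" indexed)
print(term_hits("softo", "softonic sro"))   # True  (prefix "softo" is indexed)
```

Because the field is indexed with edge n-grams, the term lookup still behaves like a prefix search: any word starting with "softo" produces the token `softo` at index time.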