match phrase query not working as expected

Question

Reading from elastic documentation:

the match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other.

I have configured my analyzer to use an edge_ngram filter with the keyword tokenizer:

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}

Here is the Java class that is used for indexing:

@Document(indexName = "myindex", type = "program")
@Getter
@Setter
@Setting(settingPath = "/elasticsearch/settings.json")
public class Program {


    @org.springframework.data.annotation.Id
    private Long instanceId;

    @Field(analyzer = "autocomplete", searchAnalyzer = "autocomplete", type = FieldType.String)
    private String name;
}

If a document contains the phrase "hello world", the following query matches it:

{
  "match" : {
    "name" : {
      "query" : "ho",
      "type" : "phrase"
    }
  }
}
Result: "hello world"

That's not what I expect, because not all of the search terms are in the document.

My questions:

1- Shouldn't the edge_ngram/autocomplete analyzer produce 2 search terms for the query "ho"? (The terms should be "h" and "ho".)

2- Why does "ho" match "hello world" when, according to the definition of the phrase query, not all of the terms matched? (The "ho" term shouldn't have matched.)

Update:

In case the question is not clear: the match_phrase query should analyze the query string into a list of terms. Here the string is ho, and because this is edge_ngram with min_gram 1, that gives 2 terms: h and ho. According to the Elasticsearch documentation, the document must contain all of the search terms. However, hello world contains only h and not ho, so why did I get a match?

(1) You haven't added the index mapping, nor have you specified the type of the name field. (2) You haven't provided a sample document, so we don't know what data you are trying to match against. Please clarify these points so that people here can help you better. - Nishant

xeraa · Accepted Answer · 2018-12-28T02:20:58

If you provided a complete, runnable example of your problem, it would be much easier to help you. For example, something like this:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "name": "Hello world"
}

GET test/_search
{
  "query": {
    "match_phrase": {
      "name": "hello foo"
    }
  }
}

Judging from your search query, you are using Elasticsearch 2.x or earlier. This is a dead version; you should really upgrade.
I'm not sure that combining phrase search with edge n-grams makes much sense. What are you trying to achieve here?
Why is it matching? Your search query runs through the same analyzer as your stored field (your searchAnalyzer is also autocomplete). Since you have defined min_gram: 1, your ho is searched as h and ho, and the h matches the h gram indexed for hello world. Both grams come from a single keyword token, so as far as I know they share one position and the phrase query has no relative positions to enforce between them. That is why match and match_phrase don't make a difference with this analyzer.
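
To make the matching behaviour concrete, here is a rough Python sketch of what the analyzer does to both the document and the query. It is a simplified model, not Elasticsearch's implementation: the helper names autocomplete_terms and would_match are made up for illustration, and would_match assumes that all grams of the single keyword token share one position, so one shared gram is enough for a hit.

```python
# Simplified model of the "autocomplete" analyzer from the question:
# keyword tokenizer (whole input is one token) -> lowercase -> edge_ngram.
# Illustration only; this is not how Elasticsearch is implemented.


def autocomplete_terms(text, min_gram=1, max_gram=20):
    """Return the edge n-grams (prefixes) of the lowercased input."""
    token = text.lower()  # keyword tokenizer + lowercase filter
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def would_match(query, document):
    """Assumption: all grams share one position, so one shared gram suffices."""
    indexed = set(autocomplete_terms(document))
    return any(term in indexed for term in autocomplete_terms(query))


if __name__ == "__main__":
    print(autocomplete_terms("ho"))               # ['h', 'ho']
    print(len(autocomplete_terms("Hello world")))  # 11 prefixes, 'h' to 'hello world'
    print(would_match("ho", "Hello world"))       # True: the shared gram is 'h'
    print(would_match("xo", "Hello world"))       # False: no gram in common
```

In this model the query ho produces h and ho, the document produces every prefix of hello world, and the overlap on h alone is enough to produce the hit.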
