How to handle unordered multi-word query in Elasticsearch?

Question

I have the following situation:

The simple analyzer processes the text "The brown and green fox are quick" and adds the individual lower case terms to the index.

I want to use the following query phrase against my indices: "quick brown f"

I use a match_phrase_prefix query to run this search:

{
    "query": {
        "match_phrase_prefix" : {
            "message" : {
                "query" : "quick brown f",
                "max_expansions" : 10
            }
        }
    } 
}

Unfortunately, no results are returned because the terms appear in a different order in the document than in the query. I do get results if I use a match query with complete terms. It seems that match_phrase_prefix checks the order of the terms; the documentation says:

This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown).

My question:

Is there a way to run a query that handles incomplete terms and returns results regardless of the order of the terms in the source document? The only option I can currently think of is to manually create a query for each term in the input (e.g. quick, brown, f) and combine them using a bool query.
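
The bool-query fallback mentioned above could be assembled roughly as follows. This is only a sketch in Python, not part of the original setup: the helper name build_unordered_prefix_query is hypothetical, and it assumes that every term except the last is complete and that the last term is a prefix. It builds the request body only; sending it to Elasticsearch is left out.

```python
import json


def build_unordered_prefix_query(field, text):
    """Build a bool query body: complete terms via match, last term via prefix.

    Hypothetical helper for illustration. Assumes the field was indexed with
    lower-cased terms, so the prefix (which is not analyzed) is lower-cased here.
    """
    terms = text.lower().split()
    if not terms:
        raise ValueError("query text must contain at least one term")

    *complete_terms, last_term = terms
    # Each complete term must be present; clause order does not imply position.
    must = [{"match": {field: term}} for term in complete_terms]
    # The trailing, possibly incomplete term is matched as a prefix.
    must.append({"prefix": {field: last_term}})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    body = build_unordered_prefix_query("message", "quick brown f")
    print(json.dumps(body, indent=2))
```

Because the clauses sit in a bool must array, all of them have to match, but nothing constrains where in the document each term appears.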

LaserJesus · Accepted Answer · 2017-11-20T21:42:39

The edge_ngram tokenizer should do what you want. If you set it up with min_gram set to 1 and max_gram set to 10, the index stores every prefix of each word, from 1 to 10 characters long, so partial terms have matching tokens. You can then apply an analyzer without n-grams to your query text, such as the standard analyzer (the example below uses one built on the lowercase tokenizer), and match it against the edge_ngram document field.

The example in the documentation is almost exactly the same as your requested solution. Note the use of the explicit and operator in the query to make sure all of your search tokens, partial or otherwise, are matched.

From the documentation for 5.6:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

PUT my_index/doc/1
{
  "title": "Quick Foxes" 
}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo", 
        "operator": "and"
      }
    }
  }
}
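
To see why this matches regardless of term order, the analysis can be approximated outside Elasticsearch. The Python sketch below is a simplified simulation, not Elasticsearch's actual implementation: the function names are made up for illustration, and it treats only ASCII letters as token characters, whereas the real letter class is broader.

```python
import re


def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate index-time tokens: lower-cased prefixes of each word."""
    grams = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Emit prefixes of length min_gram..max_gram, capped at the word length.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            grams.add(word[:n])
    return grams


def search_terms(text):
    """Approximate search-time tokens: lower-cased words, no n-grams."""
    return re.findall(r"[a-z]+", text.lower())


def matches_all(document, query):
    """Mimic a match query with operator 'and': every search term must be indexed."""
    indexed = edge_ngrams(document)
    return all(term in indexed for term in search_terms(query))


if __name__ == "__main__":
    doc = "The brown and green fox are quick"
    print(matches_all(doc, "quick brown f"))
    print(matches_all(doc, "quick brown z"))
```

Under these assumptions, "quick brown f" matches the question's sentence because "quick", "brown" and "f" are all stored prefixes, while "quick brown z" does not. The sketch also shows a side effect of max_gram: a search term longer than 10 characters is never among the stored prefixes, so it would not match with this configuration.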
