How to match terms with spaces in elasticsearch?

Question

I have a content field (string) indexed in Elasticsearch. The analyzer is the default one, the standard analyzer.

When I search with a match query:

{"query":{"match":{"content":{"query":"micro soft","operator":"and"}}}}

The results show that it doesn't match documents containing "microsoft".

How can I use the input "micro soft" to match documents whose content contains "microsoft"?
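To see why the query misses, here is a rough Python approximation of what the standard analyzer does to both strings. It assumes a simplified tokenizer that only lowercases and splits on non-alphanumeric characters (the real standard tokenizer follows Unicode word-break rules), and `match_all_terms` is a hypothetical helper that models a match query with "operator": "and", where every analyzed query term must appear among the document's terms.

```python
import re

def standard_like_tokens(text):
    # Rough stand-in for the standard analyzer (an assumption, not the
    # real implementation): lowercase, then split on runs of
    # non-alphanumeric characters and drop empty pieces.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def match_all_terms(query, document):
    # Hypothetical model of a match query with "operator": "and":
    # every analyzed query term must be one of the document's terms.
    doc_terms = set(standard_like_tokens(document))
    return all(t in doc_terms for t in standard_like_tokens(query))

print(standard_like_tokens("micro soft"))          # ['micro', 'soft']
print(standard_like_tokens("microsoft"))           # ['microsoft']
print(match_all_terms("micro soft", "microsoft"))  # False
```

The document is indexed as the single term `microsoft`, while the query produces `micro` and `soft`. Neither query term equals the indexed term, so the default `or` operator would not match either.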

Lee H · Accepted Answer · 2015-04-17T21:26:03

Another solution is to use the ngram token filter, which allows a more "fuzzy" match.

Using your "microsoft" and "micro soft" example, here is how an ngram token filter breaks down the tokens (note that the mapping below names the field body rather than content):

POST /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngrams": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "5"
        }
      },
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter": ["lowercase", "my_ngrams"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

Then analyze the two strings:

curl '0:9200/test/_analyze?field=body&pretty' -d'microsoft'
{
  "tokens" : [ {
    "token" : "mic"
  }, {
    "token" : "micr"
  }, {
    "token" : "micro"
  }, {
    "token" : "icr"
  }, {
    "token" : "icro"
  }, {
    "token" : "icros"
  }, {
    "token" : "cro"
  }, {
    "token" : "cros"
  }, {
    "token" : "croso"
  }, {
    "token" : "ros"
  }, {
    "token" : "roso"
  }, {
    "token" : "rosof"
  }, {
    "token" : "oso"
  }, {
    "token" : "osof"
  }, {
    "token" : "osoft"
  }, {
    "token" : "sof"
  }, {
    "token" : "soft"
  }, {
    "token" : "oft"
  } ]
}

curl '0:9200/test/_analyze?field=body&pretty' -d'micro soft'
{
  "tokens" : [ {
    "token" : "mic"
  }, {
    "token" : "micr"
  }, {
    "token" : "micro"
  }, {
    "token" : "icr"
  }, {
    "token" : "icro"
  }, {
    "token" : "cro"
  }, {
    "token" : "sof"
  }, {
    "token" : "soft"
  }, {
    "token" : "oft"
  } ]
}

(I trimmed some of the output; the full output is here: https://gist.github.com/dakrone/10abb4a0cfe8ce8636ad)

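As you can see, every ngram produced for "micro soft" also appears among the ngrams for "microsoft", so a match query for "micro soft" on the body field will find the "microsoft" document, even with the "and" operator from the question.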
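For experimenting without a running cluster, here is a small Python sketch that mimics the my_ngrams filter (min_gram 3, max_gram 5) and checks the overlap. The helpers `ngram_filter` and `analyze` are hypothetical; `analyze` splits on whitespace as a simplification of the standard tokenizer, and the final check is a strict "every query gram is in the document" test, which only roughly models what "operator": "and" asks for.

```python
def ngram_filter(tokens, min_gram=3, max_gram=5):
    # Mimics the "my_ngrams" filter: for each token, emit every substring
    # of length min_gram..max_gram, ordered by start offset and then by
    # length (the order shown in the _analyze output above).
    grams = []
    for tok in tokens:
        for start in range(len(tok)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(tok):
                    break
                grams.append(tok[start:start + size])
    return grams

def analyze(text):
    # Hypothetical stand-in for my_analyzer on lowercase input:
    # whitespace split (simplified tokenizer), then the ngram filter.
    return ngram_filter(text.split())

doc_grams = analyze("microsoft")
query_grams = analyze("micro soft")
print(len(doc_grams), len(query_grams))             # 18 9
print(query_grams)
print(all(g in set(doc_grams) for g in query_grams))  # True
```

The trade-off is index size and precision: a single 9-letter word becomes 18 terms here, and with the default `or` operator a query that shares only a fragment such as `cro` with a document will still match it.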
