1
votes

Say you have two multi-word phrases: one is "quick fox" and the other is "lazy brown".

The goal is to have 0 slop within the phrases and >0 slop between the phrases. That is, "quick fox" and "lazy brown" should each match without any extra words inside the phrase, but there could be multiple words between the two phrases.

The following should match:

  1. quick fox jumped over the lazy brown dog
  2. quick fox jumped 10 feet over and above the lazy brown dog
  3. quick fox jumped 10 feet over and above the lazy brown cat
  4. quick fox hopped over the lazy brown dog

But these should not:

  1. quick fast fox jumped over the lazy brown dog
  2. quick fox jumped over the lazy slow brown dog

Any ideas? I've been experimenting with span_near and span_multi but haven't gotten anywhere yet.

2 Answers

0
votes

What you are looking for is a phrase query. A phrase query matches only when the words appear adjacent to each other and in the same order. That is, by default it tolerates a slop of 0.

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "text": "quick fox"
          }
        },
        {
          "match_phrase": {
            "text": "lazy brown"
          }
        }
      ]
    }
  }
}

Phrase query - http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/phrase-matching.html
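Note that the bool query above only requires both phrases to appear somewhere in the field; it does not constrain their order or the distance between them. Since the question mentions span_near, here is a rough sketch of how nested span_near clauses might express "0 slop inside each phrase, some slop between them". It assumes the field is called "text" (as above) and is analyzed so that the indexed terms are the lowercased words, because span_term matches raw terms rather than analyzed input. The outer slop of 10 is an arbitrary value picked for illustration.

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_near": {
            "clauses": [
              { "span_term": { "text": "quick" } },
              { "span_term": { "text": "fox" } }
            ],
            "slop": 0,
            "in_order": true
          }
        },
        {
          "span_near": {
            "clauses": [
              { "span_term": { "text": "lazy" } },
              { "span_term": { "text": "brown" } }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ],
      "slop": 10,
      "in_order": true
    }
  }
}

As far as I know, span_near only takes a maximum slop, so this sketch would also match "quick fox lazy brown" with nothing in between. If that case has to be excluded, it would need a separate must_not clause in a surrounding bool query.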

-2
votes

Disclaimer: I had never even heard of Elasticsearch until just now, and came up with this answer after 5 minutes on Google.

It doesn't look like it's possible to specify a minimum slop value for a queried string (although I could have missed something), which makes your requirement of ">0 slop between the phrases" a little tricky. However, would this simple trick solve your problem?

{
    "query": {
        "bool": {
            "must": [
                { "match_phrase": { "text": "quick fox" } },
                { "match_phrase": { "text": "lazy brown" } }
            ],
            "must_not": { "match_phrase": { "text": "quick fox lazy brown" } }
        }
    }
}

Links: dsl string query, slop guide