I'm looking for a way to do fuzzy, partial matching against the individual words in a field, while also enforcing strict phrase matching.

For example, say I have documents with field values such as:

foo bar
bar foo

I would like to achieve the following search behaviour:

  • If I search foo, I would like to return back both results.

  • If I search ba, I would like to return back both results.

  • If I search bar foo, I would like to only return back one result.

  • If I search bar foo foo, I don't want to return any results.

I would also like to add single-character fuzziness, so that if foo is mistyped as fbo, both results are still returned.

My current search and index analyzer uses an edge_ngram tokenizer and works fairly well, except that if any gram matches, the result is returned regardless of whether the remaining words match. For example, searching for bar foo buzz returns both of the following results:

foo bar
bar foo
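The behaviour described above can be reproduced outside Elasticsearch with a rough simulation. This is only a sketch: `edge_ngrams` and `matches_any_gram` are hypothetical helper names, the tokenization is a simplified stand-in for the real `edge_ngram` tokenizer, and it assumes the search is a `match` query with the default OR operator (the question does not show the query itself).

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Approximate an edge_ngram tokenizer: split on whitespace, lowercase,
    then emit each word's prefixes of length min_gram..max_gram."""
    grams = set()
    for word in text.lower().split():
        for size in range(min_gram, min(len(word), max_gram) + 1):
            grams.add(word[:size])
    return grams


def matches_any_gram(query, document):
    """Assumed OR semantics: a single shared gram is enough to match."""
    return bool(edge_ngrams(query) & edge_ngrams(document))


for doc in ["foo bar", "bar foo"]:
    # "buzz" contributes grams that match nothing, but "ba"/"bar"/"fo"/"foo"
    # are shared, so both documents still match.
    print(doc, matches_any_gram("bar foo buzz", doc))
```

Under these assumptions, the unmatched word buzz never disqualifies a document, which is why both results come back.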

My tokenizer:

ngram_tokenizer: {
   type: "edge_ngram",
   min_gram: "2",
   max_gram: "15",
   token_chars: ['letter', 'digit', 'punctuation', 'symbol'],
},
          

My analyzer:

nGram_analyzer: {
  filter: [
    "lowercase",
    "asciifolding"
  ],
  type: "custom",
  tokenizer: "ngram_tokenizer"
},

My field mapping:


type: "search_as_you_type",
doc_values: false,
max_shingle_size: 3,
analyzer: "nGram_analyzer"
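For reference, the three fragments above would normally nest inside a single create-index body. The sketch below assembles them as a Python dictionary using the values from the question; the field name `title` is an assumption, since the question does not name the field.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": "2",
                    "max_gram": "15",
                    "token_chars": ["letter", "digit", "punctuation", "symbol"],
                }
            },
            "analyzer": {
                "nGram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase", "asciifolding"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {  # hypothetical field name, not given in the question
                "type": "search_as_you_type",
                "doc_values": False,
                "max_shingle_size": 3,
                "analyzer": "nGram_analyzer",
            }
        }
    },
}

# Serialize to the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```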
          
1 Answer

One way to meet all of your requirements is to use a span_near query.

span_near queries are much more verbose, but they are well suited to phrase matching combined with a fuzziness parameter.
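Because the query body grows by one clause per search word, it can be convenient to generate it rather than write it by hand. The following is a minimal sketch, not part of any Elasticsearch client: `build_span_near_query` is a hypothetical helper, and the `title` field and `fuzziness` default of 2 are assumptions chosen to mirror this answer's examples.

```python
import json


def build_span_near_query(text, field="title", fuzziness=2):
    """Build a span_near query with one fuzzy span_multi clause per word."""
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": word, "fuzziness": fuzziness}}
                }
            }
        }
        # fuzzy is a term-level query and is not analyzed, so lowercase the
        # words here to line up with terms lowercased at index time.
        for word in text.lower().split()
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    # slop 0 and in_order true require the words to be
                    # adjacent and in the order they were typed.
                    {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}
                ]
            }
        }
    }


print(json.dumps(build_span_near_query("bar foo"), indent=2))
```

Passing `fuzziness=1` instead would limit matches to a single edit, which is closer to the single-character fuzziness asked for in the question.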

Here is a working example with the index mapping, index data, search queries, and search results:

Index Mapping:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}

Index Data:

{
    "title":"bar foo"
}
{
    "title":"foo bar"
}

Search Queries:

If I search foo, I would like to return back both results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "title": "bar foo"
        }
      },
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "title": "foo bar"
        }
      }
    ]

If I search ba, I would like to return back both results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "ba",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "title": "bar foo"
        }
      },
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "title": "foo bar"
        }
      }
    ]

If I search bar foo foo, I don't want to return any results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "bar",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

The search result will be empty.
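To sanity-check the matching logic without a cluster, the rules can be approximated in plain Python. This is a rough model only: `edit_distance` and `span_near_matches` are hypothetical helpers, the distance is classic Levenshtein (Elasticsearch's fuzzy query also counts a transposition as one edit by default), and titles are split on whitespace rather than run through a real analyzer.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def span_near_matches(query, title, max_edits=2):
    """True if the query words fuzzily match adjacent title words, in order
    (a stand-in for slop 0 with in_order true)."""
    wanted = query.lower().split()
    tokens = title.lower().split()
    for start in range(len(tokens) - len(wanted) + 1):
        window = tokens[start:start + len(wanted)]
        if all(edit_distance(w, t) <= max_edits for w, t in zip(wanted, window)):
            return True
    return False


titles = ["foo bar", "bar foo"]
for search in ["foo", "ba", "fbo", "bar foo", "bar foo foo"]:
    print(search, [t for t in titles if span_near_matches(search, t)])
```

In this model, foo, ba, and fbo each match both titles, bar foo matches only the title bar foo, and bar foo foo matches nothing, which lines up with the behaviour requested in the question.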