
My issue is that when we run a fuzzy search on first name (with an edit distance of 2 on the first name), it doesn't seem to return all possible matches.

Query type: full

Query string: "FirstName:gra~2 AND (LastName: \"*****\" OR LastName: /.*\"*****\".*/)"

I'm using an exact match OR a contains match on LastName for this example; that clause stays the same across all the examples below.

Results:

If I search FirstName:gre~2 in an Azure Search query string, we get back:

Greg
Gary
Gene

If I search FirstName:gra~2 we get back:

Gina
Gary

If I search FirstName:grag~2 we get back:

Greg
Gary

We know that Azure fuzzy search uses the Damerau-Levenshtein distance, and it seems like "gina" and "greg" are both 2 edits away from "gra", yet only one of them shows up. Also, "grag" should in theory return "gina" as well.

I'm wondering if anyone has an explanation for this, since it seems inconsistent.

I used this tool to verify the distance between "gra" and each of "greg" and "gina":

http://fuzzy-string.com/Compare/
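The same distances can also be checked offline. Below is a minimal Python sketch of the optimal string alignment distance (a restricted form of Damerau-Levenshtein, where swapping two adjacent characters counts as one edit). `osa_distance` is a hypothetical helper written for illustration; it is not part of any Azure or Lucene API, and I'm assuming it is close enough to what the fuzzy query measures for short names like these.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ra" -> "ar", costs a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for query in ("gra", "grag"):
    for name in ("greg", "gina", "gary"):
        print(f"{query} -> {name}: {osa_distance(query, name)}")
```

For these pairs it prints 2 for all three "gra" comparisons, and 1, 3 and 2 for "grag" against "greg", "gina" and "gary".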

Here's the link to the Azure documentation on the Lucene query syntax:

https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax

These are the definitions of both fields:

{
  "name": "FirstName",
  "type": "Edm.String",
  "searchable": true,
  "filterable": true,
  "retrievable": true,
  "sortable": true,
  "facetable": false,
  "key": false,
  "indexAnalyzer": null,
  "searchAnalyzer": null,
  "analyzer": "standard.lucene",
  "synonymMaps": []
},

{
  "name": "LastName",
  "type": "Edm.String",
  "searchable": true,
  "filterable": true,
  "retrievable": true,
  "sortable": true,
  "facetable": false,
  "key": false,
  "indexAnalyzer": null,
  "searchAnalyzer": null,
  "analyzer": "standard.lucene",
  "synonymMaps": []
}

The results seem to be the same whether or not the LastName clause is used.


Answer


I would also expect those terms to match your fuzzy query. Just as a sanity check before we dig deeper, can you confirm what your analyzer settings are (at both query time and indexing time)? I want to confirm that all the terms you mentioned are actually tokenized and indexed exactly the way you expect (and that their casing is normalized the way you would expect). You can use the Analyze API (https://docs.microsoft.com/en-us/rest/api/searchservice/test-analyzer) to check how those terms are tokenized. You also mentioned that your query includes an AND clause on another field (LastName). Can you confirm that, even without that second clause, the FirstName results are still not what you expect? I just want to make sure we rule out every factor other than the edit distance algorithm itself.
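For reference, here is a minimal Python sketch of calling the Analyze API using only the standard library. The service name, index name, admin key and api-version are placeholder assumptions; the request shape (a POST to /indexes/{index}/analyze with a JSON body holding "text" and "analyzer") follows the REST reference linked above, but please double-check the details there before relying on it.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own service, index and admin key.
SERVICE = "my-service"
INDEX = "my-index"
API_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"


def build_analyze_request(text: str, analyzer: str = "standard.lucene") -> urllib.request.Request:
    """Build (but do not send) a POST to the index's Analyze endpoint."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
           f"?api-version={API_VERSION}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


def analyze(text: str, analyzer: str = "standard.lucene") -> list[str]:
    """Send the request and return only the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer)) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]
```

With standard.lucene, calling analyze("Greg Gary Gene Gina") should give back lowercase tokens (greg, gary, gene, gina); anything different would point to an analyzer issue rather than the fuzzy matching itself.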

Update: I tried it on my side using the default analyzers and without the LastName clause. Searching for "gra~2" successfully returns "Greg", "Gary" and "Gina". Searching for "gre~2" gives me the same results you got. Searching for "grag~2" only returns "Greg" and "Gary". "Gina" is not returned, but that seems expected to me, since its edit distance from "grag" appears to be 3.
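To sanity-check those expectations without the index in the loop, here is a hedged Python sketch that filters a hypothetical term list (the four names lowercased, as the standard analyzer would index them) by an edit distance of at most 2. It is only an idealized model of a fuzzy term query, assuming adjacent transpositions count as one edit as the question describes; it is not how Azure Search is implemented and ignores anything the service does beyond the distance check.

```python
# Idealized model of a fuzzy term query: keep every indexed term whose
# optimal string alignment distance from the query term is <= max_edits.

def osa_distance(a: str, b: str) -> int:
    """Edit distance where adjacent transpositions cost 1 (assumption)."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical term list: the first names as standard.lucene would index them.
INDEXED_TERMS = ["greg", "gary", "gene", "gina"]


def fuzzy_matches(term: str, max_edits: int = 2) -> list[str]:
    """Return indexed terms within max_edits of the (lowercased) query term."""
    return [t for t in INDEXED_TERMS if osa_distance(term.lower(), t) <= max_edits]


for query in ("gre", "gra", "grag"):
    print(f"{query}~2 ->", fuzzy_matches(query))
```

In this model, gre~2 gives greg, gary and gene; gra~2 gives greg, gary and gina; and grag~2 gives only greg and gary, matching the results above. Note that gary only stays within 2 edits of "grag" because the "ra"/"ar" swap counts as a single edit; under plain Levenshtein distance it would be 3.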