
In my Azure Cognitive Search index, when I search for the term "education", I get 660 hits. When I search for the term "educational", I also get 660 hits. Both queries seem to return the same result set, which contains both variations of the word.

However, I am seeing very strange behavior when using the wildcard character:

edu* returns 660 results (expected)
educ* returns 660 results (expected)
educa* returns 2 results (matches two instances of the hyphenated word "educa-tion")
educat* returns 0 results (unexpected)
educati* returns 0 results (unexpected)
educatio* returns 0 results (unexpected)

Every search field uses the English Lucene language analyzer, queryType is set to "full", and searchMode is set to "all".
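
For reference, the request bodies for the wildcard queries look roughly like the following. This is a minimal sketch rather than the exact code in use: `build_search_body` and `WILDCARD_QUERIES` are names made up for illustration, and the `count` flag is an assumption about how the hit totals were obtained. Nothing is sent over the network here; the script only prints the JSON that would be posted to the index's docs/search endpoint.

```python
import json

# The wildcard queries listed above, in the order they were tried.
WILDCARD_QUERIES = ["edu*", "educ*", "educa*", "educat*", "educati*", "educatio*"]


def build_search_body(search_text: str) -> dict:
    """Build the JSON body for a search request (hypothetical helper)."""
    return {
        "search": search_text,
        "queryType": "full",  # full Lucene query syntax, needed for wildcards
        "searchMode": "all",  # every term in the query must match
        "count": True,        # assumption: ask the service for the total hit count
    }


if __name__ == "__main__":
    for query in WILDCARD_QUERIES:
        print(json.dumps(build_search_body(query)))
```

Only the `search` value changes between requests, so the difference in hit counts should come from how each wildcard term is matched against the index, not from the other parameters.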

Why do the last three queries return nothing?

As an aside, I found conflicting information about using the wildcard character at the beginning of a word.

The Lucene documentation says:

Note: You cannot use a * or ? symbol as the first character of a search.

From: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

But on Microsoft's site, they seem to imply that it should work:

Term fragment comes after * or ?, with a forward slash to delimit the construct. For example, search=/.*numeric.*/ returns "alphanumeric".

From: https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax#bkmk_wildcard

I've tried *ducation (which returns an error) and /.*ducation./ (which returns 0 results).

Thank you for your help.