2
votes

Searching for part of a phrase returns results in a strange order. For example, given these two documents:

{
    "@search.score": 0.5696786,
    "Guid": "ce73ca06-f170-46df-b0ef-a6e6e72b76ce",
    "FirstName": "Ruy",
    "LastName": "Bssaf",
    "Phone": "560523791699",
    "CustomerId": "-1",
    "CustomerEmail": "guy@twingocoil",
    "MySuperpharm": "True"
},
{
    "@search.score": 0.5619051,
    "Guid": "090c623f-5993-458e-93cc-8ef3d885eb29",
    "FirstName": "ruy",
    "LastName": "reffen",
    "Phone": "0522545833",
    "CustomerId": "76016443160",
    "CustomerEmail": "guy@geffenmedicalcom",
    "MySuperpharm": "False"
},

searching for "guy@twingoco" returns the second document before the first one, although one would clearly expect the first document to come first, since its "CustomerEmail" field is almost identical to the search term.

The search is run inside the portal, with no extra parameters apart from the search term. When I search for the full email, the expected result does come first.

Please don't focus on this specific "email phrase" case; I'm asking in general how to make the search take partial phrases into account as well.

1
What's your search query, and how did you set up the documents inside Azure Search? Did you mark customer email as a search field? - PartlyCloudy
I'm adding more information to answer your question. - Guy Assaf
Hi Guy, what you're seeing is not expected. Can you share your exact search request and the response you're seeing with search scores (in the example you shared the order is correct). You can see how the indexed content and the search term get tokenized using the Analyze API (docs.microsoft.com/en-us/rest/api/searchservice/test-analyzer). In your case the email addresses are split at the @ sign both at indexing and query time, so your search query becomes: guy twingoco - Yahnoosh

1 Answer

3
votes

This issue has to do with how Lucene handles email addresses. Azure Search uses the standard Lucene analyzer as its default analyzer: https://lucene.apache.org/core/5_2_0/core/org/apache/lucene/analysis/Analyzer.html

The standard Lucene analyzer matches whole tokens only, so a partial term does not match a longer indexed token, which is why the partial search does not produce the hit you expect. (Similarly, if you search for "car" you will not get a hit for "careful", even though it is a prefix.) This is explained further in "Querying email addresses indexed by lucene".
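To make the whole-token behaviour concrete, here is a minimal Python sketch. The `tokenize` function is a hypothetical stand-in that lowercases the text and splits on non-alphanumeric characters; it is not the real Lucene analyzer, only a rough approximation of the splitting at the @ sign described in the comments on the question.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer (assumption, not Lucene itself):
    lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matching_terms(query, document):
    """Return the query terms that equal an indexed token exactly."""
    indexed = set(tokenize(document))
    return [term for term in tokenize(query) if term in indexed]


# The full email: every query term is also an indexed token.
full_hits = matching_terms("guy@twingocoil", "guy@twingocoil")

# The partial email: "twingoco" is only a prefix of "twingocoil", so it does not match.
partial_hits = matching_terms("guy@twingoco", "guy@twingocoil")

print(full_hits)
print(partial_hits)
```

Under this simplified model the partial query matches only on `guy`, a term that both documents in the question contain, so the email field no longer separates them and the ranking is decided by other factors.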

The good news is that you can create a custom tokenizer to address this issue. See the accepted answer to "Using Lucene to search for email addresses" for one approach to implementing such a tokenizer, and see this Azure Search blog post on how to use custom analyzers: https://azure.microsoft.com/en-gb/blog/custom-analyzers-in-azure-search
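One common way to get prefix matching is to index edge n-grams (every leading prefix of each token), which is a different technique from the custom tokenizer in the linked answer. The sketch below shows the idea in plain Python, followed by a hedged fragment of an index definition. The index name `customers` and the analyzer and filter names are hypothetical, and the property names reflect my understanding of the Azure Search custom analyzer format, so check them against the current documentation before use.

```python
import json


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading prefixes of a token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# At indexing time every prefix of "twingocoil" becomes a searchable term,
# so the partial query term "twingoco" now has an exact token to match.
index_terms = set(edge_ngrams("twingocoil"))
prefix_matches = "twingoco" in index_terms
print(prefix_matches)

# Assumed shape of the relevant part of an index definition (verify before use).
# Prefixes are generated at indexing time only; queries use the standard analyzer.
index_definition = {
    "name": "customers",  # hypothetical index name
    "fields": [
        {
            "name": "CustomerEmail",
            "type": "Edm.String",
            "searchable": True,
            "indexAnalyzer": "prefix_analyzer",
            "searchAnalyzer": "standard.lucene",
        }
    ],
    "analyzers": [
        {
            "name": "prefix_analyzer",  # hypothetical analyzer name
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "standard_v2",
            "tokenFilters": ["lowercase", "front_edge_ngram"],
        }
    ],
    "tokenFilters": [
        {
            "name": "front_edge_ngram",  # hypothetical filter name
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
            "minGram": 2,
            "maxGram": 20,
            "side": "front",
        }
    ],
}

print(json.dumps(index_definition, indent=2))
```

Using a separate search analyzer keeps the query terms whole, so a query for `twingoco` is compared against the stored prefixes rather than being broken into prefixes itself. The trade-off is a larger index, since each token is stored once per prefix length.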

Good luck!