I have a fairly basic Azure Search index with several fields of searchable string data, for example [abridged]:
"fields": [
{
"name": "Field1",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Field2",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "en.microsoft",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
]
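To see which analyzer each searchable field actually uses at query time, here is a minimal sketch. It assumes the abridged "fields" array above is wrapped in a JSON object, and it assumes (per the Azure Cognitive Search documentation) that a searchable field with a null analyzer falls back to the default "standard.lucene" analyzer. The function name effective_analyzers is hypothetical, for illustration only.

```python
import json

# Abridged copy of the "fields" array from the index definition above,
# wrapped in an object so that it parses as valid JSON.
INDEX_JSON = """
{
  "fields": [
    {"name": "Field1", "type": "Edm.String", "key": true,
     "searchable": true, "analyzer": null,
     "indexAnalyzer": null, "searchAnalyzer": null},
    {"name": "Field2", "type": "Edm.String",
     "searchable": true, "analyzer": "en.microsoft",
     "indexAnalyzer": null, "searchAnalyzer": null}
  ]
}
"""

# Assumption: a searchable field with no analyzer set uses the service
# default analyzer, standard.lucene.
DEFAULT_ANALYZER = "standard.lucene"


def effective_analyzers(index: dict) -> dict:
    """Hypothetical helper: map each searchable field to its query-time analyzer."""
    result = {}
    for field in index["fields"]:
        if not field.get("searchable"):
            continue
        # A searchAnalyzer, when set, is applied to query text instead of analyzer.
        name = (
            field.get("searchAnalyzer")
            or field.get("analyzer")
            or DEFAULT_ANALYZER
        )
        result[field["name"]] = name
    return result


if __name__ == "__main__":
    for field, analyzer in effective_analyzers(json.loads(INDEX_JSON)).items():
        print(f"{field}: {analyzer}")
```

Run against the abridged definition, this reports that Field1 and Field2 are analyzed by different analyzers.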
Field1 is loaded with alphanumeric ID data and Field2 is loaded with English-language string data, specifically the name/title of the record. searchMode=all is also being used to ensure the accuracy of the results.
Let's say one of the indexed records has the following Field2 data: BA (Hons) in Business, Organisational Behaviour and Coaching. Putting that into the en.microsoft analyzer, this is the result we get out:
"tokens": [
{
"token": "ba",
"startOffset": 0,
"endOffset": 2,
"position": 0
},
{
"token": "hon",
"startOffset": 4,
"endOffset": 8,
"position": 1
},
{
"token": "hons",
"startOffset": 4,
"endOffset": 8,
"position": 1
},
{
"token": "business",
"startOffset": 13,
"endOffset": 21,
"position": 3
},
{
"token": "organizational",
"startOffset": 23,
"endOffset": 37,
"position": 4
},
{
"token": "organisational",
"startOffset": 23,
"endOffset": 37,
"position": 4
},
{
"token": "behavior",
"startOffset": 38,
"endOffset": 47,
"position": 5
},
{
"token": "behaviour",
"startOffset": 38,
"endOffset": 47,
"position": 5
},
{
"token": "coach",
"startOffset": 52,
"endOffset": 60,
"position": 7
},
{
"token": "coaching",
"startOffset": 52,
"endOffset": 60,
"position": 7
}
]
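The token list above can be reproduced with the index's Analyze Text REST endpoint (POST /indexes/[index]/analyze with a JSON body of "text" and "analyzer"). The sketch below only builds the request without sending it; the service name, index name, admin key and api-version value are placeholder assumptions. Running it for both en.microsoft and standard.lucene makes it easy to compare Field2's analyzer with the default one.

```python
import json
import urllib.request

# Hypothetical placeholder values: replace with your own service, index and key.
SERVICE = "my-service"
INDEX = "my-index"
API_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"


def build_analyze_request(text: str, analyzer: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Analyze Text endpoint."""
    url = (
        f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
        f"?api-version={API_VERSION}"
    )
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": API_KEY},
    )


if __name__ == "__main__":
    title = "BA (Hons) in Business, Organisational Behaviour and Coaching"
    for analyzer in ("en.microsoft", "standard.lucene"):
        req = build_analyze_request(title, analyzer)
        # To actually call the service (requires a real service and key):
        # with urllib.request.urlopen(req) as resp:
        #     tokens = json.load(resp)["tokens"]
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```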
As you can see, the tokens returned are what you'd expect for such a string. However, when that same indexed string value is used as a search term (sadly a valid use case in this instance), the results returned are not as expected unless searchFields=Field2 is explicitly specified.
Query 1 (Returns 0 results):
?searchMode=all&search=BA%20(Hons)%20in%20Business%2C%20Organisational%20Behaviour%20and%20Coaching
Query 2 (Returns 0 results):
?searchMode=all&searchFields=Field1,Field2&search=BA%20(Hons)%20in%20Business%2C%20Organisational%20Behaviour%20and%20Coaching
Query 3 (Returns 1 result as expected):
?searchMode=all&searchFields=Field2&search=BA%20(Hons)%20in%20Business%2C%20Organisational%20Behaviour%20and%20Coaching
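For anyone reproducing this, the three query strings above can be generated rather than hand-encoded. This is a minimal sketch; build_query is a hypothetical helper, and parentheses are deliberately left unescaped so the output matches the queries exactly as written above. The resulting string is meant to be appended to the index's docs search URL along with an api-version parameter.

```python
from typing import Optional
from urllib.parse import quote

TERM = "BA (Hons) in Business, Organisational Behaviour and Coaching"


def build_query(search: str, search_fields: Optional[str] = None) -> str:
    """Hypothetical helper: build a searchMode=all query string."""
    parts = ["searchMode=all"]
    if search_fields:
        # The comma-separated field list is passed through literally.
        parts.append(f"searchFields={search_fields}")
    # Spaces become %20 and commas %2C; parentheses are kept literal.
    parts.append("search=" + quote(search, safe="()"))
    return "?" + "&".join(parts)


if __name__ == "__main__":
    print(build_query(TERM))                   # Query 1
    print(build_query(TERM, "Field1,Field2"))  # Query 2
    print(build_query(TERM, "Field2"))         # Query 3
```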
So why does this only return the expected result with searchFields=Field2, and not when searchFields is left undefined or set to searchFields=Field1,Field2? I would not expect a non-match on Field1 to exclude a result that clearly matches on Field2.
Furthermore, removing "in" and "and" from the search term seems to correct the issue and return the expected result. For example:
Query 4 (Returns 1 result as expected):
?searchMode=all&search=BA%20(Hons)%20Business%2C%20Organisational%20Behaviour%20Coaching
(This is almost as if one analyzer is tokenizing the indexed data and a completely different analyzer is tokenizing the search term, although that theory doesn't make sense given Query 3, which produces a positive match using exactly the same indexed data and search term.)
Is anybody able to shed some light on what's going on here? I'm completely out of ideas and can't find anything more in the documentation.
NB: Please bear in mind that I'm looking to understand why Azure Search behaves this way, and am not necessarily after a workaround.