I have a custom analyzer on a field in my index. The analyzer uses a PatternTokenizer to keep hyphenated words together, and a LowercaseTokenFilter (@odata.type: #Microsoft.Azure.Search.LowercaseTokenFilter) so the terms are stored in the index in lowercase.
The analyzer is assigned through the field's analyzer property, so the same analyzer is used for both indexing and searching; I am NOT setting separate indexAnalyzer and searchAnalyzer values.
However, when I search for an uppercase or mixed-case term using the REST API I get no results; I only get results when the search term is lowercase.
Since the same analyzer is applied at index time and at search time, I would have expected results even when the query term is uppercase.
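To illustrate, the two searches look roughly like this (AB-CD-EF stands in for one of my real hyphenated values, and the api-version is just the one I happen to be using; both requests are sent with the usual api-key header):

GET https://dev-xxx.search.windows.net/indexes/myproducts/docs?api-version=2017-11-11&search=AB-CD-EF&searchFields=hyphenated
-> 0 results

GET https://dev-xxx.search.windows.net/indexes/myproducts/docs?api-version=2017-11-11&search=ab-cd-ef&searchFields=hyphenated
-> returns the expected document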
Of course it's possible I haven't implemented the CustomAnalyzer correctly, but using the Analyze endpoint it does seem to tokenize into lowercase terms.
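For example, posting something like this to the Analyze endpoint (AB-CD-EF is again just a placeholder value, and the analyzer name is the custom analyzer from the index definition below):

POST https://dev-xxx.search.windows.net/indexes/myproducts/analyze?api-version=2017-11-11
Content-Type: application/json
api-key: [admin key]

{
  "text": "AB-CD-EF",
  "analyzer": "xxx"
}

returns tokens that are all lowercase.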
This is the JSON I post to Create Index for the custom analyzer:
{
  "@odata.context": "https://dev-xxx.search.windows.net/$metadata#indexes/$entity",
  "@odata.etag": "\"0x8D5F638C546D690\"",
  "name": "myproducts",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "synonymMaps": []
    },
    {
      "name": "materialId",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "synonymMaps": []
    },
    {
      "name": "name",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "synonymMaps": []
    },
    {
      "name": "hyphenated",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
"analyzer": "pdh",
"synonymMaps": []
}
],
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"suggesters": [],
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "xxx",
"tokenizer": "xxx",
"tokenFilters": [
"xxxlowercase"
],
"charFilters": []
}
],
"tokenizers": [
{
"@odata.type": "#Microsoft.Azure.Search.PatternTokenizer",
"name": "xxx",
"pattern": "([a-z])(?![\\w-])",
"flags": null,
"group": -1
}
],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.LowercaseTokenFilter",
"name": "xxxlowercase"
}
],
"charFilters": []
}
So what am I doing wrong?
Obviously I can work around this by lowercasing everything before it is sent to the search endpoint, but I expected it to work as-is.
Thanks