
I have a custom analyzer on a field in my index. The analyzer uses a PatternTokenizer to keep hyphenated words together. And it uses a LowercaseTokenFilter (@odata.type: #Microsoft.Azure.Search.LowercaseTokenFilter) so the words are stored in the index in lowercase.

The field uses the single "analyzer" property rather than separate indexAnalyzer and searchAnalyzer settings, so the same analyzer is used at index time and at query time.

However, when I search for an uppercase or mixed-case term using the REST API, I get no results. I only get results when searching in lowercase.

Since the same analyzer is used for indexing and searching, I would have expected results even when searching in uppercase.

Of course, it's possible I haven't implemented the CustomAnalyzer correctly, but the Analyze endpoint does seem to show the text tokenized in lowercase.
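For reference, a call to the Analyze endpoint can be built with only the Python standard library. This is a minimal sketch, not the asker's code: the service and index names are taken from the JSON in the question, while the API key placeholder and the api-version value are assumptions. The request is built but not sent.

```python
import json
import urllib.request

# Assumed values: service/index names from the question's JSON,
# a placeholder admin key, and an assumed api-version.
SERVICE = "dev-xxx"
INDEX = "myproducts"
API_VERSION = "2017-11-11"
API_KEY = "<admin-api-key>"


def build_analyze_request(text: str, analyzer: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the index's Analyze endpoint."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
           f"?api-version={API_VERSION}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": API_KEY},
    )


# Analyze the same term in two casings with the custom analyzer "xxx".
upper_req = build_analyze_request("ABC-123", "xxx")
lower_req = build_analyze_request("abc-123", "xxx")
# Sending one: urllib.request.urlopen(upper_req); the JSON response lists
# the tokens the analyzer produced.
```

Comparing the tokens returned for both casings shows whether the analyzer really normalizes query-style input the same way it normalizes indexed text.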

This is the JSON I post to Create Index to define the custom analyzer:

{
    "@odata.context": "https://dev-xxx.search.windows.net/$metadata#indexes/$entity",
    "@odata.etag": "\"0x8D5F638C546D690\"",
    "name": "myproducts",
    "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": false,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": true,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "materialId",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": true,
            "facetable": true,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": "standard.lucene",
            "synonymMaps": []
        },
        {
            "name": "name",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": true,
            "facetable": true,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": "standard.lucene",
            "synonymMaps": []
        },
        {
            "name": "hyphenated",
            "type": "Collection(Edm.String)",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": false,
            "facetable": true,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": "pdh",
            "synonymMaps": []
        }
    ],
    "scoringProfiles": [],
    "defaultScoringProfile": null,
    "corsOptions": null,
    "suggesters": [],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "xxx",
            "tokenizer": "xxx",
            "tokenFilters": [
                "xxxlowercase"
            ],
            "charFilters": []
        }
    ],
    "tokenizers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.PatternTokenizer",
            "name": "xxx",
            "pattern": "([a-z])(?![\\w-])",
            "flags": null,
            "group": -1
        }
    ],
    "tokenFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.LowercaseTokenFilter",
            "name": "xxxlowercase"
        }
    ],
    "charFilters": []
}

So what am I doing wrong?

Obviously, I could just lowercase everything on the client before sending it to the search endpoint, but I thought it should work anyway.
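The client-side workaround mentioned above could look like the following minimal sketch. It assumes a GET to the Search Documents endpoint with the search, searchFields, and api-version query parameters; the service name, index name, default field, and api-version value are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed values for illustration only.
SERVICE = "dev-xxx"
INDEX = "myproducts"
API_VERSION = "2017-11-11"


def build_search_url(term: str, field: str = "hyphenated") -> str:
    """Build a Search Documents GET URL, lowercasing the term on the client."""
    params = {
        "search": term.lower(),    # client-side normalization (the workaround)
        "searchFields": field,     # restrict matching to a single field
        "api-version": API_VERSION,
    }
    return (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?"
            + urlencode(params))


url = build_search_url("ABC-123")
# https://dev-xxx.search.windows.net/indexes/myproducts/docs?search=abc-123&searchFields=hyphenated&api-version=2017-11-11
```

This only masks the symptom: if the analyzer is applied at query time, the service should already lowercase the term without help from the client.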

Thanks


It looks like the field "hyphenated" points to a custom analyzer named "pdh". I expect this is a typo in your example, because Create Index should fail with this configuration: no analyzer named "pdh" is defined. Please confirm.
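A mismatch like "pdh" versus "xxx" can be caught before posting the index definition. The following is a minimal sketch of such a check; the set of built-in analyzer names is a hypothetical, deliberately incomplete list containing only the one used in the question.

```python
# Hypothetical and incomplete: only the built-in analyzer used in the question.
KNOWN_BUILTINS = {"standard.lucene"}


def undefined_analyzer_refs(index: dict) -> list[tuple[str, str]]:
    """Return (field, analyzer) pairs whose analyzer is neither defined in
    index["analyzers"] nor listed in KNOWN_BUILTINS."""
    defined = {a["name"] for a in index.get("analyzers", [])} | KNOWN_BUILTINS
    problems = []
    for field in index.get("fields", []):
        for prop in ("analyzer", "indexAnalyzer", "searchAnalyzer"):
            name = field.get(prop)
            if name is not None and name not in defined:
                problems.append((field["name"], name))
    return problems


# Trimmed-down version of the index definition from the question.
index = {
    "fields": [
        {"name": "id", "analyzer": None},
        {"name": "name", "analyzer": "standard.lucene"},
        {"name": "hyphenated", "analyzer": "pdh"},
    ],
    "analyzers": [{"name": "xxx"}],
}
print(undefined_analyzer_refs(index))  # [('hyphenated', 'pdh')]
```

Pointing the "hyphenated" field at "xxx" makes the check return an empty list.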

In most cases, Azure Search runs the analyzers on search terms at query time. The most notable exception is a search term containing a wildcard, which cannot be analyzed. As a result, if your analyzer normalizes the indexed data to lowercase, the query terms should be lowercased at query time as well. You can use the Analyze API with your custom analyzer to see how a term would be processed. More info: https://docs.microsoft.com/en-us/rest/api/searchservice/test-analyzer
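The behavior described above can be modeled in a few lines. This is a toy simulation, not Azure Search itself: the split pattern is an assumed stand-in (not the pattern from the question), and wildcard terms are treated as unanalyzed, as the answer describes.

```python
import re

# Assumed stand-in tokenizer: split on anything that is not a word character
# or a hyphen, so hyphenated words stay together. NOT the question's pattern.
SPLIT_PATTERN = re.compile(r"[^\w-]+")


def analyze(text: str) -> list[str]:
    """Tokenize first, then lowercase (tokenizer -> lowercase token filter)."""
    tokens = [t for t in SPLIT_PATTERN.split(text) if t]
    return [t.lower() for t in tokens]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index-time analysis: build an inverted index of term -> document ids."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Query-time analysis: the same analyzer runs on plain terms, but a
    trailing-wildcard term is matched as typed, without analysis."""
    hits: set[str] = set()
    for raw in query.split():
        if raw.endswith("*"):
            prefix = raw[:-1]  # not analyzed, so its case is left unchanged
            for term, ids in index.items():
                if term.startswith(prefix):
                    hits |= ids
        else:
            for term in analyze(raw):
                hits |= index.get(term, set())
    return hits


index = build_index({"1": "Blue ABC-123 widget", "2": "red xyz-9 gadget"})
print(search(index, "ABC-123"))  # {'1'}: the query term is lowercased too
print(search(index, "ABC*"))     # set(): the wildcard term is not lowercased
```

Writing it out this way also makes the pipeline order visible: the tokenizer sees the text before the lowercase filter runs, so a case-sensitive tokenizer pattern (for example one built on [a-z]) can split uppercase and lowercase input differently.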

If you can share them, I'm curious what test query term you are using and what you have in the index.

Hope this helps.

Mike