We have stored documents in Azure Search. One of the documents has the following field value:

"Title": "statistics_query.compute_shader_invocations.secondary_inherited fails"

We defined a custom analyzer on this field, as recommended by the Microsoft Azure team, to resolve an issue we were facing with the underscore (_) character.

{
  "name": "myindex",
  "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": true,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null
        },
        {
            "name": "Title",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": true,
            "facetable": true,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": "remove_underscore"
        }
  ],
  "analyzers": [
    {
      "name": "remove_underscore",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "charFilters": [
        "remove_underscore"
      ],
      "tokenizer": "standard_v2"
    }
  ],
  "charFilters": [
    {
      "name": "remove_underscore",
      "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
      "mappings": [
        "_=>-"
      ]
    }
  ]
}

However, when I search with the filters below on my Azure Search index (API version 2016-09-01 Preview), I don't get any results.

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader_invocations"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

However, if I include the text with the dot (.) character, the same filter works.

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

Based on my tests, if the document has a dot (.) character right before or after the search term used in the filter, the search doesn't return a result.

So the filters below won't work, because in the document there is a dot (.) character right before or after the search terms used in the query. In our case, there is a dot before the word "compute" and after the word "invocations" in the Azure Search document.

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

However, the filters below should work, as there is no dot character before the word "query" or after the word "shader" in the Azure Search document:

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

$filter=search.ismatch('"shader*"','Title', 'full', 'any')

This is driving me crazy. Any help would be highly appreciated.

1 Answer

tl;dr: Wildcard queries don't go through custom analysis. Non-wildcard queries should return results, so please double-check those.

Detailed answer

The dot (.) actually doesn't have anything to do with the behavior you are observing. You are issuing two classes of search queries:

  1. A wildcard query (a term ending in *, such as "shader*")
  2. A non-wildcard query (such as "compute_shader")

In general, a non-wildcard query undergoes the same analysis as the indexed text, as defined by any custom analyzer on the field. For wildcard queries, no analysis is performed on the query term.

Now take your document text, "statistics_query.compute_shader_invocations.secondary_inherited fails", as an example. The custom analyzer you defined breaks it down into tokens. (FYI: you can use the Analyze API to see the breakdown.)
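As a rough illustration of the Analyze API call mentioned above, here is a minimal Python sketch that builds (but does not send) the request. The service name, API key, and the exact `api-version` string are placeholders and assumptions on my side, so substitute the values for your own service.

```python
import json
import urllib.request


def build_analyze_request(service_name, index_name, api_key,
                          text, analyzer, api_version):
    """Build a POST request for the Analyze API of an Azure Search index."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/analyze?api-version={api_version}"
    )
    # The body names the text to analyze and the analyzer to run it through.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


request = build_analyze_request(
    service_name="my-service",          # placeholder service name
    index_name="myindex",
    api_key="<admin-api-key>",          # placeholder key
    text="statistics_query.compute_shader_invocations.secondary_inherited fails",
    analyzer="remove_underscore",
    api_version="2016-09-01-Preview",   # assumption: use the version your service supports
)

# Sending the request needs a real service and key, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     for token in json.load(response)["tokens"]:
#         print(token["token"])
```

The tokens printed by the commented-out part are the ones that prefix and phrase queries are matched against.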

The following wildcard query succeeds:

$filter=search.ismatch('"shader*"','Title', 'full', 'any')

because, when you run the analysis on the source document, there are tokens like "shader".

The following wildcard queries don't succeed:

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

because there are no tokens like "compute_shader_invocations" or "shader_invocations" when the source document is analyzed with your custom analyzer.
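To see why no indexed token can contain an underscore, the character filter step can be sketched on its own. This is a simplified stand-in for the `_=>-` mapping in your index definition, not the service's implementation; the helper name is hypothetical, and the sketch does not attempt to reproduce the tokenizer.

```python
def apply_mapping_char_filter(text, mappings):
    """Apply 'source=>target' mapping rules to text.

    Simplification: rules are applied one after another with str.replace,
    which is exact for a single rule such as "_=>-".
    """
    for rule in mappings:
        source, target = rule.split("=>", 1)
        text = text.replace(source, target)
    return text


title = "statistics_query.compute_shader_invocations.secondary_inherited fails"
filtered = apply_mapping_char_filter(title, ["_=>-"])
print(filtered)
# statistics-query.compute-shader-invocations.secondary-inherited fails
```

The tokenizer only ever sees the filtered text, so a wildcard term that is not analyzed and still contains underscores has nothing to match as a prefix.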

This one shouldn't succeed either, but interestingly you say that it does:

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

Let's focus now on queries without wildcards.

$filter=search.ismatch('"compute_shader_invocations"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader"','Title', 'full', 'any')

These should be tokenized by the custom analyzer in the same way as the document text, and should therefore return matching results.
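If it helps to re-run these non-wildcard filters in a repeatable way, the following Python sketch builds the search URL for a given phrase. The helper names, the `search=*` parameter, the service name, and the `api-version` string are my own assumptions for illustration; only the shape of the `search.ismatch` expression comes from your question.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def ismatch_filter(phrase, field, query_type="full", search_mode="any"):
    """Build a search.ismatch filter that matches a quoted phrase on one field."""
    # Single quotes inside an OData string literal are escaped by doubling them.
    escaped = phrase.replace("'", "''")
    return (
        f"search.ismatch('\"{escaped}\"', '{field}', "
        f"'{query_type}', '{search_mode}')"
    )


def build_search_url(service_name, index_name, api_version, filter_expression):
    """Build a GET URL for the docs endpoint with the filter percent-encoded."""
    query = urlencode(
        {"api-version": api_version, "search": "*", "$filter": filter_expression},
        quote_via=quote,  # encode spaces as %20 rather than '+'
        safe="$",         # keep the '$' of '$filter' readable
    )
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs?{query}"
    )


for phrase in ("compute_shader_invocations", "compute_shader"):
    expression = ismatch_filter(phrase, "Title")
    # "my-service" and the api-version value are placeholders.
    print(build_search_url("my-service", "myindex", "2016-09-01-Preview", expression))
```

Sending each URL with your `api-key` header and comparing the results against the Analyze API output should show whether the phrases are tokenized as expected.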

Could you please verify that the last three filters above (the "query.compute_shader*" wildcard filter and the two non-wildcard filters) behave the way you described in your original question? When I created a sample index based on your configuration and issued these search requests, those were the three anomalies I noticed. I would appreciate some clarification on them.

Also, the documentation on how full text search works in Azure Search is a great place to get in-depth details about some of the things I mentioned.