Azure search services issue for white space and wildcard search of special characters

Question

We have an application that lets users enter anything in the summary field. Users can type any special characters, such as #$!@~, as well as white space, and they want to be able to search on those special characters too. For example, one entry is "test testing **** #### !!!!! ???? @ $".

I created a cognitive search index with the analyzer set to standard.lucene, as shown below:

{ "name": "Summary", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "synonymMaps": [] }

When I send the following query from Postman:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing", "searchFields": "Summary", "count":true }

I can get the expected result.

If I use the following:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing ****", "searchFields": "Summary", "count":true }

I get an "InvalidRequestParameter" error.

If I change the query to the following:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": ""****"", "searchFields": "Summary", "count":true }

Then I do not get any results back.

Per this article: https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax#escaping-special-characters

In order to use any of the search operators as part of the search text, escape the character by prefixing it with a single backslash (\). Special characters that require escaping include the following:

+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

According to the article, I need to prefix the special characters with a single backslash, but in my case that doesn't seem to work. Any help would be appreciated!

Michael Scott · Accepted Answer · 2021-08-31T15:10:46

If you are using the standard Lucene analyzer for indexing, I believe "****" is not stored as a token. The standard analyzer splits text on special characters and discards them, so there is nothing in the index for the query to match.
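To make the difference concrete, here is a rough sketch in Python. It does not call the service or the real Lucene tokenizer: the regular expression is only an approximation of how a standard-style analyzer keeps alphanumeric runs, and the whitespace split stands in for a tokenizer that keeps every non-space run as-is.

```python
import re

text = "test testing **** #### !!!!! ???? @ $"

# Approximation only (not the real Lucene StandardTokenizer):
# keep runs of letters and digits, lowercase them, drop everything else.
standard_like = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# Whitespace-style tokenization: every run of non-space characters is a token.
whitespace_like = text.split()

print(standard_like)    # ['test', 'testing']
print(whitespace_like)  # ['test', 'testing', '****', '####', '!!!!!', '????', '@', '$']
```

With the first style, "****" never reaches the index, which would explain why a query for it returns nothing.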

For fields that need to be searchable on such characters, e.g., the Summary field in your example, you need to create a custom analyzer. The documentation on custom analyzers explains how to do that and how to test your analyzer. Once you have built an analyzer that tokenizes the input the way you want, you can reference it in the index definition for the fields that need it, as follows.

...
{
  "name": "Summary",
  "type": "Edm.String",
  "retrievable": true,
  "searchable": true,
  "analyzer": "custom_analyzer_for_tokenizing_as_is"
},
...
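As a minimal sketch of what such a setup might look like, the Python below builds an index fragment and a query body as plain dictionaries. The choice of the built-in `whitespace` tokenizer with the `lowercase` token filter is an assumption about what "tokenizing as is" should mean for your data, and `escape_lucene` is a hypothetical helper, not part of any SDK. Test the analyzer against your own text before relying on it.

```python
import json

# Characters the Lucene query syntax article lists as requiring a backslash.
LUCENE_SPECIAL = r'+-&|!(){}[]^"~*?:\/'

def escape_lucene(term: str) -> str:
    """Hypothetical helper: prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

# Index fragment: the Summary field plus a custom analyzer definition.
# Assumption: whitespace tokenizer + lowercase filter is the desired behavior.
index_fragment = {
    "fields": [
        {
            "name": "Summary",
            "type": "Edm.String",
            "retrievable": True,
            "searchable": True,
            "analyzer": "custom_analyzer_for_tokenizing_as_is",
        }
    ],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "custom_analyzer_for_tokenizing_as_is",
            "tokenizer": "whitespace",
            "tokenFilters": ["lowercase"],
        }
    ],
}

# Query body: escape each term, then let json.dumps handle JSON escaping.
query = {
    "top": 1000,
    "queryType": "full",
    "searchMode": "all",
    "search": " ".join(escape_lucene(t) for t in "testing ****".split()),
    "searchFields": "Summary",
    "count": True,
}

print(json.dumps(index_fragment, indent=2))
print(json.dumps(query))
```

Note that each backslash is doubled once the query is serialized to JSON, because a bare backslash before `*` is not a valid JSON escape. If you type the request body by hand in Postman, you would need to write the doubled form yourself.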
