
We have an application that lets users enter anything in the Summary field. Users can type any special characters, such as #$!@~, as well as white space, and they want to be able to search on those special characters too. For example, one of the entries is "test testing **** #### !!!!! ???? @ $".

I created an Azure Cognitive Search index with the field's analyzer set to standard.lucene, as shown below:

{ "name": "Summary", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "synonymMaps": [] }

When I send the following query from Postman:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing", "searchFields": "Summary", "count":true }

I can get the expected result.

If I use the following:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing ****", "searchFields": "Summary", "count":true }

I get an "InvalidRequestParameter" error.

If I change it to the following query:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "\"****\"", "searchFields": "Summary", "count":true }

Then I am not getting any results back.

Per this article: https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax#escaping-special-characters

In order to use any of the search operators as part of the search text, escape the character by prefixing it with a single backslash (\). Special characters that require escaping include the following:

    + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

According to this, I only need to prefix the special characters with a single backslash, but in my case that doesn't seem to work. Any help would be appreciated!
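A small Python sketch of the escaping step described in the quoted documentation, assuming the request body is built in code rather than typed into Postman. The helper names `escape_lucene` and `build_search_body` are hypothetical. It also shows why backslashes appear doubled in a JSON body: `json.dumps` writes `\*` as `\\*`. Note that escaping only makes the query parse; whether anything matches still depends on which terms the analyzer stored in the index.

```python
import json

# Characters the Lucene query syntax docs list as requiring a backslash escape.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text: str) -> str:
    """Prefix every Lucene special character in `text` with one backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def build_search_body(user_text: str) -> str:
    """Return a JSON request body like the ones in the question, with `search` escaped.

    json.dumps doubles each backslash, which is what the JSON wire format needs;
    the service decodes it back to a single backslash before parsing the query.
    """
    body = {
        "top": 1000,
        "queryType": "full",
        "searchMode": "all",
        "search": escape_lucene(user_text),
        "searchFields": "Summary",
        "count": True,
    }
    return json.dumps(body)


print(escape_lucene("testing ****"))  # testing \*\*\*\*
print(build_search_body("testing ****"))
```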


2 Answers


If you are using the standard Lucene analyzer for indexing, I believe "****" is never stored as a term: the standard analyzer breaks words on special characters and discards the characters themselves.
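To illustrate, here is a rough Python approximation (not the actual Lucene code) comparing a standard-analyzer-like split with a split on white space only, applied to the sample entry from the question. The function names are hypothetical, and the real standard tokenizer follows Unicode word-break rules, so treat this as a sketch that only holds for simple ASCII input like this.

```python
import re

SAMPLE = "test testing **** #### !!!!! ???? @ $"


def standard_like_tokens(text: str) -> list[str]:
    """Rough stand-in for standard.lucene: keep runs of word characters,
    lowercase them, and drop punctuation and symbols entirely."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def whitespace_lower_tokens(text: str) -> list[str]:
    """Split on white space only and lowercase, keeping symbols intact."""
    return [t.lower() for t in text.split()]


print(standard_like_tokens(SAMPLE))     # ['test', 'testing']
print(whitespace_lower_tokens(SAMPLE))  # ['test', 'testing', '****', '####', '!!!!!', '????', '@', '$']
```

With the first behavior there is simply no "****" term in the index for any query, escaped or not, to match.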

For fields that need to be searchable on special characters, such as the Summary field in your example, you need to create a custom analyzer. This document explains how to define one and how to test it. Once you have an analyzer that tokenizes the input the way you want, reference it in your index definition for the fields that need it, as follows:

...
{
  "name": "Summary",
  "type": "Edm.String",
  "retrievable": true,
  "searchable": true,
  "analyzer": "custom_analyzer_for_tokenizing_as_is"
},
...

I finally got this resolved by creating a custom analyzer. The field definition in the index:

{
    "name": "FieldName",
    "type": "Edm.String",
    "searchable": true,
    "filterable": true,
    "retrievable": true,
    "sortable": true,
    "facetable": true,
    "key": false,
    "indexAnalyzer": null,
    "searchAnalyzer": null,
    "analyzer": "specialcharanalyzer",
    "synonymMaps": []
},

The analyzer is specified below:

"analyzers": [
    {
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
        "name": "specialcharanalyzer",
        "tokenizer": "whitespace",
        "tokenFilters": [
            "lowercase"
        ],
        "charFilters": []
    }
],
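As a sketch, the two fragments above can be combined into a single index definition body, for example when creating the index through the REST API. The index name `summary-index` and the `id` key field are hypothetical placeholders (every index needs one key field), and the field is named `Summary` here instead of `FieldName` to match the question. Because an existing field's analyzer generally cannot be changed in place, applying this typically means creating a new index and reloading the data.

```python
import json


def build_index_definition(index_name: str) -> dict:
    """Combine the field and analyzer fragments into one index definition body."""
    return {
        "name": index_name,
        "fields": [
            # Hypothetical key field; every index needs exactly one.
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": "Summary",
                "type": "Edm.String",
                "searchable": True,
                "filterable": True,
                "retrievable": True,
                "sortable": True,
                "facetable": True,
                "analyzer": "specialcharanalyzer",
            },
        ],
        # Custom analyzers sit in a top-level "analyzers" array, next to "fields".
        "analyzers": [
            {
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                "name": "specialcharanalyzer",
                "tokenizer": "whitespace",
                "tokenFilters": ["lowercase"],
                "charFilters": [],
            }
        ],
    }


definition = build_index_definition("summary-index")  # hypothetical index name
# Simple consistency check; it assumes no built-in analyzer names (e.g. standard.lucene) are used.
defined = {a["name"] for a in definition["analyzers"]}
assert all(f.get("analyzer") in (None, *defined) for f in definition["fields"])
print(json.dumps(definition, indent=2))
```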

Then you can use the query formats described in this document: https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax#special-characters


Special characters that require escaping include the following:

+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

For characters that are not in the list above, use the following regular-expression format for an infix search:

"search": "/.*SearchChar.*/",

For example, if you want to search for $, then use the following format:

"search": "/.*$.*/",

For special characters in the list, use this format:

"search" : "/.*\\escapingcharacter.*/",

For example, to search for +, use the following query:

"search" : "/.*\\+.*/",

# also has to be escaped this way when it appears in a regular expression, even though it is not in the list above.
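The query formats above can be generated with a small Python helper, sketched here under the assumption that you build request bodies in code. `infix_regex` and `regex_body` are hypothetical names; the escape set is the documented list plus #, following the note above.

```python
import json

# Documented Lucene special characters, plus "#" per the note above.
REGEX_ESCAPE = set('+-&|!(){}[]^"~*?:\\/#')


def infix_regex(ch: str) -> str:
    """Return the regex /.*X.*/ matching any term that contains the character ch."""
    escaped = "\\" + ch if ch in REGEX_ESCAPE else ch
    return "/.*" + escaped + ".*/"


def regex_body(ch: str) -> str:
    """JSON body for the regex query; json.dumps doubles the backslash on the wire."""
    return json.dumps({"queryType": "full", "search": infix_regex(ch), "searchFields": "Summary"})


print(infix_regex("$"))  # /.*$.*/
print(infix_regex("+"))  # /.*\+.*/
print(regex_body("+"))   # {"queryType": "full", "search": "/.*\\+.*/", "searchFields": "Summary"}
```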

To search for a term made up only of asterisks (such as ****), use this format:

"search":"/\\**/",
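Lucene regular expressions are matched against whole indexed terms, which is why /\\**/ (the regex \** once the JSON string is decoded, i.e. zero or more literal asterisks) finds the **** token. The Python sketch below emulates that locally with `re.fullmatch` over the terms a whitespace + lowercase analyzer would produce for the sample entry. `matching_terms` is a hypothetical helper, and Python's `re` only agrees with Lucene's regex engine for simple patterns like these.

```python
import re

# Terms a whitespace + lowercase analyzer would produce for the sample entry.
tokens = ["test", "testing", "****", "####", "!!!!!", "????", "@", "$"]


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Emulate Lucene's whole-term regex matching with re.fullmatch."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(matching_terms(r"\**", tokens))     # ['****']
print(matching_terms(r".*\?.*", tokens))  # ['????']
```

To match * inside a longer term (for example a*b), the infix form /.*\\*.*/ from the escaped-character format applies instead.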