0
votes

We are using the .NET library for Azure Search. I have successfully built the index and stored data in it. One of our fields is called Tags; it is a collection of strings and is marked as searchable. We put values such as C# and .NET in this field.

The problem is that the search service will not return a hit for C#, although it will for C; nor will it hit on .NET, although it will on NET. I can see through Fiddler that the # and the . in the search term are being URL-encoded, but they don't seem to be decoded on the Azure side.

3 Answers

3
votes

The behavior you're seeing is the result of tokenization performed by the standard analyzer used by Azure Search. By default, it breaks words on many punctuation characters, such as # and . (you can get all the details of text analysis in Azure Search here).

We're looking into adding support for custom analyzers that would let you exclude characters such as # and . from word-breaking, but this is still in the planning stages. In the meantime, as a workaround, we suggest encoding these characters in your application before indexing and querying (e.g., C# -> CSharp, .NET -> dotNET).
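
A minimal sketch of that encoding workaround follows. The `TagEncoder` class, its `Encode` method, and the replacement table are hypothetical names made up for illustration; they are not part of the Azure Search SDK. The idea is to run the same function over tag values before indexing and over the user's search text before querying, so both sides agree on the encoded form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Azure Search SDK.
public static class TagEncoder
{
    // Assumed mapping; extend it with any other punctuation-bearing terms you index.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "C#", "CSharp" },
            { ".NET", "dotNET" },
        };

    // Replaces each space-delimited term that has an entry in the table;
    // all other terms pass through unchanged.
    public static string Encode(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return string.Empty;
        }

        string[] terms = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < terms.Length; i++)
        {
            string replacement;
            if (Replacements.TryGetValue(terms[i], out replacement))
            {
                terms[i] = replacement;
            }
        }

        return string.Join(" ", terms);
    }
}
```

With this in place, you would call `TagEncoder.Encode` on each tag when building documents for the index and again on the search text, so a query for C# is sent as CSharp and matches the encoded tag. Existing documents would need to be re-indexed with the encoded values.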

1
votes

Thanks, Bruce. For now I have created a function in our search implementation that removes punctuation from the search term provided by the end user. This way I don't have to go through and update all the records in the search index.

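    // Strips everything except ASCII letters, digits, and whitespace from the search term.
    // Requires: using System.Text.RegularExpressions;
    private string SanitizeValue(string value)
    {
        return Regex.Replace(value, @"[^a-zA-Z0-9\s]", "");
    }

0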
votes

You could try a regular-expression search, for example with this search string: /.*c\#.*/. Also make sure you set SearchParameters.QueryType = QueryType.Full.