Your question has 2 parts:
- How to normalize the product code (removing spaces and hyphens)
- How to apply the right analyzer to the relevant part of the query string, so that different fields behave differently (requiring an exact match on the code field)
For product code normalization, you can use a few custom analyzer features:
- Mapping char filter. "A char filter that applies mappings defined with the mappings option. Matching is greedy (longest pattern matching at a given point wins). Replacement is allowed to be the empty string." We'll use this to remove hyphens and spaces from product codes.
- Uppercase. "Normalizes token text to upper case." This means users don't have to worry about capitalizing the letters in a product code.
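To make the effect of these two features concrete, here is a rough local approximation in Python. It is not how Azure Search runs the analyzer (that happens server-side); it only mimics a greedy mapping char filter followed by the uppercase token filter, so you can see what a product code turns into. The function names are hypothetical, and the tokenizer step is skipped on the assumption that the code contains only letters, digits, spaces and hyphens, so it reaches the tokenizer as a single run of characters.

```python
# Local approximation of the custom analyzer, for illustration only.
# Mirrors the mappings "-=>" and "\\u0020=>" (hyphen and space map to nothing).
MAPPINGS = {"-": "", " ": ""}


def apply_mapping_char_filter(text: str, mappings: dict) -> str:
    """Greedy replacement: the longest pattern matching at a position wins."""
    patterns = sorted(mappings, key=len, reverse=True)  # try longest first
    out = []
    i = 0
    while i < len(text):
        for pattern in patterns:
            if pattern and text.startswith(pattern, i):
                out.append(mappings[pattern])
                i += len(pattern)
                break
        else:
            # No pattern matched here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def normalize_product_code(code: str) -> str:
    """Char filter first, then the uppercase token filter."""
    return apply_mapping_char_filter(code, MAPPINGS).upper()


print(normalize_product_code("abyx6-8bd dell azx"))  # ABYX68BDDELLAZX
```

The order matters: char filters run before tokenization, which is why removing the spaces here keeps the whole code together.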
Here's a complete example of an index with these analyzer options set. The example index has an id field; id is analogous to the product code:
```json
{
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenFilters": [
        "uppercase"
      ],
      "charFilters": [
        "hyphen-filter"
      ],
      "name": "id-analyzer",
      "tokenizer": "standard_v2"
    }
  ],
  "charFilters": [
    {
      "mappings": [
        "-=>",
        "\\u0020=>"
      ],
      "name": "hyphen-filter",
      "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter"
    }
  ],
  "name": "index",
  "fields": [
    {
      "key": true,
      "name": "key",
      "type": "Edm.String"
    },
    {
      "analyzer": "id-analyzer",
      "name": "id",
      "type": "Edm.String"
    }
  ]
}
```
You can find the full documentation for the Create Index call here. Note that you can't use the portal for this, because it doesn't support custom analyzers.
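If you are calling the REST API from code, the requests can be assembled roughly as follows. This is a sketch using only the Python standard library: the service name, admin key and the api-version value "2020-06-30" are placeholders you would replace, and the helper names are hypothetical. One helper builds the PUT to /indexes/{name} with the index definition as the body; the other builds a POST to the Analyze Text endpoint (/indexes/{name}/analyze), which lets you check what id-analyzer produces for a given code before relying on it in queries.

```python
import json
import urllib.request

# Placeholder: use an api-version your service supports.
API_VERSION = "2020-06-30"


def _post_or_put(url: str, body: dict, api_key: str, method: str) -> urllib.request.Request:
    """Build (without sending) a JSON request authenticated with an api-key header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method=method,
    )


def build_create_index_request(service: str, api_key: str, index_definition: dict):
    """PUT /indexes/{name}: creates the index, or updates it if it exists."""
    url = (f"https://{service}.search.windows.net/indexes/"
           f"{index_definition['name']}?api-version={API_VERSION}")
    return _post_or_put(url, index_definition, api_key, "PUT")


def build_analyze_request(service: str, api_key: str, index_name: str,
                          text: str, analyzer: str = "id-analyzer"):
    """POST /indexes/{name}/analyze: returns the tokens the analyzer emits."""
    url = (f"https://{service}.search.windows.net/indexes/"
           f"{index_name}/analyze?api-version={API_VERSION}")
    return _post_or_put(url, {"text": text, "analyzer": analyzer}, api_key, "POST")
```

To actually send a request, pass it to urllib.request.urlopen(). Given the definition above, analyzing "abyx6-8bd" with id-analyzer should come back as a single ABYX68BD token.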
Make sure the product code part of the query is inside a phrase, e.g. "ABYX6 8BD DELL AZX". This way the query parser sends the whole phrase to the lexical analyzer as one unit instead of first splitting it on whitespace into separate terms. You can learn more about that here: How full text search works in Azure Search.
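The difference the phrase makes can be illustrated with a deliberately simplified parser. This is not the Azure Search query parser; it is a hypothetical stand-in that keeps quoted phrases whole and splits everything else on whitespace, then applies the same hyphen/space removal and upper-casing as id-analyzer to each unit.

```python
import re


def normalize(text: str) -> str:
    """Same effect as id-analyzer for such input: drop hyphens/spaces, upper-case."""
    return text.replace("-", "").replace(" ", "").upper()


def query_units(query: str) -> list:
    """Simplified parsing: a quoted phrase stays whole, other text splits on whitespace."""
    units = []
    for m in re.finditer(r'"([^"]*)"|(\S+)', query):
        units.append(m.group(1) if m.group(1) is not None else m.group(2))
    return units


def analyzed_terms(query: str) -> list:
    return [normalize(unit) for unit in query_units(query)]


print(analyzed_terms('ABYX6 8BD DELL AZX'))    # ['ABYX6', '8BD', 'DELL', 'AZX']
print(analyzed_terms('"ABYX6 8BD DELL AZX"'))  # ['ABYX68BDDELLAZX']
```

Without the quotes, each fragment is analyzed on its own, so none of them can equal a normalized code that was indexed as one token.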
The second question is trickier. If your application doesn't know where in the query string the product code is, Azure Search can't know either. Unless fielded search syntax is used, the entire query string is processed for each field independently, using the analyzer configured on that field. This means that even with the normalization set up correctly, for the query "ABYX6 8BD DELL AZX" Azure Search will try to match the terms
- ABYX68BDDELLAZX – against the Code field
- abyx6, 8bd, dell, azx – against the other two fields (Name and Manufacturer), assuming they use the standard analyzer
The first term won’t match any stored product code, so only documents that contain dell or azx somewhere in Name or Manufacturer will be returned.
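The per-field behaviour above can be sketched locally as follows. Both analyzer functions are rough, hypothetical stand-ins (the real standard analyzer follows Unicode text segmentation rules, which this regex only approximates), and the stored code "ABYX6-8BD" is an assumed example value.

```python
import re


def id_analyzer(text: str) -> list:
    """Stand-in for the custom analyzer: one upper-cased token, no hyphens/spaces."""
    return [text.replace("-", "").replace(" ", "").upper()]


def standard_like_analyzer(text: str) -> list:
    """Rough stand-in for the standard analyzer: lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


FIELD_ANALYZERS = {
    "Code": id_analyzer,
    "Name": standard_like_analyzer,
    "Manufacturer": standard_like_analyzer,
}

query = "ABYX6 8BD DELL AZX"
for field, analyze in FIELD_ANALYZERS.items():
    print(field, analyze(query))
# Code ['ABYX68BDDELLAZX']
# Name ['abyx6', '8bd', 'dell', 'azx']
# Manufacturer ['abyx6', '8bd', 'dell', 'azx']

# Hypothetical stored code: its indexed token differs from the query token.
stored = id_analyzer("ABYX6-8BD")[0]      # 'ABYX68BD'
print(stored in id_analyzer(query))       # False
```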
I’d recommend changing the application's UX so that users enter the product code in a separate input, which can still tolerate some variability in format. The only alternative is to treat every query as free text, let the search engine match many results, and rank higher the ones that matched more terms.
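With a separate input, one option is to turn the code into a fielded phrase on the id field from the example index and combine it with whatever else the user typed. The sketch below builds a hypothetical request body for the Search Documents call using full Lucene syntax, which fielded search requires; the function names are illustrative, and joining the clauses with AND (making the code a hard requirement) is just one possible choice.

```python
from typing import Optional


def escape_phrase(text: str) -> str:
    """Escape the characters that would end or break a Lucene phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_body(free_text: str, product_code: Optional[str] = None) -> dict:
    """Combine a separately entered product code with optional free text.

    The free text is passed through unescaped; a real application would
    also need to escape Lucene operators in it.
    """
    clauses = []
    if product_code and product_code.strip():
        # Fielded phrase: the whole code reaches id-analyzer as one unit.
        clauses.append(f'id:"{escape_phrase(product_code.strip())}"')
    if free_text.strip():
        clauses.append(f"({free_text.strip()})")
    return {
        "search": " AND ".join(clauses) if clauses else "*",
        "queryType": "full",  # full Lucene syntax enables fieldName:"phrase"
    }
```

For example, build_search_body("dell", "ABYX6 8BD") produces a search string of id:"ABYX6 8BD" AND (dell), which you would POST to /indexes/index/docs/search. Because the phrase is analyzed by id-analyzer, variants such as "abyx6-8bd" normalize to the same term.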
Please let me know if you have additional questions.
Thanks,
Matt