Hello Azure Search Team,
Sorry if this question seems long; I wanted to explain it with some data, which makes it look verbose.
I'm from the Power BI team and have a question about the Search Highlight feature in Azure Search, based on its documentation.
Yesterday I created an Azure Search index containing the sample document below:
{
  "DocumentId": "257d13f0-ea1f-412f-9858-baa49b35f6b5",
  "ModelId": "78869cb7-352e-4415-911e-464308c6d8d9",
  "TableId": "Employees",
  "ColumnId": "Details",
  "ColumnValues": [
    "Boston Massachusetts",
    "Tampa Florida",
    "Palo Alto California",
    "Sentenceeeeeeeeeeeeeeeeeeeeeee with 101 characters tokenwith50characterssssssssssssssssssssssssssssss",
    "Data is repeated Data is repeated Data is repeated",
    "Data is repeated. Data is repeated. Data is repeated.",
    "Washington",
    "Washington D.C"
  ]
}
Note that only the "ColumnValues" field is searchable. Also, notice the repeated values in ColumnValues[4] and ColumnValues[5], without and with an English sentence separator (.), respectively (assuming indexes start at 0).
Now, if a user searches for "Data", we pass the following search query to Azure Search:
\"/.*Data.*/\" &queryType=full &highlight=ColumnValues-100&highlightPreTag=''&highlightPostTag=" &searchMode=any &$top=1500 &$count=true
Below is the response from the Azure Search API, as returned in the portal:
{
  "@odata.context": "https://huynazuresearch1.search.windows.net/indexes('columnbasedindex')/$metadata#docs(*)",
  "@odata.count": 1,
  "value": [
    {
      "@search.score": 1,
      "@search.highlights": {
        "ColumnValues": [
          "''Data\" is repeated ''Data\" is repeated ''Data\" is repeated",
          "''Data\" is repeated.",
          "''Data\" is repeated.",
          "''Data\" is repeated."
        ]
      },
      "DocumentId": "257d13f0-ea1f-412f-9858-baa49b35f6b5",
      "ModelId": "78869cb7-352e-4415-911e-464308c6d8d9",
      "TableId": "Employees",
      "ColumnId": "Details",
      "ColumnValues": [
        "Boston Massachusetts",
        "Tampa Florida",
        "Palo Alto California",
        "Sentenceeeeeeeeeeeeeeeeeeeeeee with 101 characters tokenwith50characterssssssssssssssssssssssssssssss",
        "Data is repeated Data is repeated Data is repeated",
        "Data is repeated. Data is repeated. Data is repeated.",
        "Washington",
        "Washington D.C"
      ]
    }
  ]
}
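For reference, here is a minimal Python sketch that builds the query shown earlier as a REST GET URL against the index's docs endpoint. The service and index names are taken from the @odata.context URL in the response; the api-version value and the helper name build_search_url are assumptions for illustration only. The search text keeps the surrounding double quotes from the original query, and the sketch only builds the URL: it does not send the request or add the api-key header.

```python
from urllib.parse import urlencode

# Service and index names come from the @odata.context URL in the response above.
SERVICE = "huynazuresearch1"
INDEX = "columnbasedindex"
# Assumption: this api-version value is illustrative.
API_VERSION = "2020-06-30"


def build_search_url(term: str) -> str:
    """Build (but do not send) a GET URL mirroring the search query above.

    build_search_url is a hypothetical helper name.
    """
    params = {
        "api-version": API_VERSION,
        "search": f'"/.*{term}.*/"',      # search text as written in the original query
        "queryType": "full",              # full Lucene query syntax
        "highlight": "ColumnValues-100",  # highlight parameter as written in the original query
        "highlightPreTag": "''",
        "highlightPostTag": '"',
        "searchMode": "any",
        "$top": 1500,
        "$count": "true",
    }
    base = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs"
    # urlencode percent-encodes the quotes, slashes and "$" in names and values.
    return f"{base}?{urlencode(params)}"


print(build_search_url("Data"))
```

Using urlencode means the literal highlight tags ('' and ") and the $-prefixed parameter names are escaped consistently, rather than relying on hand-written escaping in the query string.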
We get the document back as expected, but we then do some processing on the highlight values returned by Azure Search.
For our needs, we have to build a ColumnInfo object of {ColumnId, ColumnValues} for each match. To do that, we iterate over the @search.highlights array and try to map each highlighted value to its respective ColumnValues entry.
The first value in @search.highlights.ColumnValues, "''Data\" is repeated ''Data\" is repeated ''Data\" is repeated", maps easily to ColumnValues[4] with an equality match.
So we can build the ColumnInfo object {"Details", "Data is repeated Data is repeated Data is repeated"} directly.
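A minimal Python sketch of this equality-based mapping, assuming the highlight tags from the query ('' and ") are stripped before comparing against the ColumnValues entries. ColumnInfo, strip_tags and map_exact are hypothetical names used only for illustration, and sample mirrors the response above. Stripping the tags this way would also remove any genuine '' or " characters in a value, so it is only safe while the data never contains them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Highlight tags used in the query: highlightPreTag='' and highlightPostTag=".
PRE_TAG = "''"
POST_TAG = '"'


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical {ColumnId, ColumnValues} object built for each match."""
    column_id: str
    column_value: str


def strip_tags(highlight: str, pre: str = PRE_TAG, post: str = POST_TAG) -> str:
    """Remove the highlight markers (this also removes any literal '' or " in the text)."""
    return highlight.replace(pre, "").replace(post, "")


def map_exact(doc: dict) -> list[ColumnInfo]:
    """Map each highlight to a ColumnValues entry only when it equals that entry."""
    values = doc["ColumnValues"]
    result: list[ColumnInfo] = []
    for hl in doc["@search.highlights"]["ColumnValues"]:
        plain = strip_tags(hl)
        if plain in values:  # list membership is an equality check per entry
            result.append(ColumnInfo(doc["ColumnId"], plain))
    return result


# Mirrors the document and highlights from the response above.
sample = {
    "ColumnId": "Details",
    "ColumnValues": [
        "Boston Massachusetts",
        "Tampa Florida",
        "Palo Alto California",
        "Sentenceeeeeeeeeeeeeeeeeeeeeee with 101 characters tokenwith50characterssssssssssssssssssssssssssssss",
        "Data is repeated Data is repeated Data is repeated",
        "Data is repeated. Data is repeated. Data is repeated.",
        "Washington",
        "Washington D.C",
    ],
    "@search.highlights": {
        "ColumnValues": [
            "''Data\" is repeated ''Data\" is repeated ''Data\" is repeated",
            "''Data\" is repeated.",
            "''Data\" is repeated.",
            "''Data\" is repeated.",
        ]
    },
}

for info in map_exact(sample):
    print(info)
```

Only the first highlight produces a ColumnInfo here; the three sentence fragments equal no entry, which is the issue described next.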
However, the remaining three values (indexes 1, 2 and 3) in @search.highlights.ColumnValues are all "''Data\" is repeated.", and all three of them map to ColumnValues[5].
I see an issue with this: when a searchable value contains a period (.) or some other delimiter, the highlight is split at that point, so the entire ColumnValues entry is not returned.
Since we want to build the ColumnInfo object of {ColumnId, ColumnValues}, we need the entire value of the matched ColumnValues entry, not parts or highlights of it.
Is there any way to override this behavior so that Azure Search returns, as part of the highlights, the entire string of each ColumnValues entry that matched?
That would save us from having to do a Contains-style match on the Azure Search results to construct the custom ColumnInfo object of {ColumnId, ColumnValues}.
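For comparison, a minimal Python sketch of such a Contains-style fallback: try an exact match first and, only if that fails, map the fragment to every ColumnValues entry that contains it, dropping duplicates so that several fragments of one entry yield a single ColumnInfo. ColumnInfo, strip_tags and map_highlights are hypothetical names, the tag values come from the query above, and sample mirrors the response above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Highlight tags used in the query: highlightPreTag='' and highlightPostTag=".
PRE_TAG = "''"
POST_TAG = '"'


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical {ColumnId, ColumnValues} object built for each match."""
    column_id: str
    column_value: str


def strip_tags(highlight: str, pre: str = PRE_TAG, post: str = POST_TAG) -> str:
    """Remove the highlight markers (this also removes any literal '' or " in the text)."""
    return highlight.replace(pre, "").replace(post, "")


def map_highlights(doc: dict) -> list[ColumnInfo]:
    """Map each highlight to ColumnValues entries: exact match first, then Contains."""
    values = doc["ColumnValues"]
    seen: set[ColumnInfo] = set()
    result: list[ColumnInfo] = []
    for hl in doc["@search.highlights"]["ColumnValues"]:
        plain = strip_tags(hl)
        if plain in values:
            matches = [plain]  # the highlight is a whole entry
        else:
            # The highlight is a fragment; it may occur in more than one entry.
            matches = [v for v in values if plain in v]
        for value in matches:
            info = ColumnInfo(doc["ColumnId"], value)
            if info not in seen:  # several fragments of one entry -> one ColumnInfo
                seen.add(info)
                result.append(info)
    return result


# Mirrors the document and highlights from the response above.
sample = {
    "ColumnId": "Details",
    "ColumnValues": [
        "Boston Massachusetts",
        "Tampa Florida",
        "Palo Alto California",
        "Sentenceeeeeeeeeeeeeeeeeeeeeee with 101 characters tokenwith50characterssssssssssssssssssssssssssssss",
        "Data is repeated Data is repeated Data is repeated",
        "Data is repeated. Data is repeated. Data is repeated.",
        "Washington",
        "Washington D.C",
    ],
    "@search.highlights": {
        "ColumnValues": [
            "''Data\" is repeated ''Data\" is repeated ''Data\" is repeated",
            "''Data\" is repeated.",
            "''Data\" is repeated.",
            "''Data\" is repeated.",
        ]
    },
}

for info in map_highlights(sample):
    print(info)
```

Trying equality first keeps a whole-value highlight such as Washington from also being attributed to Washington D.C. The remaining weakness is that a short fragment can still be contained in several entries, which is why getting the whole value back from the service would be preferable.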
What are the suggested options for this? Apologies for the length of the question; I'm happy to schedule a short call to discuss if needed.
Thanks, Sagar