How to boost the score in Azure Search for unstructured blob data?

Question

I am using Azure Search with the default indexing setup to import unstructured data (PDF, DOC, text, image files, etc.).

I have not created a scoring profile on any of the default fields.

Almost every setting in the portal is the default. When I search for any text in Search explorer, the JSON results come back with very low search scores.

I read about boosting scores with a scoring profile. However, the terms I want to find can appear anywhere in any document, so how do I decide which field to weight more heavily?

How can I generate more custom fields from these input files? Do I need to write a document parser?

I am using SDK 4.0 and C# in my bot.

Please suggest.

ramero-MSFT · Accepted Answer · 2018-12-10T22:32:50

To use a scoring profile, the fields you want to boost must be part of the index definition; otherwise the scoring mechanism won't know about them.
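As a rough illustration of what "part of the index definition" means, the sketch below builds an index definition in which a field is both declared under `fields` and weighted in a scoring profile. The index name, the `id` key field, and the `extractedTitle` field are hypothetical names chosen for this example; only the overall shape (`fields`, `scoringProfiles`, `text.weights`) follows the REST index schema.

```python
import json


def build_index_definition(index_name, boosted_field, weight):
    """Return an index definition whose scoring profile weights one field.

    All field names here are illustrative assumptions, not the fields
    that the blob indexer creates by default.
    """
    return {
        "name": index_name,
        "fields": [
            # Hypothetical key field; use whatever key your index already has.
            {"name": "id", "type": "Edm.String", "key": True},
            # Full text extracted from the blob.
            {"name": "content", "type": "Edm.String", "searchable": True},
            # The extra field to boost; it must be declared and searchable.
            {"name": boosted_field, "type": "Edm.String", "searchable": True},
        ],
        "scoringProfiles": [
            {
                "name": "boost-" + boosted_field,
                # Matches in the boosted field count more than matches elsewhere.
                "text": {"weights": {boosted_field: weight}},
            }
        ],
    }


definition = build_index_definition("blob-index", "extractedTitle", 5)
print(json.dumps(definition, indent=2))
```

A query would then select the profile by name through the `scoringProfile` query parameter. A weight only refers to fields declared in `fields`, which is why the field has to exist in the index first.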

You mentioned using unstructured data as your source; I assume this means your data has no stable or predictable structure. If so, you probably won't be able to make your index definition match the structure of every document, because different documents will likely be structured differently and unpredictably. However, if you know which fields you want to boost and how to extract them from your documents, you can add only those fields to your index definition and then use the "merge" document API to populate them for each document.

https://docs.microsoft.com/en-us/rest/api/searchservice/addupdate-or-delete-documents

This would require you to retrieve all documents from the index, parse the data to extract the field you want to boost, and then use the merge API to update the index with the values you extracted. Once the field is populated, you can use it in a scoring profile.
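The merge step described above can be sketched as follows. This is a minimal example that only builds the request body; it assumes you have already retrieved the documents and parsed out a value per document key. The key field `id` and the target field `extractedTitle` are hypothetical names, and the sample values are placeholders.

```python
import json


def build_merge_batch(extracted, key_field="id", target_field="extractedTitle"):
    """Build the request body for a batch of "merge" actions.

    `extracted` maps each document key to the value parsed from that
    document. A merge action updates only the fields it lists and
    leaves the rest of the document unchanged.
    """
    actions = []
    for doc_key, value in extracted.items():
        actions.append(
            {
                "@search.action": "merge",
                key_field: doc_key,   # identifies the existing document
                target_field: value,  # the new field value to store
            }
        )
    return {"value": actions}


# Placeholder data standing in for the output of your own parser.
batch = build_merge_batch(
    {"doc-1": "Quarterly report", "doc-2": "Install guide"}
)
body = json.dumps(batch)
print(body)
```

The resulting JSON is what gets POSTed to the index's `docs/index` endpoint, together with the service's `api-key` header and an `api-version` query parameter, as described in the linked documentation. The .NET SDK offers an equivalent batch operation, so the same idea carries over to a C# bot.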
