I am new to Azure Cognitive Search. I store documents (.docx) in Azure Blob Storage. When I search the stored documents with a query phrase (for example, "government rules") and hit highlighting enabled, the results highlight every place where either "government" or "rules" appears. As a result, I get irrelevant passages from the documents that contain only the term "rules".

Is there a way to get only the sections the user actually wants (that is, only those containing "government rules") instead of every section that matches at least one term in the query phrase?

Please help me with this. Thank you in advance.

1 Answer

A quick explanation: Azure Cognitive Search, like most full-text search engines, uses a data structure known as an "inverted index". It is essentially an index that maps each term to the IDs of the documents that contain it, together with the term's frequency (how many times the word appears in each document).

For example:

If you search for "sky", documents 2 and 3 will be retrieved. But if you want "blue sky", you must specify that both terms need to exist in the same document, with "blue" directly followed by "sky".
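
To make the idea concrete, here is a minimal sketch of an inverted index in Python. The three documents are hypothetical (the original illustration is not available), and the index also records term positions, which is an assumption about how a phrase match can be checked rather than a description of Azure's internals.

```python
from collections import defaultdict

# Hypothetical documents, chosen so that "sky" appears in documents 2 and 3.
docs = {
    1: "blue ocean waves",
    2: "the sky is blue",
    3: "a clear blue sky",
}

def build_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_any(index, query):
    """Return documents that contain at least one of the query terms."""
    terms = query.lower().split()
    return sorted({doc_id for t in terms for doc_id in index.get(t, {})})

def search_phrase(index, query):
    """Return documents where the query terms appear consecutively, in order."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Term i of the phrase must sit at position start + i.
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)

index = build_index(docs)
print(search_any(index, "sky"))          # documents containing "sky"
print(search_any(index, "blue sky"))     # documents containing either term
print(search_phrase(index, "blue sky"))  # documents containing the exact phrase
```

The any-term search is what produces the irrelevant hits described in the question: a document that contains only one of the words still matches. The phrase search keeps only documents where the words are adjacent.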

In Azure Cognitive Search, you can wrap the phrase in double quotes, but the query will then retrieve only exact matches for that phrase. As another option, you can work with a different analyzer.
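
As a rough sketch, a quoted phrase query with hit highlighting could be sent as a JSON body like the one built below, posted to the index's `docs/search` endpoint. The helper name `build_phrase_search_body` is hypothetical, and the field name `content` is an assumption (it is the field a blob indexer typically fills with the document text); adjust both to match your index.

```python
import json

def build_phrase_search_body(phrase, highlight_field="content"):
    """Build a search request body that treats `phrase` as an exact phrase.

    Hypothetical helper: it assumes `phrase` contains no double quotes
    and that `highlight_field` is a searchable field in your index.
    """
    return {
        # The double quotes are part of the query string itself.
        "search": f'"{phrase}"',
        "queryType": "simple",
        # Request highlighted snippets from this field.
        "highlight": highlight_field,
    }

body = build_phrase_search_body("government rules")
print(json.dumps(body, indent=2))
```

Note that the quotes must survive into the `search` value; if they are dropped, the query is parsed as two separate terms again.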

I recommend you read about how a search engine and its analyzers work:

https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture

(Especially this section) https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture#stage-1-query-parsing

https://docs.microsoft.com/en-us/azure/search/index-add-custom-analyzers