
I have set up an Azure Cognitive Search index with the 'Content and metadata' extraction option in the indexer (the content is extracted into the 'content' field of the index).

So far this has been working fine. But now I would like to search for documents that contain a given string in their content (the 'content' field) AND another string in the document name (stored in the 'metadata_storage_name' field).

I have tried many API requests, but none of them gives satisfying results so far, and I am getting lost in the Azure documentation...

Could anybody help?

Here is what I have tried so far:

  • search=+6151654200 +Product Instructions_EN_&$count=true
  • search=6151660260&$count=true&$filter=search.ismatch('Product Instructions_EN_')
  • search=6151660260&$count=true&$filter=search.ismatch('Product Instructions_EN_', 'metadata_storage_name')
  • search=6151660260&$searchFields=content&$count=true&$filter=search.ismatch('Product Instructions_EN_', 'metadata_storage_name')
  • search=content:6151660260 AND metadata_storage_name:"Product Instructions_EN_"&$count=true&querytype=full

For instance, I should get only one file in my index that fulfills the following condition:

metadata_storage_name contains "Product Instructions_EN" AND content contains "6151656050"

Here is the targeted file metadata (except content):

{
    "metadata_storage_name": "ELB-ELS-ELC_Pistol_Product Instructions_EN_6159929240_EN-02-EN.PDF",
    "metadata_storage_content_type": "application/pdf",
    "metadata_storage_last_modified": "2020-12-14T15:32:08Z",
    "metadata_storage_size": 2834713,
    "key": "aHR0cHM6Ly9hY3N0YXBhcHAwMDAwMDNjcGQuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RvY3VtZW50YXRpb24tZmlsZXMvRUxCLUVMUy1FTENfUGlzdG9sX1Byb2R1Y3QlMjBJbnN0cnVjdGlvbnNfRU5fNjE1OTkyOTI0MF9FTi0wMi1FTi5QREY1"
}

Instead, I get a full list of multiple files, and the one with the highest search score doesn't even contain "Product Instructions_EN"...

I have no specific analyzers on the metadata_storage_name field.

Thanks!

1 Answer

You don't explain what happens and why it is not satisfying. It would help if you updated your question with a brief example of your data, what happens, and what you expected to happen.

Other than that, you seem to be on the right track. By default, Azure Search uses the so-called simple query syntax, and it defaults to OR-mode (searchMode=any): a document matches if it contains any of the query terms. This is not what most people would expect, and it means that the more terms you add to make your query precise, the more results you get.

In your case I believe it would be enough to use queryType=full and searchMode=all. You search for one token (6151660260) across all searchable fields, and you want to apply a fielded (scoped) search to the metadata_storage_name field.

search=6151660260 metadata_storage_name:"Product Instructions_EN_"&queryType=full&searchMode=all
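If you send this query as a GET request, the spaces, quotes and colon in the search parameter must be percent-encoded. A minimal Python sketch that builds such a URL for the Search Documents REST API; the service name, index name and the default API version `2020-06-30` are assumptions you should replace with your own:

```python
from urllib.parse import quote, urlencode

# Assumed default; use the API version your service supports.
DEFAULT_API_VERSION = "2020-06-30"


def build_search_url(service: str, index: str, search: str,
                     api_version: str = DEFAULT_API_VERSION) -> str:
    """Build a GET URL for the Search Documents REST API.

    queryType=full enables fielded search such as field:"phrase";
    searchMode=all requires every term to match instead of any term.
    """
    params = {
        "api-version": api_version,
        "search": search,
        "queryType": "full",
        "searchMode": "all",
        "$count": "true",
    }
    # quote_via=quote encodes spaces as %20 and also encodes quotes, colons,
    # '+' and '$', which are easy to get wrong when typing the URL by hand.
    query = urlencode(params, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


if __name__ == "__main__":
    print(build_search_url(
        "my-search-service",  # hypothetical service name
        "my-index",           # hypothetical index name
        '6151660260 metadata_storage_name:"Product Instructions_EN_"',
    ))
```

The request also needs your query key in an `api-key` header.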

The underscore characters may not be included in your index; it depends on which analyzer is defined for the metadata_storage_name field. You can see which analyzer is used on the index tab in the portal. If the underscores are included, you may have to escape them with a backslash, so you can also try:

search=6151660260 metadata_storage_name:"Product Instructions\_EN\_"&queryType=full&searchMode=all
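If you build queries in code, a small helper can do the backslash-escaping. This is a Python sketch; the set of special characters is an assumption based on the characters that the full Lucene syntax treats as operators. The underscore is not in that set, so the helper only escapes it when you pass it in the hypothetical `extra` parameter, which reproduces the query above:

```python
# Characters with special meaning in the full Lucene query syntax
# (assumed list; check the Azure documentation for your API version).
LUCENE_SPECIAL_CHARS = frozenset('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text: str, extra: str = "") -> str:
    """Prefix every special character (and any character in `extra`) with a backslash."""
    specials = LUCENE_SPECIAL_CHARS | frozenset(extra)
    return "".join("\\" + ch if ch in specials else ch for ch in text)


if __name__ == "__main__":
    # Escape the underscores as in the query above.
    term = escape_lucene("Product Instructions_EN_", extra="_")
    print(f'6151660260 metadata_storage_name:"{term}"')
```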

Top tip: test with a simple example without underscores or special characters first. Once you have the basic syntax and options down, then work on getting special characters working.

ANALYZER

You need to understand how your analyzer works, which tokens it produces and how you can query them. If you don't specify an analyzer, the Standard analyzer is used. You can test an analyzer with a POST request to the REST API, e.g. by submitting the following JSON to the URL https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/analyze?api-version={{API-VERSION}}

{
    "text": "ELB-ELS-ELC_Pistol_Product Instructions_EN_6159929240_EN-02-EN.PDF",
    "analyzer": "{{ANALYZER}}"
}
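The same Analyze request can be prepared from Python with only the standard library. This sketch assumes the `search.windows.net` DNS suffix (as in the `@odata.context` of the response below), the `standard.lucene` analyzer name for the Standard analyzer, and placeholder service, index and admin-key values:

```python
import json
import urllib.request


def build_analyze_request(service: str, index: str, text: str,
                          analyzer: str = "standard.lucene",
                          api_version: str = "2020-06-30",
                          api_key: str = "<your-admin-key>") -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the Analyze Text REST API."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


if __name__ == "__main__":
    req = build_analyze_request(
        "my-search-service", "my-index",  # hypothetical names
        "ELB-ELS-ELC_Pistol_Product Instructions_EN_6159929240_EN-02-EN.PDF",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```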

You get a list of tokens back. You can search for one or more of these tokens; this is what goes into your index, nothing else.

{
    "@odata.context": "https://<my-search-service>.search.windows.net/$metadata#Microsoft.Azure.Search.V2020_06_30_Preview.AnalyzeResult",
    "tokens": [
        {
            "token": "elb",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        },
        {
            "token": "els",
            "startOffset": 4,
            "endOffset": 7,
            "position": 1
        },
        {
            "token": "elc_pistol_product",
            "startOffset": 8,
            "endOffset": 26,
            "position": 2
        },
        {
            "token": "instructions_en_6159929240_en",
            "startOffset": 27,
            "endOffset": 56,
            "position": 3
        },
        {
            "token": "02",
            "startOffset": 57,
            "endOffset": 59,
            "position": 4
        },
        {
            "token": "en.pdf",
            "startOffset": 60,
            "endOffset": 66,
            "position": 5
        }
    ]
}
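To check quickly whether the terms you plan to search for exist as tokens, you can compare them against the Analyze response. A Python sketch, assuming the terms are already lowercased; it only does exact token comparison and ignores that Azure also analyzes the query itself and that wildcard or prefix queries behave differently:

```python
import json

# Trimmed copy of the Analyze response above (offsets and positions omitted).
SAMPLE_RESPONSE = json.dumps({"tokens": [
    {"token": t} for t in [
        "elb", "els", "elc_pistol_product",
        "instructions_en_6159929240_en", "02", "en.pdf",
    ]
]})


def indexed_tokens(analyze_response: str) -> set:
    """Return the distinct tokens contained in an Analyze API JSON response."""
    return {item["token"] for item in json.loads(analyze_response)["tokens"]}


def unmatched_terms(analyze_response: str, terms: list) -> list:
    """Return the terms that do not appear as exact tokens in the response."""
    tokens = indexed_tokens(analyze_response)
    return [term for term in terms if term not in tokens]


if __name__ == "__main__":
    # "product" is only part of the token "elc_pistol_product", so it is reported.
    print(unmatched_terms(SAMPLE_RESPONSE, ["product", "instructions_en_6159929240_en"]))
```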

In your case you are searching for a quoted phrase whose words are only fragments of two different indexed tokens (elc_pistol_product and instructions_en_6159929240_en). That is not going to work. Here are some examples that will work:

  • search=6151660260&$count=true&searchMode=all&queryType=full
  • search=6151660260 instructions_en_6159929240_en&$count=true&searchMode=all&queryType=full
  • search=6151660260 instructions_en_6159929240_en elc_pistol_product&$count=true&searchMode=all&queryType=full
  • search=6151660260 metadata_storage_name:"ELB-ELS-ELC_Pistol_Product Instructions_EN_6159929240_EN-02-EN.PDF"&$count=true&searchMode=all&queryType=full
  • search=6151660260 "ELB-ELS-ELC_Pistol_Product Instructions_EN_6159929240_EN-02-EN.PDF"&$count=true&searchMode=all&queryType=full
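As an alternative to the GET form, the same parameters can be sent as a JSON body in a POST to `https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=...`, which avoids percent-encoding the quotes and colons. A Python sketch of the body; the optional `search_fields` argument is a hypothetical convenience that maps to the API's comma-separated `searchFields` string:

```python
import json
from typing import Optional, Sequence


def build_search_body(search: str, search_fields: Optional[Sequence[str]] = None) -> str:
    """Build a JSON body for a POST to /indexes/{index}/docs/search."""
    body = {
        "search": search,
        "queryType": "full",   # full Lucene syntax, needed for field:"..." scoping
        "searchMode": "all",   # all terms must match
        "count": True,         # equivalent of $count=true in the GET form
    }
    if search_fields:
        # The REST API takes searchFields as a comma-separated string.
        body["searchFields"] = ",".join(search_fields)
    return json.dumps(body)


if __name__ == "__main__":
    print(build_search_body(
        '6151660260 metadata_storage_name:"ELB-ELS-ELC_Pistol_Product '
        'Instructions_EN_6159929240_EN-02-EN.PDF"'
    ))
```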