I have a blob storage container with a number of folders, and each folder holds a number of PDF documents. I want to create an Azure Search index that indexes the data at folder level but includes a complex type field (Collection(Edm.ComplexType)) that holds all the documents in that folder. The index looks like this:

{"name": "index",
"fields":
[
    {"name": "id", "type": "Edm.String", "filterable": true, "key": true, "searchable": true, "sortable": true, "facetable": false},
    {"name": "folderName", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"},
    {"name": "documents", "type": "Collection(Edm.ComplexType)",
    "fields": [
        {"name": "documentName", "type": "Edm.String", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"},
        {"name": "content", "type": "Edm.String", "searchable": true, "retrievable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft", "synonymMaps": ["synonymsmap"]},
        {"name": "documentType", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"},
        {"name": "language", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"}
    ]
    }
]

}

Does anyone know how I should approach this? I have been creating and populating indexes using the REST API.

I am thinking that maybe I need to create a folder-level index structure and populate the folder-level details from a SQL table, then populate the sub-fields from the blobs through a skillset and indexer?

EDIT: Maybe my ideas above are completely off track. What I want is to search for a term and get back folder names ranked by the aggregate relevance of the documents within each folder. I am not sure whether this is achievable in search itself or has to be processed afterwards. Any pointers?

1 Answer

Does anyone know how I should approach this? I have been creating and populating indexes using the REST API.

A: If you want this structure, then you're right: you'll need to create the index and push the data yourself (the REST API is one of the options).
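
A minimal sketch of that push approach, assuming the index from the question. It groups per-blob records by folder and builds the request body for the Add, Update or Delete Documents REST operation (POST to /indexes/{index}/docs/index). The shape of the `blobs` input, the helper names, and the choice of a base64-encoded folder name as the key are assumptions made for illustration; extracting the text from the PDFs is out of scope here.

```python
import base64
import json
from collections import defaultdict


def folder_key(folder_name):
    # Document keys may only contain letters, digits, '_', '-' and '=',
    # so encode the folder name with URL-safe base64 (an assumed convention).
    return base64.urlsafe_b64encode(folder_name.encode("utf-8")).decode("ascii")


def build_batch(blobs):
    """Group per-blob records into one search document per folder.

    `blobs` is a hypothetical list of dicts with the keys
    folder, name, content, documentType and language.
    """
    by_folder = defaultdict(list)
    for blob in blobs:
        by_folder[blob["folder"]].append({
            "documentName": blob["name"],
            "content": blob["content"],
            "documentType": [blob["documentType"]],  # Collection(Edm.String)
            "language": [blob["language"]],          # Collection(Edm.String)
        })
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "id": folder_key(folder),
                "folderName": [folder],  # Collection(Edm.String) in the question's index
                "documents": docs,
            }
            for folder, docs in sorted(by_folder.items())
        ]
    }


# Made-up sample records, only to show the resulting shape.
blobs = [
    {"folder": "abc", "name": "x.pdf", "content": "mickey mouse", "documentType": "pdf", "language": "en"},
    {"folder": "abc", "name": "y.pdf", "content": "donald duck", "documentType": "pdf", "language": "en"},
    {"folder": "def", "name": "z.pdf", "content": "goofy", "documentType": "pdf", "language": "en"},
]
batch = build_batch(blobs)
body = json.dumps(batch)  # JSON request body for the POST
```

The resulting body would be sent with an api-key header and Content-Type: application/json to the service's /indexes/index/docs/index endpoint, with the api-version query parameter your service supports.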

I am thinking that maybe I need to create a folder-level index structure and populate the folder-level details from a SQL table, then populate the sub-fields from the blobs through a skillset and indexer?

A: This is not a good idea. When searching for a particular term, you would need to query all the possible indexes and do the ordering yourself.

I personally would create a simple structure with no complex types:

{
    "name": "index",
    "fields":
    [
        {"name": "id", "type": "Edm.String", "filterable": true, "key": true, "searchable": true, "sortable": true, "facetable": false},
        {"name": "folderName", "type": "Edm.String", "searchable": true, "retrievable": true, "filterable": true, "sortable": true, "facetable": true, "analyzer": "en.microsoft"},
        {"name": "documentName", "type": "Edm.String", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"},
        {"name": "content", "type": "Edm.String", "searchable": true, "retrievable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft", "synonymMaps": ["synonymsmap"]},
        {"name": "documentType", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"},
        {"name": "language", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "en.microsoft"}
    ]
}

Want to retrieve all documents in a particular folder?

search=*&$filter=folderName eq 'abc'

Want to retrieve a particular document in a particular folder?

search=*&$filter=folderName eq 'abc' and documentName eq 'x.docx'

Want to retrieve all documents that contain a particular term?

search=mickey mouse&$orderby=folderName
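
When these queries are sent from code, the values need URL encoding, and any single quote inside an OData string literal has to be doubled. A small sketch of a query-string builder using only the Python standard library; the helper names `odata_string` and `search_query` are made up for this example, and it assumes folderName and documentName are plain string fields:

```python
from urllib.parse import parse_qs, quote, urlencode


def odata_string(value):
    # OData string literals are single-quoted; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def search_query(search="*", filter_expr=None, orderby=None):
    """Build the query string for a GET request against /indexes/{index}/docs."""
    params = {"search": search}
    if filter_expr:
        params["$filter"] = filter_expr
    if orderby:
        params["$orderby"] = orderby
    # Keep '$' readable in parameter names; percent-encode everything else.
    return urlencode(params, safe="$", quote_via=quote)


folder_filter = "folderName eq " + odata_string("abc")
both_filter = folder_filter + " and documentName eq " + odata_string("x.docx")

by_folder = search_query(filter_expr=folder_filter)
by_document = search_query(filter_expr=both_filter)
by_term = search_query(search="mickey mouse", orderby="folderName")

# Decoding the query string gives back the original expressions.
decoded = parse_qs(by_document)
```

The api-version parameter and the api-key header still have to be added to the actual request.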

Simple and effective.
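
On the edit in the question (ranking folders by the aggregate relevance of their documents): with a flat index like this, one option is to aggregate after the search, on the client. The sketch below sums the @search.score of each hit per folder. Summing is an assumption, not something the service does for you; it favours folders with many matching documents, so a mean or a maximum may suit some cases better. `hits` stands for the "value" array of a search response, and the sample values are invented.

```python
from collections import defaultdict


def rank_folders(hits):
    """Sum @search.score per folder and return (folder, total) pairs, best first."""
    totals = defaultdict(float)
    for hit in hits:
        folders = hit["folderName"]
        # Accept either a single string or a Collection(Edm.String) value.
        if isinstance(folders, str):
            folders = [folders]
        for folder in folders:
            totals[folder] += hit["@search.score"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample hits, shaped like entries of a search response.
hits = [
    {"@search.score": 2.5, "folderName": "abc", "documentName": "x.pdf"},
    {"@search.score": 1.0, "folderName": "def", "documentName": "z.pdf"},
    {"@search.score": 1.5, "folderName": "abc", "documentName": "y.pdf"},
]
ranking = rank_folders(hits)
```

Scores are only comparable within the results of a single query, so the aggregation should run over one result set (paged through if necessary), not across different searches.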