EDIT: We now have a new tutorial article that walks through this scenario step by step; we'd love to hear whether it helps solve the problem more easily.
Original Response:
There are a couple of things to consider here.
Does the data need to be encrypted at rest with the customer-specified key in the search index? If so, the approach in the document you shared is the best way to do that. You would need to store the customer-specific keys in Key Vault and reference one key per Azure Cognitive Search index, so you would need an index per customer, but a single search service can cover all of them provided you stay under the limits of the service tier you are using. If the index doesn't need to be encrypted with the customer key and system-managed encryption is fine, you shouldn't need any of this.
Regardless of the answer to that question, are you trying to use indexers to index the data, and do you want to make use of the built-in JSON extraction?
If you would like to use indexers and the built-in JSON extraction, we have a preview skill that you can use to make this work. The steps would look something like this:
- In your indexer definition, set two things (both documented here):
  - `"allowSkillsetToReadFileData": true` — allows the skillset to reference the encrypted blob.
  - `"dataToExtract": "allMetadata"` — bypasses the normal pre-skillset content extraction, which would otherwise fail while the data is still encrypted.
- Create a skillset for the indexer defined in step 1 that has at least the following two skills (you can add more afterwards if you want any other skillset functionality):
  - A custom Web API skill that takes the `/document/file_data` object as input, decrypts the file by doing some sort of external lookup for the customer key for that document, and returns the decrypted data as a file reference object.
  - A DocumentExtractionSkill (currently in preview) that takes the file reference object returned from the custom skill, with `"parsingMode"` set to `"json"`. This parses the now-decrypted JSON file much as it would have been parsed if the blob had been stored unencrypted and you had used the default `dataToExtract` option.
  - Note that the documentation specifies a very particular input format for this skill, so you'll need to make sure that is exactly what your custom skill from the previous step returns.
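Put together, the definitions might look roughly like the sketch below. All names, the Web API endpoint, and the field wiring are placeholders, and the exact property names and `@odata.type` values should be verified against the current REST API reference:

```json
{
  "name": "customer-indexer",
  "dataSourceName": "customer-blob-datasource",
  "targetIndexName": "customer-index",
  "skillsetName": "decrypt-and-parse-skillset",
  "parameters": {
    "configuration": {
      "allowSkillsetToReadFileData": true,
      "dataToExtract": "allMetadata"
    }
  }
}
```

And the corresponding skillset, chaining the custom decryption skill into the document extraction skill:

```json
{
  "name": "decrypt-and-parse-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "uri": "https://example.com/api/decrypt",
      "context": "/document",
      "inputs": [ { "name": "file_data", "source": "/document/file_data" } ],
      "outputs": [ { "name": "decrypted_file_data", "targetName": "decrypted_file_data" } ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
      "parsingMode": "json",
      "context": "/document",
      "inputs": [ { "name": "file_data", "source": "/document/decrypted_file_data" } ],
      "outputs": [ { "name": "content", "targetName": "extracted_content" } ]
    }
  ]
}
```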
This is a somewhat complicated approach, but the DocumentExtractionSkill was designed with exactly the scenario you are describing in mind, so we'd love to hear feedback on whether it works for you.
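As a rough sketch of what the custom skill itself has to do: read each record's `file_data`, decrypt the bytes, and hand back a file reference object (`{"$type": "file", "data": "<base64>"}`). The XOR "decryption" below is only a stand-in so the sketch stays self-contained; a real skill would fetch the per-customer key from Key Vault and use real cryptography, and the `decrypted_file_data` output name is an assumption that just has to match whatever your skillset wires into the DocumentExtractionSkill:

```python
import base64
import json


def decrypt_blob(encrypted_bytes: bytes, customer_key: bytes) -> bytes:
    # Placeholder: a real skill would decrypt with the customer's key
    # (e.g. AES) after fetching it from Key Vault. XOR keeps this sketch
    # dependency-free and is its own inverse, so it also "encrypts" below.
    return bytes(b ^ customer_key[i % len(customer_key)]
                 for i, b in enumerate(encrypted_bytes))


def handle_skill_request(body: dict, customer_key: bytes) -> dict:
    """Custom Web API skill contract: for each input record, decrypt the
    base64 payload in file_data and return a file reference object that a
    DocumentExtractionSkill can consume."""
    results = []
    for record in body["values"]:
        encrypted = base64.b64decode(record["data"]["file_data"]["data"])
        decrypted = decrypt_blob(encrypted, customer_key)
        results.append({
            "recordId": record["recordId"],
            "data": {
                "decrypted_file_data": {
                    "$type": "file",
                    "data": base64.b64encode(decrypted).decode("ascii"),
                }
            },
            "errors": None,
            "warnings": None,
        })
    return {"values": results}


if __name__ == "__main__":
    key = b"customer-key"
    plaintext = json.dumps({"id": "1", "title": "hello"}).encode()
    encrypted = decrypt_blob(plaintext, key)  # XOR is symmetric
    request = {"values": [{"recordId": "0",
                           "data": {"file_data": {
                               "$type": "file",
                               "data": base64.b64encode(encrypted).decode()}}}]}
    response = handle_skill_request(request, key)
    out = base64.b64decode(
        response["values"][0]["data"]["decrypted_file_data"]["data"])
    print(json.loads(out))  # round-trips back to the original document
```

The important part is the response shape, not the crypto: the skillset only cares that each record comes back with a file reference object under the output name it expects.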
If you don't care about using indexers, you can always write a workflow yourself that downloads the file, decrypts and parses it, and then sends it to the Azure Cognitive Search index using the push model. With this option, you would have to do the parsing yourself and wouldn't get the features that indexers provide for free, such as change tracking and automatic scheduled runs.
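That push workflow is roughly: download the blob, decrypt it, parse the JSON, then POST the documents to the index. A minimal stdlib-only sketch under stated assumptions (placeholder XOR decryption again, and the endpoint, index name, api-key, and API version are all placeholders for your own values; the upload function is shown but not called here):

```python
import base64
import json
import urllib.request


def decrypt_blob(encrypted: bytes, key: bytes) -> bytes:
    # Placeholder: swap in your real per-customer decryption scheme.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(encrypted))


def build_push_payload(docs: list) -> dict:
    # The push API accepts {"value": [...]} with an @search.action per doc.
    return {"value": [{"@search.action": "mergeOrUpload", **doc}
                      for doc in docs]}


def push_documents(payload: dict, endpoint: str, index: str,
                   api_key: str) -> None:
    # POST to the documents collection of the index (REST push model).
    # Placeholder endpoint/index/api-version; not invoked in this sketch.
    url = f"{endpoint}/indexes/{index}/docs/index?api-version=2020-06-30"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    urllib.request.urlopen(req)


if __name__ == "__main__":
    key = b"customer-key"
    # Simulate an encrypted blob, then decrypt and parse it ourselves.
    encrypted = decrypt_blob(b'{"id": "42", "title": "my doc"}', key)
    doc = json.loads(decrypt_blob(encrypted, key))
    payload = build_push_payload([doc])
    print(payload["value"][0]["@search.action"])  # mergeOrUpload
```

The trade-off is exactly as described above: you own the parsing and the scheduling, but you avoid the skillset machinery entirely.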