
My application stores customer data in one Blob Storage account, with each customer allocated a dedicated container.
Within that container, the customer's data is saved across multiple Append Blobs. The data is encrypted with custom keys, and each customer has their own specific key.
The customer-specific keys are kept in the database; before writing content to a blob, the application fetches the key for that customer from the database and encrypts the data.
Now I have to implement Search Functionality on the data in Blob Storage (so each customer can search their relevant data), and Azure Cognitive Search seems like a perfect solution.
However, I cannot work out from the documentation how to achieve search over custom-encrypted data. The best and most relevant document I have found is this, but it talks about Key Vault.
1. How can I achieve the search functionality on encrypted data (where even the encryption key changes based on customer), where the key is kept in the Database?
2. Is the search achievable using a single Azure Cognitive Search service, or do I need to implement a separate search service for every customer?

P.S.: The data is in JSON format before it is encrypted and written to Azure Blob Storage.


1 Answer


EDIT: We now have a new tutorial article that walks through this scenario step by step; we would love to hear whether it helps solve the problem more easily.

Original Response:

A couple of possible things to consider here.

Does the data need to be encrypted at rest with the customer-specific key in the search index itself? If so, the document you shared is the best way to do that. You would need a way to store the customer-specific keys in Key Vault, and then reference the appropriate key per Azure Cognitive Search index. That means one index per customer, but you could still use a single search service overall, provided you don't exceed the limits of the service tier you are using. If the index doesn't need to be encrypted with the customer key and the system-managed encryption is fine, you shouldn't need this.
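To make the per-customer encryption option concrete, an index definition with a customer-managed key might look roughly like the sketch below. All names, the vault URI, and the credentials are placeholders; the `encryptionKey` property itself is the one described in the document you shared.

```json
{
  "name": "customer-a-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true }
  ],
  "encryptionKey": {
    "keyVaultKeyName": "customer-a-key",
    "keyVaultKeyVersion": "<key-version>",
    "keyVaultUri": "https://<your-vault>.vault.azure.net",
    "accessCredentials": {
      "applicationId": "<aad-application-id>",
      "applicationSecret": "<application-secret>"
    }
  }
}
```

You would create one such index per customer, each pointing at that customer's key in Key Vault.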

Regardless of the answer to that question: are you trying to use indexers to index the data, and do you want to make use of the built-in JSON extraction? If so, we have a preview skill that can help with this. The steps would look something like this:

  1. In your indexer definition, you would need to set two things (both documented here):
    1. "allowSkillsetToReadFileData": true
      1. This option will allow us to reference the encrypted blob in the skillset.
    2. "dataToExtract": "allMetadata"
      1. This option will allow you to bypass normal pre-skillset content extraction, which would otherwise fail while the data is still encrypted.
  2. Create a skillset for the indexer defined in #1 that has at least the following two skills (you can add more after if you would like any other skillset functionality):
    1. A custom Web API skill that takes in the "/document/file_data" object as input, decrypts the file by doing an external lookup of the customer key for that document (in your case, from the database), and then returns the decrypted data as a file reference object.
    2. A DocumentExtractionSkill (currently preview) that takes in the file reference object that was returned from the custom skill, with "parsingMode" set to "json".
      1. This will parse the now-decrypted JSON file, just as it would have been parsed if it had been stored unencrypted in Blob Storage and you had used the default dataToExtract option.
      2. Note that in the documentation a very specific input format is required for this skill, so you'll need to make sure this is exactly what is returned from your custom defined skill in 2a.
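Put together, the indexer settings from step 1 and the skillset from step 2 might be sketched roughly as follows. The names, the Web API URI, and the `decrypted_file_data` output name are placeholders I've chosen for illustration; check the linked documentation for the exact shapes.

```json
{
  "name": "customer-data-indexer",
  "dataSourceName": "customer-blob-datasource",
  "targetIndexName": "customer-index",
  "skillsetName": "decrypt-and-extract-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "allMetadata",
      "allowSkillsetToReadFileData": true
    }
  }
}
```

```json
{
  "name": "decrypt-and-extract-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "description": "Looks up the customer key and decrypts the blob",
      "uri": "https://<your-function-app>.azurewebsites.net/api/decrypt",
      "context": "/document",
      "inputs": [ { "name": "file_data", "source": "/document/file_data" } ],
      "outputs": [ { "name": "decrypted_file_data" } ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
      "parsingMode": "json",
      "context": "/document",
      "inputs": [ { "name": "file_data", "source": "/document/decrypted_file_data" } ],
      "outputs": [ { "name": "content", "targetName": "extracted_content" } ]
    }
  ]
}
```

The important contract is the one mentioned in 2b: the custom skill's output (here `decrypted_file_data`) must be a file reference object, i.e. something of the form `{ "$type": "file", "data": "<base64-encoded decrypted bytes>" }`, so that the DocumentExtractionSkill can consume it.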

This is a bit of a complicated approach, but the DocumentExtractionSkill was actually designed with exactly the scenario you are describing in mind, so we'd love to hear feedback on if it works for you or not.

If you don't care about using indexers, you can always write a workflow yourself that downloads each file, decrypts and parses it, and then sends it to the Azure Cognitive Search index using the push model. With this option you would have to do the parsing yourself, and you wouldn't get the features that indexers provide for free, such as change tracking and automatic scheduled runs.
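With the push model, after decrypting and parsing a blob your workflow would POST the documents to the index's docs endpoint (`https://<service>.search.windows.net/indexes/<index>/docs/index?api-version=...` with your admin `api-key` header). The request body looks roughly like this; the field names are illustrative and would match your index schema:

```json
{
  "value": [
    {
      "@search.action": "mergeOrUpload",
      "id": "customer-a-doc-1",
      "content": "decrypted JSON content for this document"
    }
  ]
}
```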