2
votes

I am trying to process some blobs in an Azure Storage container. Our business users upload CSV files to a blob container. The task is to process these files and persist the data in staging tables in Azure SQL DB for them to analyse later. This involves dynamically creating tables that match the structure of the CSV files. I have this part working correctly, using Python.

The next part of the task is to notify the user (who uploaded the blob) via an email once the blob has been processed in the DB by providing them with the table name corresponding to the blob. Ideally, I should also be able to set the permissions in the DB by giving read permissions to the user only on the table corresponding to the blob he uploaded.

To accomplish this, I thought I would read the blob owner or "last modified by" attribute from the blob properties and use that information for the notification and DB permissions. But I cannot find any such property in the blob properties. I also tried Diagnostic Logging at the storage account level, but the logs don't show any "created by" or "modified by" information either.

Can someone please guide me on how I can get this working?

2 Answers

0
votes

As the information about who created/last modified a blob is not available as a system property, you will need to come up with your own implementation. I can think of a few solutions for that (without using an external database to store this information):

  • Store this information as blob's metadata: Each blob can have custom metadata. You can store this information in blob's metadata by creating two keys: CreatedBy and LastModifiedBy and store appropriate information. Please note that blob's metadata is not queryable and also it is very easy to overwrite the metadata. This is by far the easiest approach I could think of.
  • Make use of x-ms-client-request-id: With each request to Azure Storage, you can pass a custom value in x-ms-client-request-id request header. If storage analytics is enabled, this information gets logged. You could then query analytics data to find this information. However, it is extremely cumbersome to find this information in analytics logs as the information is saved as a line item in a blob in $logs container. To find this information, you would first need to find appropriate blob containing this information. Then you would need to download the blob, find the appropriate log entry and extract this information.
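The metadata approach from the first bullet might look roughly like the sketch below. It assumes the `azure-storage-blob` v12 `BlobClient` (its `get_blob_properties()` and `set_blob_metadata()` methods) and that the uploading application already knows the user's identity; the helper names `stamp_author` and `record_author` are hypothetical. Because setting metadata replaces the blob's whole metadata set, the sketch merges with whatever is already stored.

```python
def stamp_author(existing, user):
    """Return a new metadata dict with CreatedBy/LastModifiedBy filled in.

    CreatedBy is written only the first time; LastModifiedBy is updated on
    every call. Assumes existing keys come back with the same casing.
    """
    metadata = dict(existing or {})
    metadata.setdefault("CreatedBy", user)
    metadata["LastModifiedBy"] = user
    return metadata


def record_author(blob_client, user):
    """Hypothetical helper: stamp `user` onto a blob's metadata.

    `blob_client` is assumed to be an azure.storage.blob.BlobClient.
    """
    current = blob_client.get_blob_properties().metadata
    updated = stamp_author(current, user)
    # set_blob_metadata replaces ALL metadata on the blob, hence the merge above.
    blob_client.set_blob_metadata(metadata=updated)
    return updated
```

Keeping the merge logic in a separate pure function makes it easy to check without touching a storage account; anything else that rewrites the blob's metadata can still wipe these keys.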

Since neither of these solutions is perfect, I would recommend saving this information in an external database. It would be much simpler to accomplish your goal that way.

0
votes

Blobs in Azure support custom metadata as a dictionary of key/value pairs that you can save for each file, but in my experience it is not handy in every case, especially because you cannot query over the metadata without reading each blob (Azure will charge you for that), not to mention the network transfer.

from: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-properties-metadata

Objects in Azure Storage support system properties and user-defined metadata, in addition to the data they contain.

System properties: System properties exist on each storage resource. Some of them can be read or set, while others are read-only. Under the covers, some system properties correspond to certain standard HTTP headers. The Azure storage client library maintains these for you.

User-defined metadata: User-defined metadata is metadata that you specify on a given resource in the form of a name-value pair. You can use metadata to store additional values with a storage resource. These additional metadata values are for your own purposes only, and do not affect how the resource behaves.

I had to do something very similar once. To avoid creating and connecting to an external database, I simply created a table in the storage account and saved each file's URL from blob storage, along with all the properties I needed (user permissions), in an unstructured way.

You should find it extremely straightforward to query information from the table with Python (I did it with .NET, but I found it's pretty much the same).
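As a rough sketch of what one such table row could look like, the snippet below builds an entity as a plain dict and an OData filter string for looking it up. The table name `blobaudit`, the property names `UploadedBy` and `SqlTable`, and the helper functions are all assumptions for illustration. The blob name is base64-encoded for the `RowKey` because Table storage keys may not contain `/`, `\`, `#` or `?`.

```python
import base64


def make_row_key(blob_name):
    """Encode a blob name so it is safe to use as a Table storage RowKey."""
    return base64.urlsafe_b64encode(blob_name.encode("utf-8")).decode("ascii")


def make_audit_entity(container, blob_name, uploaded_by, sql_table):
    """Build one entity (a plain dict) describing who uploaded which blob."""
    return {
        "PartitionKey": container,
        "RowKey": make_row_key(blob_name),
        "BlobName": blob_name,
        "UploadedBy": uploaded_by,
        "SqlTable": sql_table,
    }


def uploaded_by_filter(container, user):
    """OData filter selecting every blob a user uploaded to a container."""
    def quote(value):
        return value.replace("'", "''")  # OData escapes ' by doubling it
    return "PartitionKey eq '{}' and UploadedBy eq '{}'".format(
        quote(container), quote(user)
    )


entity = make_audit_entity(
    "uploads", "2019/sales.csv", "alice@example.com", "stg_sales"
)
# With a TableService from azure-cosmosdb-table, this could be stored with
# table_service.insert_or_replace_entity('blobaudit', entity) and read back
# with table_service.query_entities('blobaudit', filter=uploaded_by_filter(...)).
```

Using the container as the `PartitionKey` keeps all uploads for one container together, so a per-user lookup only scans a single partition.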

https://docs.microsoft.com/en-us/azure/cosmos-db/table-storage-how-to-use-python

Azure Table storage and Azure Cosmos DB are services that store structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design. Because Table storage and Azure Cosmos DB are schemaless, it's easy to adapt your data as the needs of your application evolve. Access to Table storage and Table API data is fast and cost-effective for many types of applications, and is typically lower in cost than traditional SQL for similar volumes of data.

Example code for filtering:

    from azure.cosmosdb.table.tableservice import TableService
    from azure.cosmosdb.table.models import Entity

    # Connect with the storage account's connection string (placeholder values)
    table_service = TableService(connection_string='DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;TableEndpoint=myendpoint;')

    # Fetch only the entities in the 'tasksSeattle' partition
    tasks = table_service.query_entities('tasktable', filter="PartitionKey eq 'tasksSeattle'")
    for task in tasks:
        print(task.description)
        print(task.priority)

So you only need to create the table and use the keys from Azure to connect to it. Hope it helps.