
My use case requires me to continuously write incoming messages into files stored in an Azure Data Lake Gen2 storage account. I am able to create the files by triggering a function that uses the Python azure-storage-file-datalake SDK to interact with the storage account.

The problem is that, by default, the files created with the create_file() method are Block Blobs (and there isn't any parameter to change the type of blob that gets created), which means I cannot append data to them as new messages arrive.

I have tried using the Python azure-storage-blob SDK; however, I was unable to use paths to locate files within the containers of my Data Lake.

Here is an example of how I am creating the files, although they come out as Block Blobs:

if int(day) in days:
    day_directory_client.create_directory()
    file_client = day_directory_client.create_file(json_name)
    # Include the trailing newline in the length, so that append_data
    # and flush_data commit the full payload.
    data = f"{message_body}\n"
    file_client.append_data(data=data, offset=0, length=len(data))
    file_client.flush_data(len(data))
    write_to_cache(year, month, day, json_path)

I appreciate any help I can get, thanks!


1 Answer


If you want to create an append blob in an Azure Data Lake Gen2 account, you will need to use the azure-storage-blob package instead of azure-storage-file-datalake.

The azure-storage-file-datalake package is a wrapper over the Azure Data Lake Storage Gen2 REST API, which does not allow you to specify the blob type.
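
As a rough sketch of what that could look like (the connection string, container name, and blob path below are placeholders, not values from your code): when the storage account has a hierarchical namespace enabled, a slash-delimited blob name maps onto the Data Lake directory structure, so the BlobClient can still address your files by path.

from azure.storage.blob import BlobServiceClient

# Placeholder connection string for illustration.
service = BlobServiceClient.from_connection_string("<connection-string>")

# A slash-delimited blob name addresses the file by its
# Data Lake path within the container.
blob_client = service.get_blob_client(
    container="messages",
    blob="2023/01/15/messages.json",
)

# Create the file as an Append Blob once, then append each
# new message without rewriting the whole blob.
if not blob_client.exists():
    blob_client.create_append_blob()

message_body = '{"example": "incoming message"}'
blob_client.append_block(f"{message_body}\n")

One thing to keep in mind with this approach: an append blob holds at most 50,000 committed blocks, and each append_block() call commits one block, so very high-frequency writers may want to batch messages before appending.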