
I need to upload CSV files to an Azure Data Lake Gen2 file system. I'm at my wits' end trying to set the content type of an Azure Data Lake file when creating it. Please see the code below:

from azure.storage.filedatalake import DataLakeServiceClient, ContentSettings

def upload_file_to_directory(category, type, startdatetime, enddatetime, content):
    try:

        service_client = get_service_client()

        file_system_client = service_client.get_file_system_client(file_system="tag-data")

        category_directory_client = file_system_client.get_directory_client(category)

        type_directory_client = category_directory_client.get_sub_directory_client(type)

        year_directory_client = type_directory_client.get_sub_directory_client(startdatetime.strftime("%Y"))

        month_directory_client = year_directory_client.get_sub_directory_client(startdatetime.strftime("%m"))

        day_directory_client = month_directory_client.get_sub_directory_client(startdatetime.strftime("%d"))

        metadata = {"uploadedby": "Casper Alant"}
        content_settings = ContentSettings(content_type = "text/csv")
        file_name = startdatetime.strftime("%Y%m%d%H%M%S") + "-" + enddatetime.strftime("%Y%m%d%H%M%S") + ".csv"

        file_client = day_directory_client.get_file_client(file_name)

        file_client.create_file(content_settings=content_settings, metadata=metadata)

        file_client.append_data(data=content, offset=0, length=len(content))

        file_client.flush_data(len(content))

    except Exception as e:
        print(e)

The file is created with the content and the "uploadedby" metadata is set correctly, but the content type passed to create_file is not applied.

I've been following the official documentation here. I can't seem to find many resources on using this SDK.

I'm pretty sure this is a bug. I created a pull request for a fix: github.com/Azure/azure-sdk-for-python/pull/10066 - Casper Alant

1 Answer


If you're using azure-storage-file-datalake 12.0.0b7, you can set content-type in the flush_data method.

# your other code

content_settings = ContentSettings(content_type="text/csv")

file_client.flush_data(len(content), content_settings=content_settings)
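For completeness, here is a minimal sketch of how the tail end of the upload from the question might look with the content settings moved to flush_data. The helper names build_file_name and upload_csv are hypothetical (they are not part of the SDK), day_directory_client is assumed to be the directory client built in the question, and content is assumed to be bytes so that len(content) is the byte count.

```python
from datetime import datetime


def build_file_name(startdatetime, enddatetime):
    """Build the '<start>-<end>.csv' file name used in the question."""
    fmt = "%Y%m%d%H%M%S"
    return f"{startdatetime.strftime(fmt)}-{enddatetime.strftime(fmt)}.csv"


def upload_csv(day_directory_client, startdatetime, enddatetime, content):
    """Hypothetical helper: create the file, write content, set text/csv on flush."""
    # Imported here so build_file_name can be used without the SDK installed.
    from azure.storage.filedatalake import ContentSettings

    content_settings = ContentSettings(content_type="text/csv")
    file_name = build_file_name(startdatetime, enddatetime)
    file_client = day_directory_client.get_file_client(file_name)

    # Metadata is still set at creation time, as in the question.
    file_client.create_file(metadata={"uploadedby": "Casper Alant"})

    # Assumes content is bytes, so len(content) matches the uploaded size.
    file_client.append_data(data=content, offset=0, length=len(content))

    # Pass the content settings here instead of to create_file.
    file_client.flush_data(len(content), content_settings=content_settings)
    return file_client
```

After the upload, reading back file_client.get_file_properties() and inspecting its content_settings should show whether the content type was applied; I have not verified this against every SDK version.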