0
votes

I've written Python code to create a range of folders and subfolders (for a data lake) in an Azure storage container. The code works and is based on the Microsoft Azure documentation. However, to create each directory I'm uploading a dummy 'txt' file into it (which I can clean up later). Is there a way to create the folders and subfolders without creating a file? I understand that folders in Azure container storage are not hierarchical and are instead metadata, so what I'm asking for may not be possible.

from azure.storage.blob import BlobServiceClient, ContainerClient

# `config` is assumed to be loaded already (see load_config() below)
connection_string = config['azure_storage_connectionstring']
gen2_container_name = config['gen2_container_name']
container_client = ContainerClient.from_connection_string(connection_string, gen2_container_name)
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# blob_service_client.create_container(gen2_container_name)


def create_folder(folder, sub_folder):
    # Upload a placeholder blob whose name contains the folder path
    blob_client = container_client.get_blob_client('{}/{}/start_here.txt'.format(folder, sub_folder))

    with open('test.txt', 'rb') as data:
        blob_client.upload_blob(data)



def create_all_folders():
    config = load_config()
    folder_list = config['folder_list']
    sub_folder_list = config['sub_folder_list']
    for folder in folder_list:
        for sub_folder in sub_folder_list:
            try:
                create_folder(folder, sub_folder)
            except Exception as e:
                # Include the exception so the real cause is not hidden
                print('Something went wrong creating the folder structure {}/{} (maybe it already exists?): {}'.format(folder, sub_folder, e))

2
With the Blob Storage SDK this is not possible; you can use the Data Lake SDK instead.

2 Answers

1
votes

No, this is not possible with Blob Storage: there is no way to create so-called "folders".

However, you can use the Data Lake SDK to create a directory, like this:

from azure.storage.filedatalake import DataLakeServiceClient

# Assumes the storage account has the hierarchical namespace (ADLS Gen2) enabled
connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
myfilesystem = "test"
myfolder = "test1111111111"
myfile = "FileName.txt"  # not used below

# Creates an empty directory; no placeholder file is needed
file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
directory_client = file_system_client.create_directory(myfolder)
0
votes

Just to add some context, the reason this is not possible in Blob Storage is that folders/directories are not "real". Folders do not exist as standalone objects; they are only defined as part of a blob name.

For example, if you have a folder "mystuff" with a file (blob) "somefile.txt", the blob name actually includes the folder name and "/" character like mystuff/somefile.txt. The blob exists directly inside the container, not inside a folder. This naming convention can be nested many times over in a blob name like folder1/folder2/mystuff/anotherfolder/somefile.txt, but that blob still only exists directly in the container.
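
To make the flat naming concrete, here is a small pure-Python sketch. It makes no Azure calls, and `list_virtual_folder` is a made-up helper, not part of any SDK. It only illustrates how a "folder" view can be derived from a flat list of blob names by splitting on the "/" character.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (folders, files) that appear directly under `prefix`.

    Illustrative helper only: it works on plain strings, not on a container.
    """
    folders, files = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so `head` looks like a folder
            folders.add(prefix + head + delimiter)
        else:
            # No further delimiter: the blob sits directly at this level
            files.append(name)
    return sorted(folders), sorted(files)


blobs = [
    "mystuff/somefile.txt",
    "folder1/folder2/mystuff/anotherfolder/somefile.txt",
    "readme.txt",
]
print(list_virtual_folder(blobs))
# (['folder1/', 'mystuff/'], ['readme.txt'])
print(list_virtual_folder(blobs, prefix="folder1/"))
# (['folder1/folder2/'], [])
```

Note that the "folders" only show up because some blob name contains them; remove the blobs and the folders disappear too.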

Folders can appear to exist in certain tooling (like Azure Storage Explorer) because the SDK permits blob name filtering: if you filter on the "/" character, you can mimic the appearance of a folder and its contents. But in order for a folder to even appear to exist, there must be a blob in the container with the appropriate name. If you want to "force" a folder to exist, you can create a 0-byte blob with the correct folder path in the name, but the blob artifact will still need to exist.
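
A minimal sketch of the 0-byte blob approach, assuming `container_client` is an `azure.storage.blob.ContainerClient` like the one in the question. The helper name `create_placeholder` and the marker name `.keep` are my own choices for illustration, not an Azure convention. Uploading empty bytes means no local dummy file is needed.

```python
def create_placeholder(container_client, folder, sub_folder, marker=".keep"):
    """Upload a 0-byte blob so '<folder>/<sub_folder>/' appears in tooling.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (only upload_blob is used).
    """
    blob_name = "{}/{}/{}".format(folder, sub_folder, marker)
    # Empty payload: the blob exists only to carry the folder path in its name
    container_client.upload_blob(name=blob_name, data=b"", overwrite=True)
    return blob_name
```

With `overwrite=True`, running it again for an existing path should replace the placeholder rather than raise an error, so a try/except around "already exists" is probably not needed for that case.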

The exception is Azure Data Lake Storage (ADLS) Gen 2, which is Blob Storage that implements a Hierarchical Namespace. This makes it more like a file system and so respects the concept of Directories as standalone objects. ADLS is built on Blob Storage, so there is a lot of parity between the two. If you absolutely must have empty directories, then ADLS is the way to go.
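
If you do go the ADLS route, the nested loop from the question could be adapted roughly as follows. This is a sketch, assuming `file_system_client` is an `azure.storage.filedatalake.FileSystemClient` on an account with the hierarchical namespace enabled; `create_all_directories` is a hypothetical helper name, and `folder_list` / `sub_folder_list` are the lists from the question's config.

```python
def create_all_directories(file_system_client, folder_list, sub_folder_list):
    """Create every '<folder>/<sub_folder>' directory and return the paths.

    `file_system_client` is assumed to behave like
    azure.storage.filedatalake.FileSystemClient (only create_directory is used).
    """
    created = []
    for folder in folder_list:
        for sub_folder in sub_folder_list:
            path = "{}/{}".format(folder, sub_folder)
            # Creates a real, empty directory; no placeholder file is uploaded
            file_system_client.create_directory(path)
            created.append(path)
    return created
```

The client itself can be obtained with `DataLakeServiceClient.from_connection_string(...)` followed by `get_file_system_client(...)`. As far as I know, creating a nested path also creates its parent directories, but it is worth verifying that against your own account.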