0 votes

I've managed to write a Python script that lists all the blobs in a container.

from azure.storage.blob import BlobService

blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')

blobs = []
marker = None
# page through the results using the continuation marker
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker

for blob in blobs:
    print(blob.name)

As I said, this only lists the blobs I want to download. I've moved on to the Azure CLI to see whether it could help with the download itself. I'm able to download a single blob with

azure storage blob download [container]

It then prompts me to specify a blob, which I can grab from the Python script. As it stands, the only way I can see to download all those blobs is to run the command repeatedly and paste each blob name into the prompt. Is there a way I can either:

A. Write a bash script that iterates through the list of blobs, executing the command and supplying the next blob name at the prompt, or

B. Download the whole container from either the Python script or the Azure CLI. Is there something I'm not seeing that downloads the entire container?

Comment (4 votes): Have you tried downloading the blobs using blob_service.download_blob_to_path? Please see an example here: azure.microsoft.com/en-in/documentation/articles/… – Gaurav Mantri
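For context, the per-blob download method in that generation of the SDK is get_blob_to_path; a minimal sketch of the call (the container, blob, and local path here are placeholders, and blob_service is the BlobService from the question):

# hypothetical names; downloads one blob to a local file
blob_service.get_blob_to_path('<CONTAINER>', '<BLOB_NAME>', '<LOCAL_PATH>')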

4 Answers

3 votes

@gary-liu-msft's solution is correct. I made some more changes so the code can now iterate through the container and the folder structure in it (PS: there are no real folders in containers, just paths), check whether the same directory structure exists on the client, create it if it doesn't, and download the blobs into those paths. It supports long paths with nested subdirectories.

from azure.storage.blob import BlockBlobService
import os

# name of your storage account and the access key from Settings -> Access keys -> key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

# name of the container
generator = block_blob_service.list_blobs('testcontainer')

# the code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    # check if the blob name contains a folder structure; if so, mirror it locally
    if "/" in blob.name:
        print("there is a path in this")
        # extract the folder path and check if that folder exists locally; if not, create it
        head, tail = os.path.split(blob.name)
        print(head)
        print(tail)
        if os.path.isdir(os.getcwd() + "/" + head):
            # download the file into this directory
            print("directory and subdirectories exist")
            block_blob_service.get_blob_to_path('testcontainer', blob.name, os.getcwd() + "/" + head + "/" + tail)
        else:
            # create the directory, then download the file into it
            print("directory doesn't exist, creating it now")
            os.makedirs(os.getcwd() + "/" + head, exist_ok=True)
            print("directory created, download initiated")
            block_blob_service.get_blob_to_path('testcontainer', blob.name, os.getcwd() + "/" + head + "/" + tail)
    else:
        block_blob_service.get_blob_to_path('testcontainer', blob.name, blob.name)

The same code is also available here https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba
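As a side note, since os.makedirs with exist_ok=True already tolerates existing directories, the if/else branch above can be collapsed; a condensed sketch of the same logic, with the same placeholder account and container names:

import os
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')
for blob in block_blob_service.list_blobs('testcontainer'):
    local_path = os.path.join(os.getcwd(), blob.name)
    # create any missing parent directories, then download the blob there
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    block_blob_service.get_blob_to_path('testcontainer', blob.name, local_path)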

2 votes

Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 (v12) version of the Azure Storage Blobs client library for Python. The snippet below was tested on 9/25/2020 with version 12.5.0.

import os
import datetime
from azure.storage.blob import BlobServiceClient

# Assuming your Azure connection string environment variable is set.
# If not, create the BlobServiceClient using the account URL & credentials.
# Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient

connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string)
# create a container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

def download_blob(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(), destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : download finished".format(datetime.datetime.utcnow()))

# check if a top-level local folder exists for the container; if not, create one
data_dir = 'Z:/azure_storage'
data_dir = data_dir + "/" + container_name
if not os.path.isdir(data_dir):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)

# the code below lists all the blobs in the container and downloads them one after another
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    # check if the blob name contains a folder structure; if so, mirror it locally
    if "/" in blob.name:
        # extract the folder path and check if that folder exists locally; if not, create it
        head, tail = os.path.split(blob.name)
        if not os.path.isdir(data_dir + "/" + head):
            # create the directory before downloading the file into it
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(), data_dir + "/" + head))
            os.makedirs(data_dir + "/" + head, exist_ok=True)
    # finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob(blob_client, data_dir + "/" + blob.name)

The same code is also available here https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13
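As a side note, for large blobs the v12 download_blob call also accepts a max_concurrency parameter to parallelize a single transfer; a one-line sketch (the value 4 is an arbitrary example, check the docs for your installed version):

blob_data = blob_client.download_blob(max_concurrency=4)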

1 vote

Currently, it seems we cannot directly download all the blobs from a container with a single API call. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.

So we can get a ListGenerator of the blobs, then download them in a loop, e.g.:

result = blob_service.list_blobs(container)
for b in result.items:
    blob_service.get_blob_to_path(container, b.name, "folder/{}".format(b.name))

Update

Import BlockBlobService when using azure-storage-python:

from azure.storage.blob import BlockBlobService
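With BlockBlobService, the loop above becomes something like this (a minimal sketch; the account, container, and local folder names are placeholders):

from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')
# list_blobs returns a generator that transparently pages through all blobs
for b in blob_service.list_blobs('<CONTAINER>'):
    blob_service.get_blob_to_path('<CONTAINER>', b.name, "folder/{}".format(b.name))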

0 votes

I made a Python wrapper for the Azure CLI that lets us download and upload in batches. This way we can download a complete container or only certain files from a container.

To install:

pip install azurebatchload

Example usage:

import os
from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()