You may be able to achieve better upload performance with multithreading. Here is some code to do this:
```python
from azure.storage.blob import BlobClient
from threading import Thread
import os

def upload_blob(container, file, index=0, result=None):
    """Upload a single file and record success/failure in result[index]."""
    if result is None:
        result = [None]
    try:
        # Use the file's base name (including extension) as the blob name
        blob_name = os.path.basename(file)
        blob = BlobClient.from_connection_string(
            conn_str='CONNECTION STRING',
            container_name=container,
            blob_name=blob_name,
        )
        with open(file, "rb") as data:
            blob.upload_blob(data, overwrite=True)
        print(f'Upload succeeded: {blob_name}')
        result[index] = True
    except Exception as e:
        print(e)
        result[index] = False

def upload_wrapper(container, files):
    """Start one upload thread per file, wait for all, return per-file results."""
    parallel_runs = len(files)
    threads = [None] * parallel_runs
    results = [None] * parallel_runs
    for i in range(parallel_runs):
        threads[i] = Thread(target=upload_blob, args=(container, files[i], i, results))
        threads[i].start()
    for i in range(parallel_runs):
        threads[i].join()
    return results
```
There may be better chunking strategies; this is just an example to illustrate that, for certain cases, you may be able to achieve greater blob upload performance by using threading.
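One caveat with the approach above is that it spawns one thread per file, which does not scale well to hundreds of files. A bounded pool from the standard library's `concurrent.futures` is one alternative; the sketch below is generic, with a hypothetical `upload_one` callable standing in for the `BlobClient` upload logic shown above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_many(upload_one, files, max_workers=8):
    """Upload files concurrently with a bounded thread pool.

    upload_one: callable taking a file path and performing the upload
    (a stand-in here for the BlobClient logic shown above; it should
    raise on failure).
    Returns a dict mapping each file to True on success, False on failure.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file, remembering which future belongs to which file
        futures = {pool.submit(upload_one, f): f for f in files}
        for fut in as_completed(futures):
            f = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                results[f] = True
            except Exception as e:
                print(f'Upload failed for {f}: {e}')
                results[f] = False
    return results
```

You could call it as, e.g., `upload_many(lambda f: do_upload('my-container', f), file_list)`, where `do_upload` is whatever single-file upload function you use. Capping `max_workers` keeps you from opening hundreds of simultaneous connections.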
Here are some benchmarks between the sequential looping approach vs. the above threaded approach (482 image files, 26 MB total):
- Sequential upload: 89 seconds
- Threaded upload: 28 seconds
I should also add that you might consider invoking azcopy from Python, as that tool may be better suited to your particular needs.
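If you go the azcopy route, one option is to drive it with `subprocess`. The sketch below builds an `azcopy copy` command and runs it; the container URL and SAS token are placeholders you would supply yourself:

```python
import subprocess

def build_azcopy_copy_command(source_dir, container_url_with_sas):
    """Build an 'azcopy copy' command that uploads a directory recursively.

    container_url_with_sas is a placeholder, e.g.
    "https://<account>.blob.core.windows.net/<container>?<SAS token>".
    """
    return [
        'azcopy', 'copy',
        source_dir,
        container_url_with_sas,
        '--recursive',
    ]

def run_azcopy(source_dir, container_url_with_sas):
    cmd = build_azcopy_copy_command(source_dir, container_url_with_sas)
    # check=True raises CalledProcessError if azcopy exits non-zero
    return subprocess.run(cmd, check=True)
```

azcopy handles its own parallelism and retries internally, so you get concurrent uploads without managing any threads in Python.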