I'm trying to access a csv stored in Azure blob and read it into a pandas dataframe in my python script. But I'm running into issues with imports and actually reading the csv. I'm at least able to see that it exists using my python script, which looks like:
import os, uuid, sys
from io import StringIO
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobService
try:
    global service_client

    storage_account_name = 'ACCOUNT_NAME'
    storage_account_key = 'ACCOUNT_KEY'
    storage_connection_string = 'ACCOUNT_STRING'
    storage_container_name = 'CONTAINER_NAME'
    csv_path = '<PATH_TO>/FILE.csv'

    service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
        "https", storage_account_name), credential=storage_account_key)

    file_system_client = service_client.get_file_system_client(file_system=storage_container_name)

    print('GET PATH(S)')
    paths = file_system_client.get_paths(path=csv_path)
    for path in paths:
        print(path.name + '\n')

    blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)
    blobstring = blob_service.get_blob_to_text(storage_container_name, csv_path)
    df = pd.read_csv(StringIO(blobstring))

except Exception as e:
    print(e)

finally:
    print('DONE')
The issue is that I can't read the csv into my pandas dataframe. On top of that, I can't even get to the point of using BlobService, because every time I run the script I get this error:
ImportError: cannot import name 'BlobService' from 'azure.storage.blob'
My pip freeze for azure looks like this:
azure-common==1.1.25
azure-core==1.5.0
azure-storage-blob==12.3.1
azure-storage-common==2.1.0
azure-storage-file-datalake==12.0.1
What is it that I'm doing wrong here?
If you use the new python storage SDK (azure-storage-blob), we need to use BlobClient to get a blob. – Jim Xu
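To illustrate the comment above: BlobService only existed in the old azure-storage SDK, while the azure-storage-blob 12.x package in the pip freeze ships BlobServiceClient/BlobClient instead. A minimal sketch of the download-and-parse step, assuming placeholder connection string, container, and blob path (the helper names are mine, not from any SDK):

```python
from io import StringIO

import pandas as pd


def csv_bytes_to_df(data: bytes) -> pd.DataFrame:
    # Parse raw CSV bytes (as returned by download_blob().readall())
    # into a pandas DataFrame.
    return pd.read_csv(StringIO(data.decode("utf-8")))


def blob_csv_to_df(conn_str: str, container: str, blob_path: str) -> pd.DataFrame:
    # azure-storage-blob 12.x: BlobClient replaces the old BlobService.
    # Imported inside the function so the parsing helper above is usable
    # even where the Azure SDK is not installed.
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_path)
    return csv_bytes_to_df(blob.download_blob().readall())
```

With the values from the question this would be called as `df = blob_csv_to_df(storage_connection_string, storage_container_name, csv_path)`; note that no DataLakeServiceClient is needed just to read a single blob.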