I'm trying to access a CSV file stored in Azure Blob Storage and read it into a pandas DataFrame in my Python script, but I'm running into issues with both the imports and actually reading the CSV. I can at least confirm that the file exists using my script, which looks like this:

import os, uuid, sys
from io import StringIO
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobService

try:  
    global service_client
    storage_account_name = 'ACCOUNT_NAME'
    storage_account_key = 'ACCOUNT_KEY'
    storage_connection_string = 'ACCOUNT_STRING'
    storage_container_name = 'CONTAINER_NAME'
    csv_path = '<PATH_TO>/FILE.csv'

    service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
        "https", storage_account_name), credential=storage_account_key)

    file_system_client = service_client.get_file_system_client(file_system=storage_container_name)

    print('GET PATH(S)')
    paths = file_system_client.get_paths(path=csv_path)
    for path in paths:
        print(path.name + '\n')

    blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)
    blobstring = blob_service.get_blob_to_text(storage_container_name,csv_path)
    df = pd.read_csv(StringIO(blobstring))

except Exception as e:
    print(e)

finally:
    print('DONE')

The issue is that I can't read the CSV into my pandas DataFrame. I also can't use BlobService at all: every time I try to run the script, I get the error:

ImportError: cannot import name 'BlobService' from 'azure.storage.blob'

My pip freeze for azure looks like this:

azure-common==1.1.25
azure-core==1.5.0
azure-storage-blob==12.3.1
azure-storage-common==2.1.0
azure-storage-file-datalake==12.0.1

What is it that I'm doing wrong here?

BlobService is not a class in the new Python Azure Blob SDK (azure-storage-blob). It exists only in the old Python storage SDK, azure-storage. If you use the new Python storage SDK, you need to use BlobClient to get a blob. - Jim Xu

1 Answer

According to the code you provided, you are using the class BlobService to download the file from Azure Blob Storage. That class belongs to the SDK azure.storage 0.20.0, but you have installed the SDK azure.storage.blob, which is why you get the import error. Since azure.storage.blob is what you have installed, you can use the class BlobClient to download the blob.

For example:

from io import StringIO

import pandas as pd
from azure.storage.blob import BlobClient

# download the csv file from Azure blob storage
sas_url = "your blob sas url"
blob_client = BlobClient.from_blob_url(sas_url)
downloaded_blob = blob_client.download_blob()

# read the downloaded csv text into a DataFrame
df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))
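
If you don't have a SAS URL, the same download should also work with the connection string from the question. The following is only a minimal sketch under that assumption: the helper names read_csv_from_container and read_csv_with_connection_string are hypothetical (they are not part of the SDK), and the values passed in are the placeholders from the question. It reads the blob as bytes with readall() and hands them to pandas through BytesIO.

```python
from io import BytesIO

import pandas as pd


def read_csv_from_container(container_client, blob_path, **read_csv_kwargs):
    """Download one blob through a v12 ContainerClient and parse it as CSV.

    Hypothetical helper for illustration; not part of azure-storage-blob.
    """
    # download_blob() returns a StorageStreamDownloader; readall() yields bytes.
    data = container_client.download_blob(blob_path).readall()
    return pd.read_csv(BytesIO(data), **read_csv_kwargs)


def read_csv_with_connection_string(connection_string, container_name, blob_path):
    """Build the v12 clients from a connection string, then read the CSV.

    Hypothetical helper for illustration; not part of azure-storage-blob.
    """
    # Imported here so the helper above can be used without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    return read_csv_from_container(container, blob_path)


# Usage with the placeholders from the question (replace with real values):
# df = read_csv_with_connection_string('ACCOUNT_STRING', 'CONTAINER_NAME', '<PATH_TO>/FILE.csv')
```

Passing bytes through BytesIO leaves the decoding to pandas, so an encoding argument can be forwarded via read_csv_kwargs if the file is not UTF-8.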