I am using the azure-storage-file-datalake package to connect to ADLS Gen2.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
# service principal credential
tenant_id = 'xxxxxxx'
client_id = 'xxxxxxxxx'
client_secret = 'xxxxxxxx'
storage_account_name = 'xxxxxxxx'
credential = ClientSecretCredential(tenant_id, client_id, client_secret)
service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
"https", storage_account_name), credential=credential) # I have also tried blob instead of dfs in account_url
The folder structure in ADLS Gen2 from which I have to read the parquet file looks like this: inside the container there is folder_a, which contains folder_b, which holds the parquet file.
With Gen1 storage we used to read a parquet file like this:
from azure.datalake.store import lib
from azure.datalake.store.core import AzureDLFileSystem
import pyarrow.parquet as pq
adls = lib.auth(tenant_id=directory_id,
adl = AzureDLFileSystem(adls, store_name=adls_name)
f = adl.open(file, 'rb') # file is the path to the parquet file, e.g. folder_a/folder_b/parquet_file1
table = pq.read_table(f)
How do we proceed with Gen2 storage? We are stuck at this point.
http://peter-hoffmann.com/2020/azure-data-lake-storage-gen-2-with-python.html is the link that we have followed.
Note: we are not using Databricks for this.
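For reference, here is a rough sketch of what the Gen2 equivalent of the Gen1 snippet might look like. It is untested against a real account and rests on a few assumptions: that service_client is the DataLakeServiceClient created above, that the container name is known (the "my-container" value is a placeholder), and that the file is small enough to download into memory. The helper names download_file_bytes and read_parquet_table are made up for illustration.

```python
import io


def download_file_bytes(service_client, container_name, file_path):
    """Download one file from an ADLS Gen2 container and return its bytes.

    service_client is assumed to be an azure.storage.filedatalake
    DataLakeServiceClient; file_path is relative to the container root.
    """
    file_system_client = service_client.get_file_system_client(
        file_system=container_name)
    file_client = file_system_client.get_file_client(file_path)
    # download_file() returns a downloader; readall() pulls the whole file
    return file_client.download_file().readall()


def read_parquet_table(service_client, container_name, file_path):
    """Read a parquet file from ADLS Gen2 into a pyarrow Table."""
    # Imported here so the download helper can be used without pyarrow
    import pyarrow.parquet as pq

    data = download_file_bytes(service_client, container_name, file_path)
    # pq.read_table accepts a file-like object, so wrap the bytes
    return pq.read_table(io.BytesIO(data))


# Hypothetical usage ("my-container" is a placeholder name):
# table = read_parquet_table(
#     service_client, "my-container", "folder_a/folder_b/parquet_file1")
# df = table.to_pandas()
```

The whole file is buffered in memory before pyarrow parses it, which mirrors the adl.open(...) plus pq.read_table(f) pattern from Gen1 but may not suit very large files.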
from azure.storage.file import FileService
? – anky