I'm using a connection string to access an Azure Data Lake Storage Gen2 account, in which lots of Avro files were stored by Event Hubs Capture, under the typical directory structure with folders named by year/month/day/hour/minute. I'm using the azure.storage.filedatalake package.
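For context, this is the kind of path I expect to see in the lake. A minimal sketch (the namespace/hub/partition names are illustrative; the layout follows the default Capture naming format {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}):

```python
from datetime import datetime

def capture_path(namespace, eventhub, partition_id, ts):
    """Build an Event Hubs Capture-style blob path for a given timestamp."""
    return f"{namespace}/{eventhub}/{partition_id}/{ts:%Y/%m/%d/%H/%M/%S}.avro"

print(capture_path("myns", "myhub", 0, datetime(2020, 3, 1, 14, 5, 30)))
# myns/myhub/0/2020/03/01/14/05/30.avro
```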
First I get a Data Lake service client:
datalake_service_client = DataLakeServiceClient.from_connection_string(connection_string)
And then I get the file systems in the lake by:
file_systems = datalake_service_client.list_file_systems()
for file_system in file_systems:
    print(file_system.name)
There is only one file system in this case, called "datalake1". At this point I want to access all the Avro files I expect to find there. I'm trying by first getting a file system client:
file_system_client = datalake_service_client.get_file_system_client("datalake1")
and then by using the get_paths method:
file_system_client.get_paths()
It returns an iterator (an azure.core.paging.ItemPaged object), but from it I'm not able to see the folders and files. I tried a simple list comprehension, [x.name for x in file_system_client.get_paths()], but I got this error:
StorageErrorException: Operation returned an invalid status 'The specified container does not exist.'
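Putting it all together, this is roughly the flow I'm attempting (a sketch: the connection string is a placeholder, and avro_paths is just a helper I wrote to filter the returned names down to Avro files):

```python
def avro_paths(path_names):
    """Keep only the path names that look like Avro files."""
    return [name for name in path_names if name.endswith(".avro")]

def main():
    # Requires azure-storage-file-datalake; "<connection-string>" is a placeholder.
    from azure.storage.filedatalake import DataLakeServiceClient

    service_client = DataLakeServiceClient.from_connection_string("<connection-string>")
    file_system_client = service_client.get_file_system_client("datalake1")
    names = (p.name for p in file_system_client.get_paths())
    for name in avro_paths(names):
        print(name)

if __name__ == "__main__":
    main()
```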
Any idea about how to access the Avro files following this procedure?
EDIT: I'm using azure-storage-file-datalake version 12.0.0.
Thanks