
We're migrating from blob storage to ADLS Gen 2 and we want to test access to the Data Lake from Databricks. I created a service principal which has Blob Storage Reader and Blob Storage Contributor access to the Data Lake.

My notebook sets the below spark config:

 spark.conf.set("fs.azure.account.auth.type", "OAuth")
 spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
 spark.conf.set("fs.azure.account.oauth2.client.id", "<clientId>")
 spark.conf.set("fs.azure.account.oauth2.client.secret", "<secret>")
 spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
// I replaced the placeholders in my notebook with the correct values from my service principal
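The same settings can be collected in a small helper so they are applied consistently; a minimal sketch, assuming the parameter names `client_id`, `client_secret`, and `tenant_id` stand in for the real service-principal values:

```python
def adls_oauth_conf(client_id, client_secret, tenant_id):
    """Build the Spark conf entries for OAuth access to ADLS Gen 2
    via a service principal (client-credentials flow)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In the notebook (spark is the Databricks-provided session):
# for key, value in adls_oauth_conf(cid, secret, tid).items():
#     spark.conf.set(key, value)
```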

When I run the code below, the contents of the directory are shown correctly:

dbutils.fs.ls("abfss://ado-raw@<storage account name>.dfs.core.windows.net")

I can read a small text file from my Data Lake which is only 3 bytes, but when I try to show its contents, the cell gets stuck at "Running command..." and nothing happens.

What do you think the issue is, and how do I resolve it?

Thanks in advance

I had the same issue, and in my case it happened because I created a private endpoint to connect ADLS Gen 2 to Data Factory. There isn't any documentation explaining why this happens; I discovered it on my own. But also check whether your Databricks cluster is allowed through the ADLS Gen 2 firewall. - Kafels
Did you follow this step about creating a container? - Kafels
@Kafels The thing is, the values are being read from the Data Lake but they are not displayed. Access is not the issue. - Morez
I don't think so; your stage's job is frozen at 0/1. - Kafels
Wait until your command throws an exception, then update your question. It will probably run for 20 minutes before stopping. - Kafels

1 Answer


The issue was that the private and public subnets had been deleted by mistake and then recreated with a different IP range. They need to be in the same range as the management subnet; otherwise, the private endpoint set up for the storage account won't work.
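One way to sanity-check the recreated subnets before touching the private endpoint is to compare their CIDR blocks against the management subnet's range. A minimal sketch using Python's standard `ipaddress` module, where "same range" is interpreted here as sharing the same /16 supernet and the CIDR values are hypothetical examples:

```python
import ipaddress

def in_same_range(subnet_cidr, management_cidr, prefix=16):
    """True if both subnets fall inside the same enclosing network
    (by default, the same /16 supernet)."""
    subnet = ipaddress.ip_network(subnet_cidr)
    management = ipaddress.ip_network(management_cidr)
    # Compare the enclosing supernets of both subnets.
    return subnet.supernet(new_prefix=prefix) == management.supernet(new_prefix=prefix)

# A subnet recreated on a different range fails the check:
print(in_same_range("10.1.2.0/24", "10.0.0.0/24"))  # False
# A subnet recreated on the original range passes:
print(in_same_range("10.0.5.0/24", "10.0.0.0/24"))  # True
```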