1 vote

When I try to mount ADLS Gen2 in Databricks, I get the error "StatusDescription=This request is not authorized to perform this operation" whenever the ADLS Gen2 firewall is enabled. The mount works fine if the firewall is disabled.

Can someone help, please?

configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azure.account.oauth2.client.id": clientID,
               "fs.azure.account.oauth2.client.secret": keyID,
               "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenantID + "/oauth2/token"}

dbutils.fs.mount(
  source = "abfss://" + fileSystem + "@" + accountName + ".dfs.core.windows.net/",
  mount_point = "/mnt/adlsGen2",
  extra_configs = configs)

StatusCode=403
StatusDescription=This request is not authorized to perform this operation.
ErrorCode=
ErrorMessage=
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:498)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:164)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:445)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:362)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:486)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:435)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

4 Answers

1 vote

This error is caused by the service principal not having read/execute permission on the file path - not the firewall.

FYI, on the Azure storage account you can allow trusted Microsoft services to access the resource, which includes Databricks. But as I said, I do not believe you have a firewall issue.

To resolve the permissions issue, I would first look at the IAM roles on the filesystem. From the Azure portal, go to the storage account > File systems and open the Access Control (IAM) blade. On the Check access screen, paste the client/application ID of your service principal and check what permissions it has.

To have read access to the filesystem, the service principal must be in one of the following roles:

* Owner
* Storage Blob Data Contributor
* Storage Blob Data Owner
* Storage Blob Data Reader

Any of these roles will give read access to all files in the filesystem.
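
If you would rather check this from a notebook than from the portal, a rough sketch like the one below (untested, using the azure-identity and azure-storage-file-datalake packages, and reusing the clientID / keyID / tenantID / accountName / fileSystem variables from the question) should reproduce the 403 when the role or ACLs are missing:

from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate as the same service principal the mount uses.
cred = ClientSecretCredential(tenant_id=tenantID, client_id=clientID, client_secret=keyID)
service = DataLakeServiceClient("https://" + accountName + ".dfs.core.windows.net", credential=cred)
fs = service.get_file_system_client(fileSystem)

try:
    for path in fs.get_paths(recursive=False):  # list the filesystem root
        print(path.name)
except Exception as err:
    # A 403 here confirms the SP lacks RBAC roles and/or ACLs, independent of the firewall.
    print("Access check failed:", err)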

If not, you can still grant permissions at a folder/file level using Azure Storage Explorer. Remember that all folders in the chain must have Execute permission at each level. For example:

/Root/SubFolder1/SubFolder2/file.csv

You must grant Execute on Root, SubFolder1 and SubFolder2, as well as Read on file.csv itself.
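
If you prefer to script the ACLs instead of clicking through Storage Explorer, something along these lines should do it. This is only a sketch with the azure-storage-file-datalake SDK: the folder names come from the example above, <service-principal-object-id> is a placeholder for your SP's object ID (not its client ID), and it must be run as an identity that is allowed to change ACLs (e.g. Storage Blob Data Owner):

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient("https://" + accountName + ".dfs.core.windows.net",
                                credential=DefaultAzureCredential())
fs = service.get_file_system_client(fileSystem)
sp_object_id = "<service-principal-object-id>"  # placeholder

def add_acl_entry(path_client, entry):
    # set_access_control replaces the whole ACL, so read the current one and append to it.
    current_acl = path_client.get_access_control()["acl"]
    path_client.set_access_control(acl=current_acl + "," + entry)

# Execute (x) on every folder in the chain ...
for directory in ("Root", "Root/SubFolder1", "Root/SubFolder1/SubFolder2"):
    add_acl_entry(fs.get_directory_client(directory), "user:" + sp_object_id + ":--x")

# ... and Read (r) on the file itself.
add_acl_entry(fs.get_file_client("Root/SubFolder1/SubFolder2/file.csv"),
              "user:" + sp_object_id + ":r--")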

Further details: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control

1 vote

If you enable the firewall on an Azure Data Lake Storage Gen2 account, this configuration only works with Azure Databricks if you deploy Azure Databricks in your own virtual network; it does not work with workspaces deployed without the VNet injection feature. On the storage account, you then have to allow access from the public Databricks subnet.
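
If your workspace is VNet-injected, you can add the Databricks subnets under the storage account's Firewalls and virtual networks blade, or script it. The sketch below uses the azure-mgmt-storage package and is untested; the subscription, resource group and subnet IDs are placeholders, and it assumes the Microsoft.Storage service endpoint is already enabled on the subnet:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"        # placeholders
resource_group = "<storage-resource-group>"
databricks_public_subnet_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<vnet-resource-group>"
    "/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<databricks-public-subnet>"
)

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Keep the firewall on (default_action "Deny") but allow the Databricks public subnet;
# "bypass": "AzureServices" corresponds to the "trusted Microsoft services" checkbox.
client.storage_accounts.update(
    resource_group,
    accountName,  # storage account name from the question
    {
        "network_rule_set": {
            "default_action": "Deny",
            "bypass": "AzureServices",
            "virtual_network_rules": [
                {"virtual_network_resource_id": databricks_public_subnet_id, "action": "Allow"}
            ],
        }
    },
)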

0 votes

You need to use VNet injection during workspace creation. This blog post walks you through it: https://www.keithmsmith.com/azure-data-lake-firewall-databricks/

0 votes

I also faced the same issue, but later figured out that you need only the Storage Blob Data Contributor role assigned to your service principal on the data lake. If you have given only the Contributor role, it will not work; both Contributor and Storage Blob Data Contributor together will not work either. You have to grant just Storage Blob Data Contributor on your Data Lake Gen2 account.
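
For reference, the role assignment can also be scripted instead of done in the portal. This is only a sketch with the azure-mgmt-authorization package: the subscription, resource group and service principal object ID are placeholders, and it must be run by an identity that can create role assignments (e.g. Owner or User Access Administrator) on the storage account:

import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"            # placeholders
resource_group = "<storage-resource-group>"
sp_object_id = "<service-principal-object-id>"   # the SP's object ID, not its client/app ID

scope = ("/subscriptions/" + subscription_id +
         "/resourceGroups/" + resource_group +
         "/providers/Microsoft.Storage/storageAccounts/" + accountName)

auth_client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

# Look up the built-in "Storage Blob Data Contributor" role definition by name.
role_def = next(auth_client.role_definitions.list(
    scope, filter="roleName eq 'Storage Blob Data Contributor'"))

# Assign it to the service principal at the storage-account scope.
auth_client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # role assignment names must be unique GUIDs
    {
        "role_definition_id": role_def.id,
        "principal_id": sp_object_id,
        "principal_type": "ServicePrincipal",
    },
)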