
I am trying to understand why my ACL permissions are not working as expected in Databricks.

Scenario: I have 2 users, one with full permissions on the file system and the other without any permissions.

I tried mounting the Gen2 filesystem in Databricks using 2 different methods:

  1. Using a service principal (OAuth client credentials):

         configs = {"fs.azure.account.auth.type": "OAuth",
                    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
                    "fs.azure.account.oauth2.client.id": clientid,
                    "fs.azure.account.oauth2.client.secret": credential,
                    "fs.azure.account.oauth2.client.endpoint": refresh_url}

         dbutils.fs.mount(
             source = "abfss://[email protected]/",
             mount_point = "/mnt/xyz",
             extra_configs = configs)

  2. Using credential passthrough:

         configs = {
             "fs.azure.account.auth.type": "CustomAccessToken",
             "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
         }

         dbutils.fs.mount(
             source = "abfss://[email protected]/",
             mount_point = "/mnt/xyz",
             extra_configs = configs)

Both of them mount the filesystem, but when I run:

dbutils.fs.ls("/mnt/xyz")

it lists all the files and folders, even for the user who has no permissions on the data lake.

I would be glad if someone could explain to me what's wrong.

Thanks


2 Answers


This is expected behavior when you enable Azure Data Lake Storage credential passthrough.

Note: When a cluster is enabled for Azure Data Lake Storage credential passthrough, commands run on that cluster can read and write data in Azure Data Lake Storage without requiring users to configure service principal credentials to access the storage. The credentials are set automatically, based on the user initiating the action.
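As a minimal sketch (assuming the /mnt/xyz mount from the question and a cluster with credential passthrough enabled), the same listing command is evaluated against the Azure AD identity of whoever runs it, so the user without permissions on the data lake should get a permission error rather than a full listing:

    # Run on a passthrough-enabled cluster; the outcome depends on the
    # Azure AD identity of the user executing the command.
    try:
        for f in dbutils.fs.ls("/mnt/xyz"):
            print(f.path)
    except Exception as e:
        # A user with no role assignment / ACLs on the data lake is
        # expected to hit an authorization error here.
        print("Access denied for the current user:", e)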

Reference: Enable Azure Data Lake Storage credential passthrough for your workspace and Simplify Data Lake Access with Azure AD Credential Passthrough.


You have probably forgotten to add permissions in the Access Control (IAM) of the container.

To check this, you can go to the container in the Azure portal and click on Switch to Azure AD User Account. If you don't have the required rights, you will see an error message.

For example, you can add the role Storage Blob Data Contributor to have read and write access.

Note: The data lake takes a few minutes to refresh the credentials, so you need to wait a little after adding the role.
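If you want to run the same check from code instead of the portal, here is a rough sketch using the Azure SDK for Python (the azure-identity and azure-storage-file-datalake packages; the account name abc and container name xyz are taken from the question, and InteractiveBrowserCredential is just one possible way to sign in as yourself):

    # pip install azure-identity azure-storage-file-datalake
    from azure.identity import InteractiveBrowserCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # Sign in with your own Azure AD account (not a service principal), so the
    # check reflects your user's permissions, like the "Switch to Azure AD
    # User Account" button in the portal.
    credential = InteractiveBrowserCredential()

    service = DataLakeServiceClient(
        account_url="https://abc.dfs.core.windows.net",  # storage account from the question
        credential=credential,
    )
    filesystem = service.get_file_system_client("xyz")   # container from the question

    try:
        for path in filesystem.get_paths():
            print(path.name)
    except Exception as e:
        # A missing "Storage Blob Data Contributor" (or Reader) role shows up
        # here as an authorization error.
        print("No access with this Azure AD account:", e)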