2
votes

I have an Azure Data Lake Storage Gen2 account that contains a few Parquet files. My organization has enabled credential passthrough, so I can create a Python script in Azure Databricks and list the files in ADLS using dbutils.fs.ls. All of this works fine.

Now, I need to access the last modified timestamp of these files too. I found a link that does this. However, it uses BlockBlobService and requires an account_key.

I do not have an account key and can't get one because of my organization's security policies. I am unsure how to do the same thing using credential passthrough. Any ideas?

1 Answer

0
votes

You can try mounting the Azure Data Lake Storage Gen2 instance with credential passthrough:

    # Use the cluster's passthrough token provider instead of an account key.
    configs = {
      "fs.azure.account.auth.type": "CustomAccessToken",
      "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
    }
    # Placeholder names; replace with your own values.
    mount_name = 'localmountname'
    container_name = 'containername'
    storage_account_name = 'datalakestoragename'
    dbutils.fs.mount(
      source = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
      mount_point = f"/mnt/{mount_name}",
      extra_configs = configs)
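
Once the mount exists, one possible way to read the last modified timestamps is from the listing itself. The following is only a sketch, and it rests on an assumption: that your Databricks Runtime is recent enough for the entries returned by dbutils.fs.ls to carry a `modificationTime` attribute (milliseconds since the Unix epoch). On older runtimes that attribute is not present. The helper name `last_modified_times` is made up for illustration.

```python
from datetime import datetime, timezone


def last_modified_times(entries):
    """Map each entry's path to its last-modified time as a UTC datetime.

    Assumption: every entry exposes `path` and `modificationTime`
    (milliseconds since the Unix epoch), as the FileInfo objects returned
    by dbutils.fs.ls do on recent Databricks Runtime versions.
    """
    return {
        entry.path: datetime.fromtimestamp(
            entry.modificationTime / 1000, tz=timezone.utc
        )
        for entry in entries
    }


# In a Databricks notebook, where `dbutils` is predefined
# ("localmountname" is the placeholder mount name used above):
#
# files = dbutils.fs.ls("/mnt/localmountname")
# for path, modified in last_modified_times(files).items():
#     print(path, modified.isoformat())
```

The helper takes the listing as an argument rather than calling dbutils itself, so it can also be tried outside a notebook with any objects that have the same two attributes.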