I'm trying to mount a data lake in Databricks. My goal is to build a data lake. Why is the format of my URL different from the one in the documentation? And what do "filesystem" and "dfs" mean?

I tried to create a data lake with Azure Storage Gen2. I enabled the hierarchical namespace and started creating directories. I noticed that the file URL includes the word "blob".

This is my current URL: https://datalakestagingtest.blob.core.windows.net/staging/manufacturers/nissan/micra.csv

I see that the format is different in the Data Lake documentation, where the URL looks like abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/

Reference: https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html

Azure Databricks uses the Hadoop file system, and in Hadoop you need the abfs scheme to access Azure Data Lake Storage Gen2. For more details, please refer to docs.microsoft.com/en-us/azure/storage/blobs/… - Jim Xu
What do I need to do on the Azure Storage side? @JimXu - Kenny_I
Normally, you do not need to do anything on the Azure Storage side. - Jim Xu

1 Answer

A couple of important points to note when mounting storage accounts in Azure Databricks:

For Azure Blob storage: source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>"

For Azure Data Lake Storage gen2: source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/"

To mount an Azure Data Lake Storage Gen2 file system, or a folder inside it, in the Databricks File System (DBFS), the URL should look like abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/
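To connect this to the URL in the question: in Data Lake Storage Gen2 the "file system" is the container (staging in your case), and dfs.core.windows.net is the Data Lake endpoint of the same storage account that the portal shows under blob.core.windows.net. As a rough sketch, the abfss:// form could be derived from the https URL as below. The helper name blob_url_to_abfss is made up for illustration and is not part of any Databricks or Azure API, and it assumes the first path segment is the container.

```python
from urllib.parse import urlparse


def blob_url_to_abfss(url: str) -> str:
    """Illustrative helper (not a Databricks/Azure API): rewrite
    https://<account>.blob.<suffix>/<container>/<path> as an abfss:// URI."""
    parsed = urlparse(url)
    host_parts = parsed.netloc.split(".")
    if len(host_parts) < 3 or host_parts[1] not in ("blob", "dfs"):
        raise ValueError(f"not a blob/dfs storage URL: {url}")

    account = host_parts[0]
    # Everything after "<account>.blob" is the endpoint suffix, e.g. core.windows.net
    suffix = ".".join(host_parts[2:])

    # Assumption: the first path segment is the container, i.e. the Gen2 "file system"
    container, _, path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"URL has no container segment: {url}")

    return f"abfss://{container}@{account}.dfs.{suffix}/{path}"


url = "https://datalakestagingtest.blob.core.windows.net/staging/manufacturers/nissan/micra.csv"
print(blob_url_to_abfss(url))
# abfss://staging@datalakestagingtest.dfs.core.windows.net/manufacturers/nissan/micra.csv
```

Inside a Databricks notebook, a string of this shape is what you would pass as source when mounting, together with the authentication settings described in the reference below.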

Reference: Azure Databricks - Azure Data Lake Storage Gen2