Mounting Azure Blob Storage to Azure Databricks without using a cluster

Question

We have a requirement that, while provisioning the Databricks service through a CI/CD pipeline in Azure DevOps, we should be able to mount a Blob Storage container to DBFS without connecting to a cluster. Is it possible to mount object storage to DBFS by using a bash script from Azure DevOps?

I looked through various forums, but they all describe doing this with dbutils.fs.mount, and we cannot run that command in an Azure DevOps CI/CD pipeline.

I would appreciate any help with this.

Thanks

J. Offenberg · Accepted Answer · 2020-08-14T09:12:21

What you're asking is possible, but it requires a bit of extra work. In our organisation we've tried various approaches, and I've been working with Databricks for a while. The solution that works best for us is a bash script in the Azure DevOps pipeline that uses the databricks-cli. The approach is as follows:

1. Retrieve a Databricks token using the Token API
2. Configure the Databricks CLI in the CI/CD pipeline
3. Use the Databricks CLI to upload a mount script
4. Create a Databricks job using the Jobs API, with the mount script as the file to execute

The steps above are all contained in a bash script that is part of our Azure DevOps pipeline.

Setting up the CLI
Setting up the Databricks CLI without any manual steps is now possible since you can generate a temporary access token using the Token API. We use a Service Principal for authentication.

https://docs.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/tokens
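A minimal Python sketch of this step, assuming the service principal has already been added to the workspace and that its tenant ID, client ID and secret are available to the pipeline as secret variables. It requests an Azure AD token for the Azure Databricks resource (application ID 2ff814a6-357e-45ab-8b00-b52e5d3f6f1a), exchanges it for a short-lived token through POST /api/2.0/token/create, and renders a [DEFAULT] profile in the ~/.databrickscfg format that the legacy databricks-cli reads. The function names and the token comment are hypothetical.

```python
# Sketch: service principal -> Azure AD token -> Databricks token -> CLI profile.
# Function names and the token comment are hypothetical placeholders.
import configparser
import io
import json
import urllib.parse
import urllib.request

# Well-known Azure AD application ID of the Azure Databricks resource.
DATABRICKS_RESOURCE_ID = "2ff814a6-357e-45ab-8b00-b52e5d3f6f1a"


def get_aad_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Client-credentials grant against the Azure AD v2.0 token endpoint."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
    }).encode()
    # Passing data makes this a form-encoded POST.
    with urllib.request.urlopen(urllib.request.Request(url, data=form)) as resp:
        return json.load(resp)["access_token"]


def create_pat(host: str, aad_token: str, lifetime_seconds: int = 3600) -> str:
    """POST /api/2.0/token/create and return the new token's value."""
    body = json.dumps({"lifetime_seconds": lifetime_seconds,
                       "comment": "azure-devops-mount-pipeline"}).encode()
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/token/create",
        data=body,
        headers={"Authorization": f"Bearer {aad_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token_value"]


def render_databrickscfg(host: str, token: str) -> str:
    """Render a [DEFAULT] profile in the ~/.databrickscfg format."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["DEFAULT"] = {"host": host, "token": token}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

The pipeline would write the rendered text to ~/.databrickscfg before running any databricks CLI command; a short lifetime keeps the token useful only for the duration of the deployment.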

Create a mount script
We have a Scala script that follows the mount instructions; it could just as well be written in Python. See the following link for more information:

https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html#mount-azure-data-lake-storage-gen2-filesystem
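The answer's own script is Scala; the sketch below uses Python instead and follows the ADLS Gen2 OAuth mount pattern from the linked page. It takes dbutils as a parameter, so it can run in a notebook (where dbutils is predefined) and also be exercised outside Databricks with a stand-in object. It skips the mount when the mount point already exists, because dbutils.fs.mount fails on an existing mount and a pipeline may run more than once. The helper names, secret scope, key, container, account and mount point are placeholders. For plain Blob Storage, the Databricks docs use a wasbs://<container>@<account>.blob.core.windows.net source with an account key or SAS token config instead.

```python
# Sketch of a mount notebook for ADLS Gen2 using a service principal.
# Helper names and every <placeholder> value are hypothetical.


def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Spark configs for OAuth client-credentials access to ADLS Gen2."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_source(container: str, account: str) -> str:
    """Root URI of an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def mount_if_missing(dbutils, source: str, mount_point: str, configs: dict) -> bool:
    """Mount only if mount_point is not mounted yet; return True if it mounted now."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True


# In the notebook itself, for example:
# configs = build_oauth_configs(
#     client_id="<application-id>",
#     client_secret=dbutils.secrets.get(scope="<scope-name>", key="<key-name>"),
#     tenant_id="<directory-id>")
# mount_if_missing(dbutils, abfss_source("<container>", "<account>"),
#                  "/mnt/<mount-name>", configs)
```

Reading the client secret from a secret scope keeps it out of the uploaded script, so the same file can be deployed to every environment.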

Upload the mount script
In the Azure DevOps pipeline, the databricks-cli is configured with a temporary token created through the Token API. Once this step is done, we're free to use the CLI to upload our mount script to DBFS, or to import it as a notebook using the Workspace API.

https://docs.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/workspace#--import
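If the script is imported as a notebook, the Workspace API call could look like the sketch below. It assumes a Python mount script and a hypothetical target path under /Shared (the parent folder must already exist); the request body uses the documented path, format, language, content (base64) and overwrite fields of POST /api/2.0/workspace/import.

```python
# Sketch: import a local script as a workspace notebook.
# The target path and helper names are hypothetical.
import base64
import json
import urllib.request


def build_import_payload(notebook_path: str, source_code: str,
                         language: str = "PYTHON") -> dict:
    """Request body for POST /api/2.0/workspace/import in SOURCE format."""
    return {
        "path": notebook_path,
        "format": "SOURCE",
        "language": language,
        # The API expects the file content base64-encoded.
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": True,  # redeploys replace the previous version
    }


def import_notebook(host: str, token: str, payload: dict) -> None:
    """Send the import request; raises urllib.error.HTTPError on failure."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


# Usage:
# with open("mount_storage.py") as fh:
#     import_notebook(host, token,
#                     build_import_payload("/Shared/mount_storage", fh.read()))
```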

Configure the job that actually mounts your storage
We have a JSON file that defines the job that executes the "mount storage" script. The job can point to the script or notebook that you uploaded in the previous step. Defining a job in JSON is straightforward; the Jobs API documentation shows how:

https://docs.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/jobs#--
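A sketch of such a job definition, built in Python and posted to POST /api/2.0/jobs/create. It assumes the notebook path from the import step and uses a one-worker job cluster; the spark_version and node_type_id values are examples only, and the valid values for a given workspace can be listed through the Clusters API (spark-versions and list-node-types).

```python
# Sketch: define and create a job that runs the mount notebook on a job cluster.
# Job name, runtime version and node type are hypothetical examples.
import json
import urllib.request


def build_mount_job(notebook_path: str, spark_version: str, node_type_id: str,
                    job_name: str = "mount-storage") -> dict:
    """Jobs API 2.0 create body: a one-worker job cluster running the notebook."""
    return {
        "name": job_name,
        "new_cluster": {
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": 1,
        },
        "notebook_task": {"notebook_path": notebook_path},
        "max_retries": 0,  # let the pipeline decide whether to retry
    }


def create_job(host: str, token: str, spec: dict) -> int:
    """POST /api/2.0/jobs/create and return the new job_id."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/jobs/create",
        data=json.dumps(spec).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]


# Usage:
# spec = build_mount_job("/Shared/mount_storage", "7.3.x-scala2.12", "Standard_DS3_v2")
# job_id = create_job(host, token, spec)
```

Keeping the spec as plain JSON also lets it be stored as a file in the repository, as the answer describes.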

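At this point, triggering the job should create a temporary cluster that mounts the storage for you. Because DBFS mounts are workspace-wide, the mount remains available to other clusters after the temporary cluster terminates. You should not need to use the web interface or perform any manual steps.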
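Triggering the job and waiting for it from the pipeline could be sketched as follows, using POST /api/2.0/jobs/run-now and polling GET /api/2.0/jobs/runs/get until the run's life_cycle_state is terminal. The helper names and the 30-second polling interval are assumptions; wait_for_run takes the state getter and sleep function as parameters so the polling logic can be checked without a workspace.

```python
# Sketch: start the mount job and block until it finishes.
# Helper names and the polling interval are hypothetical.
import json
import time
import urllib.request

# Jobs API life-cycle states after which a run no longer changes.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def run_now(host: str, token: str, job_id: int) -> int:
    """POST /api/2.0/jobs/run-now and return the new run_id."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/jobs/run-now",
        data=json.dumps({"job_id": job_id}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]


def get_run_state(host: str, token: str, run_id: int) -> dict:
    """GET /api/2.0/jobs/runs/get and return the run's state object."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]


def wait_for_run(get_state, run_id: int, poll_seconds: float = 30,
                 sleep=time.sleep) -> str:
    """Poll until the run reaches a terminal state and return its outcome."""
    while True:
        state = get_state(run_id)
        lifecycle = state["life_cycle_state"]
        if lifecycle in TERMINAL_STATES:
            # result_state (e.g. SUCCESS, FAILED) may be absent for some outcomes.
            return state.get("result_state", lifecycle)
        sleep(poll_seconds)


# Usage:
# run_id = run_now(host, token, job_id)
# outcome = wait_for_run(lambda r: get_run_state(host, token, r), run_id)
# if outcome != "SUCCESS":
#     raise SystemExit(f"mount job ended with {outcome}")
```

Failing the pipeline step on anything other than SUCCESS makes a broken mount visible at deployment time rather than when a notebook first reads from /mnt.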

You can apply this approach to different environments and resource groups, as we do. For this we use Jinja templating to fill in variables that are environment- or project-specific.
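The answer uses Jinja for this; to keep the sketch dependency-free, it swaps in Python's standard string.Template ($name placeholders rather than Jinja's {{ name }}). It renders one job definition per environment from a single template. The environment names, node types and notebook path are hypothetical examples.

```python
# Sketch: render one job definition per environment from a single template.
# Uses string.Template in place of Jinja; all values are hypothetical.
import json
from string import Template

JOB_TEMPLATE = Template("""{
  "name": "mount-storage-$environment",
  "new_cluster": {
    "spark_version": "$spark_version",
    "node_type_id": "$node_type_id",
    "num_workers": 1
  },
  "notebook_task": {"notebook_path": "$notebook_path"}
}""")

ENVIRONMENTS = {
    "dev": {"spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "notebook_path": "/Shared/mount_storage"},
    "prod": {"spark_version": "7.3.x-scala2.12",
             "node_type_id": "Standard_DS4_v2",
             "notebook_path": "/Shared/mount_storage"},
}


def render_job(environment: str) -> dict:
    """Fill the template for one environment; a missing value raises KeyError."""
    values = dict(ENVIRONMENTS[environment], environment=environment)
    return json.loads(JOB_TEMPLATE.substitute(values))


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(json.dumps(render_job(env)))
```

Using substitute rather than safe_substitute means a variable missing for one environment stops the pipeline instead of deploying a job with an unfilled placeholder.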

I hope this helps you out. Let me know if you have any questions!
