3
votes

I am reading files on an Azure Blob Storage account (gen 2) from an Azure Databricks Notebook. Both services are in the same region (West Europe). Everything works fine, except when I add a firewall in front of the storage account. I have opted to allow "trusted Microsoft services":

[Screenshot: Azure Portal Storage Account - Firewall]

However, running the notebook now ends up with an access denied error:

com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation.

I tried accessing the storage directly from Spark and also mounting it with dbutils, but the result is the same.

I would have assumed that Azure Databricks counts as a trusted Microsoft service. Furthermore, I couldn't find solid information on IP ranges for Databricks regions that could be added to the firewall rules.

2 Answers

4
votes

No, Azure Databricks does not count as a trusted Microsoft service. You can check the list of trusted Microsoft services supported by the storage account firewall.

From a networking perspective, here are two suggestions:

  1. Find the Azure datacenter IP address ranges and narrow them down to the region where your Azure Databricks workspace is located. Whitelist those IP ranges in the storage account firewall.

  2. Deploy Azure Databricks in your own Azure Virtual Network (Preview), then whitelist the VNet address range in the firewall of the storage account. You can refer to "Configure Azure Storage firewalls and virtual networks". You can also use an NSG to restrict inbound and outbound traffic for this VNet. Note: you need to deploy Azure Databricks to your own VNet.
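For suggestion 1, here is a minimal sketch of narrowing a downloaded IP range file down to one region. It assumes the file follows the layout of Microsoft's published service tags JSON (a top-level `values` list whose entries carry a `name` and `properties.addressPrefixes`); the helper name and the sample data (documentation-only address ranges) are hypothetical, so check the layout against the file you actually download.

```python
import ipaddress


def ipv4_prefixes_for_tag(service_tags, tag_name):
    """Return the collapsed IPv4 CIDR ranges listed under one service tag.

    `service_tags` is the parsed JSON document (a dict). The layout
    used here is an assumption about the downloaded file.
    """
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            prefixes = entry.get("properties", {}).get("addressPrefixes", [])
            networks = []
            for prefix in prefixes:
                net = ipaddress.ip_network(prefix, strict=False)
                if net.version == 4:  # keep IPv4 ranges only
                    networks.append(net)
            # Merge adjacent ranges so fewer firewall rules are needed.
            return [str(n) for n in ipaddress.collapse_addresses(networks)]
    return []


# Hypothetical sample data using documentation-only address ranges.
sample = {
    "values": [
        {
            "name": "AzureCloud.westeurope",
            "properties": {
                "addressPrefixes": [
                    "203.0.113.0/25",
                    "203.0.113.128/25",
                    "2001:db8::/32",
                ]
            },
        },
        {
            "name": "AzureCloud.northeurope",
            "properties": {"addressPrefixes": ["198.51.100.0/24"]},
        },
    ]
}

print(ipv4_prefixes_for_tag(sample, "AzureCloud.westeurope"))
```

One caveat, as far as I recall from the Azure Storage documentation: IP network rules do not apply to requests that originate from the same region as the storage account, so for a same-region setup like the one in the question, the VNet approach in suggestion 2 is likely the one that actually works.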

Hope this helps.

1
votes

The described scenario only works if you deploy Azure Databricks in your own Azure Virtual Network (VNet). With that setup you can use service endpoints, so you can add your Databricks VNet to the Blob Storage firewall. With the default deployment this is not supported and not possible. See the documentation for more details and a description of how to get the VNet injection feature enabled.

Enabling the mentioned exception does not work, as Azure Databricks is not in the list of trusted services for Blob Storage. See the documentation for the services that can still access the storage account with the exception enabled.
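To make the VNet approach more concrete, here is a rough sketch of the firewall settings it amounts to: default action "Deny", plus one "Allow" rule per Databricks subnet. The property names follow the `networkAcls` section of the storage account resource as I understand it; the function names and all IDs are placeholders, and the `Microsoft.Storage` service endpoint still has to be enabled on each subnet separately.

```python
def subnet_id(subscription_id, resource_group, vnet, subnet):
    """Build the Azure resource ID of a subnet."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}"
        f"/subnets/{subnet}"
    )


def network_acls(subnet_ids):
    """Deny by default and allow only the listed subnets."""
    return {
        "defaultAction": "Deny",
        "virtualNetworkRules": [
            {"id": sid, "action": "Allow"} for sid in subnet_ids
        ],
    }


# Placeholder names; substitute the subnets of your own Databricks VNet.
ids = [
    subnet_id("<subscription-id>", "my-rg", "databricks-vnet", name)
    for name in ("public-subnet", "private-subnet")
]
acls = network_acls(ids)
print(acls["defaultAction"], len(acls["virtualNetworkRules"]))
```

The resulting dictionary is only an illustration of the rule structure; applying it would go through the portal, the CLI, or a deployment template.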