0 votes

I have a Databricks instance which does some work. Jobs are triggered from Azure Data Factory. There are several environments, and each one has its own Key Vault to store secrets.

As long as I kept the access token - let's say "hardcoded" - within the Databricks linked service configuration, everything worked fine. But I need to comply with security standards, so keeping it in a JSON file that lies around somewhere isn't an option - it was only fine for the time being.

Key Vault to the rescue - the Databricks access token is created via the API and stored in a Key Vault. Now I wanted to reference the Key Vault linked service from the Databricks linked service to populate the access token, and here comes the surprise - it doesn't work.
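
For context, this is roughly how the token is produced and pushed into the vault - a sketch with placeholder names, assuming the standard Token API (/api/2.0/token/create) and the az CLI:

    # Placeholders: workspace URL, AAD token for the principal, vault name.
    $workspaceUrl = "https://**************.azuredatabricks.net"
    $aadToken = "<AAD access token>"

    # Create a Databricks personal access token via the Token API.
    $body = @{ "comment" = "ADF linked service"; "lifetime_seconds" = 7776000 } | ConvertTo-Json
    $response = Invoke-RestMethod -Method Post -Uri "$workspaceUrl/api/2.0/token/create" `
        -Headers @{ "Authorization" = "Bearer $aadToken" } -Body $body -ContentType "application/json"

    # Store it in Key Vault under the name the linked service will reference.
    az keyvault secret set --vault-name "yourKeyVault" --name "DatabricksAccessToken" --value $response.token_value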

I can't debug the pipeline, I can't trigger it, I can't even test the connection - it always fails with "403 Invalid access token".


The JSON for this linked service:

{
    "name": "ls_databricks",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://**************.azuredatabricks.net",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyVault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "DatabricksAccessToken"
            },
            "existingClusterId": "*********"
        }
    }
}

Meanwhile, using Postman I can easily access the Databricks API with the same access token.
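
The same check is easy to reproduce in a script as well; a minimal sketch with a placeholder URL and token:

    # Equivalent of the Postman call: list clusters using the raw token.
    $token = "dapi******************************"
    Invoke-RestMethod -Uri "https://**************.azuredatabricks.net/api/2.0/clusters/list" `
        -Headers @{ "Authorization" = "Bearer $token" }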


The Key Vault linked service itself works fine and the connection test passes.


I have configured a different linked service to connect to ADLS using the Key Vault, and it works as expected.

Does anybody have any idea what's wrong here? Is it just broken, or am I doing something wrong?

p.s. Apologies for flooding you with all of these screenshots :)

I'm using the SCIM API (https://docs.databricks.com/dev-tools/api/latest/scim/scim-sp.html) to entitle my service principal to the proper group on the Databricks instance.
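
For reference, one way to do that assignment is a SCIM PATCH on the Groups endpoint along these lines (group ID, service principal ID and the admin token are placeholders):

    # Add the service principal as a member of an existing Databricks group via the SCIM API.
    $adminToken = "<admin token>"
    $patch = @{
        "schemas"    = @("urn:ietf:params:scim:api:messages:2.0:PatchOp")
        "Operations" = @(@{ "op" = "add"; "value" = @{ "members" = @(@{ "value" = "<servicePrincipalId>" }) } })
    } | ConvertTo-Json -Depth 10
    Invoke-RestMethod -Method Patch `
        -Uri "https://**************.azuredatabricks.net/api/2.0/preview/scim/v2/Groups/<groupId>" `
        -Headers @{ "Authorization" = "Bearer $adminToken" } `
        -Body $patch -ContentType "application/scim+json"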


2 Answers

0 votes

Have you set up the appropriate access policy in the Key Vault? The preferred approach is to enable the managed identity for Data Factory and then add that identity to the Key Vault access policy.

Key Vault has a separate access layer for secrets which needs to be configured for Data Factory, since it is Data Factory that reads the secret and therefore needs to be granted access to it.
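
For example, a minimal sketch of granting that identity access to secrets - factory, resource group and vault names are placeholders, and the az datafactory command assumes the datafactory extension is installed:

    # Look up the Data Factory managed identity and grant it read access to secrets.
    $principalId = az datafactory show --name "yourAdf" --resource-group "yourGroup" --query "identity.principalId" -o tsv
    az keyvault set-policy --name "yourKeyVault" --object-id $principalId --secret-permissions get list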

0 votes

Finally I was able to solve it using the Azure CLI Data Factory extension.

Working solution (I wonder for how long, since it's an experimental extension):

  az extension add --name datafactory

  # Build the linked service definition as a PowerShell hashtable.
  $lsDatabricks = @{
      "type" = "AzureDatabricks"
      "typeProperties" = @{
          "domain" = "https://***********.azuredatabricks.net"
          "existingClusterId" = "************-vale**"
          "accessToken" = @{
              "type" = "SecureString"
              "value" = "dapi******************************"
          }
      }
  }

  # -Depth keeps ConvertTo-Json from truncating the nested objects (its default depth is 2),
  # and the quotes have to be escaped before the JSON is handed to az.
  $lsJson = $lsDatabricks | ConvertTo-Json -Depth 10 -Compress
  $lsJson = $lsJson -Replace '"', '\"'

  az datafactory linked-service create --factory-name "yourAdf" --name "yourDatabricksLinkedService" --resource-group "yourGroup" --properties "$lsJson"
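
To double-check the result, the created linked service can be read back with the same extension (names are placeholders again):

  az datafactory linked-service show --factory-name "yourAdf" --name "yourDatabricksLinkedService" --resource-group "yourGroup"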

I traveled a very bumpy road just to configure a silly linked service. By the way, I was also trying to do it via a parameterized template, like:

"AzureDatabricks": {
    "properties": {
        "typeProperties": {
            "domain": "=",
            "existingClusterId": "=",
            "accessToken": "=:accessToken:secureString"
        }
    }
}

Unfortunately, whichever value I overrode when deploying the ARM template, the final value remained unchanged.
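
For illustration, the override I was attempting looked roughly like this; treat the parameter name ls_databricks_accessToken as an assumption, since ADF derives the generated ARM parameter names from the linked service name:

    az deployment group create --resource-group "yourGroup" `
        --template-file "ARMTemplateForFactory.json" `
        --parameters factoryName="yourAdf" ls_databricks_accessToken="dapi******************************"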