I am trying to implement DevOps on ADF, and it was successful for pipelines whose activities fetch data from ADLS locations and SQL.

But now I have a pipeline in which one of the activities runs a jar file from a DBFS location, as shown below.

[Screenshot: pipeline activity that runs a jar file from a DBFS location]

This pipeline runs a jar file that sits in the DBFS location and then proceeds.

The connection parameters for the cluster are as shown below.

[Screenshot: cluster connection parameters]

While deploying the ARM template from the dev ADF to the UAT instance, which uses the UAT instance of Databricks, I was not able to override any of the cluster connection details from the arm_template_parameter.json file.

  1. How do I configure the workspace URL and cluster ID for the UAT/PROD environments at the time of ARM deployment? There is no entry for any of the cluster details in the arm_template_parameter.json file.

  2. As shown in the first picture, there is an activity which picks up the jar file from the DEV instance's DBFS location, using a system-generated jar file name. Will it fail when the ARM template for this pipeline is deployed to other environments? If so, how do I deploy the same jar file with the same name to the DEV/PROD Databricks DBFS locations?

Any leads appreciated!

2 Answers

1 vote

What you have to do here is customize the parameterization template to fit your needs. This template controls which ARM template parameters are generated when you publish the factory, and it can be edited in the Parameterization template tab in the management hub.

By default, the workspace name and URL should already be generated in the ARM template. To include your existing cluster ID as well, add existingClusterId (the JSON field name in the linked service) to the template under Microsoft.DataFactory/factories/linkedServices, as in the sketch below.
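As a rough illustration only (not a drop-in file: a custom template replaces the default one entirely, so merge this section into the full default template you can copy from that tab), the linkedServices section could look like the following. The "*" applies the rule to every linked service, and "=" tells ADF to generate a parameter and keep the current value as its default:

```json
{
    "Microsoft.DataFactory/factories/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "existingClusterId": "="
                }
            }
        }
    }
}
```

After the next publish, the generated ARM template should then expose a parameter along the lines of <linkedServiceName>_properties_typeProperties_existingClusterId, which you can override per environment.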

While I don't like sharing documentation on this forum, we actually have this exact use case demoed at https://docs.microsoft.com/azure/data-factory/continuous-integration-deployment#example-parameterizing-an-existing-azure-databricks-interactive-cluster-id
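To give a concrete (and purely illustrative) idea of the override step, assuming a linked service named AzureDatabricks_LS and a made-up cluster ID, the parameters file passed to the UAT deployment (or the "Override template parameters" setting of an Azure DevOps release) might contain something like:

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "adf-uat"
        },
        "AzureDatabricks_LS_properties_typeProperties_existingClusterId": {
            "value": "0412-123456-abc123"
        }
    }
}
```

The exact parameter name depends on your linked service name, so check the generated ARMTemplateParametersForFactory.json after publishing.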

0 votes

In my experience this is not implemented very well or intuitively at the moment. The best way I have personally found to achieve this is to parameterise your linked service and then either use references to a Key Vault that holds the correct value for the given environment, or manipulate the parameters.json file (which will now hold those parameters) in a DevOps pipeline using the File Transform task; a rough sketch of the Key Vault approach follows.
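For illustration only (the linked service name, Key Vault linked service name, secret name, workspace URL, and cluster ID below are all made up), a Databricks linked service that pulls its access token from a per-environment Key Vault might look roughly like this; the non-secret fields such as domain and existingClusterId are the ones you would still swap per environment via the parameterization template or the File Transform task:

```json
{
    "name": "AzureDatabricks_LS",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1111111111111111.1.azuredatabricks.net",
            "existingClusterId": "0412-123456-abc123",
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "KeyVault_LS",
                    "type": "LinkedServiceReference"
                },
                "secretName": "databricks-access-token"
            }
        }
    }
}
```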

Neither is very elegant, and ideally you would be able to reference Key Vault secrets using some syntax in the parameter expressions, but alas we are not there yet.