To do this effectively, I would recommend using the Databricks Terraform provider - that way the job definition can be stored in Git (or a similar VCS), and it's easy to integrate with CI/CD systems such as Azure DevOps, GitHub Actions, etc.
The differences between environments can be encoded as variables, with a separate variable file per environment, so you can reuse the main code across environments, like this:
provider "databricks" {
host = var.db_host
token = var.db_token
}
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
local_disk = true
}
resource "databricks_job" "this" {
name = "Job"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = "path_to_notebook"
}
email_notifications {}
}
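For the per-environment part, one possible layout (just a sketch - the .tfvars file names and workspace URLs below are illustrative, only the db_host/db_token variables come from the code above) is a variables.tf plus one .tfvars file per environment:

variable "db_host" {
  description = "Databricks workspace URL"
  type        = string
}

variable "db_token" {
  description = "Databricks personal access token"
  type        = string
  sensitive   = true
}

# dev.tfvars (illustrative values)
# db_host  = "https://adb-1111111111111111.1.azuredatabricks.net"
# db_token = "..."

# prod.tfvars (illustrative values)
# db_host  = "https://adb-2222222222222222.2.azuredatabricks.net"
# db_token = "..."

Then you select the environment at apply time with something like "terraform apply -var-file=dev.tfvars" (and usually a separate state/workspace per environment).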
P.S. In theory, you could implement a periodic task that pulls the job definitions from your original environment, checks whether they have changed, and applies the changes to the other environment. You could even track changes to the job definitions via diagnostic logs and use that as a trigger.
But all of this is just a hack - it's better to use Terraform.