I/O operations with Azure Databricks REST Jobs API

Question

I would like to execute the contents of an Azure Databricks notebook using the REST Jobs API in the following manner:

1. pass a set of key:value arguments to a notebook's PySpark context
2. perform some Python calculations informed by the parameters

For point 1 I use the following (as suggested by the documentation here):

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path"}, "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit

To address point 2 I tried the following approaches without success:

2.1. approach 1: input = spark.conf.get("base_parameters", "default")

2.2. approach 2: input = spark.sparkContext.getConf().getAll()

2.3. approach 3:

a = dbutils.widgets.getArgument("input_multiple_polygons", "default")

b = dbutils.widgets.getArgument("input_date_start", "default")

c = dbutils.widgets.getArgument("input_date_end", "default")

input = [a,b,c]

2.4. approach 4 (as per the official documentation here):

a = dbutils.widgets.get("input_multiple_polygons")

b = dbutils.widgets.get("input_date_start")

c = dbutils.widgets.get("input_date_end")

input = [a,b,c]

The REST Jobs endpoints are working fine and the execution is successful. However, none of the four approaches outlined above seems to deliver the arguments to the PySpark context.

I am sure I am doing something wrong in either the curl part or the argument retrieval part, but I can't identify the problem. Can anyone suggest where the issue may be?


InfamousCoconut · Accepted Answer · 2020-07-06T18:29:22

It looks like you are not enclosing base_parameters as an element within notebook_task. Can you try something like the command below? I assume you are passing the right values for base_parameters, since in the example you shared each parameter's value is the same as its name.

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path", "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
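In case it helps to see the nesting more clearly than in a one-line curl body, here is a minimal Python sketch that builds the same request body and checks where base_parameters ends up. The helper name build_submit_payload is hypothetical (it is not part of any Databricks library), the field names are copied from the curl command above, and nothing is sent over the network.

```python
import json


def build_submit_payload(name, cluster_id, notebook_path, params):
    """Hypothetical helper: build a runs/submit body like the curl command above.

    The key point is that base_parameters sits inside notebook_task,
    not next to it at the top level of the request.
    """
    return {
        "name": name,
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            # Values are converted to strings, matching the string values
            # used in the curl example.
            "base_parameters": {str(k): str(v) for k, v in params.items()},
        },
    }


# Placeholder values taken from the question ("xxx", "path").
payload = build_submit_payload(
    "endpoint job",
    "xxx",
    "path",
    {
        "input_multiple_polygons": "input_multiple_polygons",
        "input_date_start": "input_date_start",
        "input_date_end": "input_date_end",
    },
)

# This string is what would go after -d in the curl command.
body = json.dumps(payload)
print(body)
```

If the body is built this way, the widget calls from approach 4 in the question (dbutils.widgets.get with the same names) would be the ones I would expect to pick the values up, though I have only checked the shape of the JSON here, not an actual run.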

An easy way to see what the JSON should look like is to define a job in the UI and then call api/2.0/jobs/get?job_id=<jobId> to inspect the JSON response.
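As a rough illustration of that check, the sketch below pulls base_parameters out of a jobs/get style response. The sample_response is hand-written for illustration and is an assumption about the response shape (a settings object containing notebook_task), not output captured from a real workspace. The function name notebook_params is hypothetical as well.

```python
import json

# Hand-written sample, assumed to resemble a jobs/get response.
# The job_id and values are placeholders, not real output.
sample_response = json.loads("""
{
  "job_id": 1,
  "settings": {
    "name": "endpoint job",
    "existing_cluster_id": "xxx",
    "notebook_task": {
      "notebook_path": "path",
      "base_parameters": {"input_date_start": "input_date_start"}
    }
  }
}
""")


def notebook_params(job):
    """Return the base_parameters found under notebook_task, or {} if absent.

    Accepts either a jobs/get style document (with a "settings" wrapper)
    or a bare request body like the one passed to runs/submit.
    """
    settings = job.get("settings", job)
    return settings.get("notebook_task", {}).get("base_parameters", {})


print(notebook_params(sample_response))

# A body shaped like the one in the question, with base_parameters at the
# top level, yields no notebook parameters under this check.
misplaced = {
    "notebook_task": {"notebook_path": "path"},
    "base_parameters": {"input_date_start": "input_date_start"},
}
print(notebook_params(misplaced))
```

Running the same function on your own request body is a quick way to confirm whether the parameters are nested where the notebook task expects them.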
