I would like to run an Azure Databricks notebook through the REST Jobs API as follows:
1. Pass a set of key:value arguments to the notebook's PySpark context.
2. Perform some Python calculations based on those parameters.
For point 1, I use the following request (as suggested by the Jobs API documentation):
curl -n -X POST -H 'Content-Type: application/json' \
  -d '{"name": "endpoint job", "existing_cluster_id": "xxx",
       "notebook_task": {"notebook_path": "path"},
       "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end"}}' \
  https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
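For comparison, the same request body can be assembled in Python. This is a sketch under the assumption that the Jobs API 2.0 `runs/submit` schema applies, in which `base_parameters` is a field of the `notebook_task` object rather than a top-level key and the run name field is `run_name`. The helper `build_submit_payload` is hypothetical, and `xxx` and `path` are the placeholders from the request above.

```python
import json


def build_submit_payload(cluster_id: str, notebook_path: str, params: dict,
                         run_name: str = "endpoint job") -> dict:
    """Build a runs/submit body for an existing cluster (hypothetical helper).

    Assumes the Jobs API 2.0 schema: notebook parameters go under
    notebook_task.base_parameters, and every value is sent as a string.
    """
    return {
        "run_name": run_name,
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            # Nested inside notebook_task, not at the top level of the body.
            "base_parameters": {k: str(v) for k, v in params.items()},
        },
    }


payload = build_submit_payload(
    "xxx",
    "path",
    {
        "input_multiple_polygons": "input_multiple_polygons",
        "input_date_start": "input_date_start",
        "input_date_end": "input_date_end",
    },
)
# The printed JSON can be passed to curl via -d.
print(json.dumps(payload, indent=2))
```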
To read the parameters for point 2, I tried the following approaches, without success:
2.1. Approach 1:
input = spark.conf.get("base_parameters", "default")
2.2. Approach 2:
input = spark.sparkContext.getConf().getAll()
2.3. Approach 3:
a = dbutils.widgets.getArgument("input_multiple_polygons", "default")
b = dbutils.widgets.getArgument("input_date_start", "default")
c = dbutils.widgets.getArgument("input_date_end", "default")
input = [a,b,c]
2.4. Approach 4 (as per the official documentation):
a = dbutils.widgets.get("input_multiple_polygons")
b = dbutils.widgets.get("input_date_start")
c = dbutils.widgets.get("input_date_end")
input = [a,b,c]
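Whichever way the values reach the notebook, widget values arrive as strings, so point 2 needs some parsing before any calculation. A minimal sketch, assuming the dates are sent in ISO `YYYY-MM-DD` form (the question does not state the format) and leaving the polygon input as a raw string: it declares each widget with a default via `dbutils.widgets.text`, so the notebook also runs interactively, and reads it back with `dbutils.widgets.get`. The helper `read_params`, the `DEFAULTS` values, and the `FakeWidgets` stand-in for local testing are all hypothetical.

```python
from datetime import date

# Hypothetical defaults for interactive runs; dates assumed to be ISO formatted.
DEFAULTS = {
    "input_multiple_polygons": "",
    "input_date_start": "2020-01-01",
    "input_date_end": "2020-12-31",
}


def read_params(widgets, defaults=DEFAULTS):
    """Declare one text widget per parameter, then read and parse the values.

    `widgets` is expected to behave like `dbutils.widgets`: text(name, default,
    label) creates a widget whose default a job parameter overrides, and
    get(name) returns the current value as a string.
    """
    for name, default in defaults.items():
        widgets.text(name, default, name)
    raw = {name: widgets.get(name) for name in defaults}
    start = date.fromisoformat(raw["input_date_start"])
    end = date.fromisoformat(raw["input_date_end"])
    if end < start:
        raise ValueError(f"input_date_end {end} is before input_date_start {start}")
    return raw["input_multiple_polygons"], start, end


class FakeWidgets:
    """Hypothetical stand-in for dbutils.widgets, for trying the logic locally."""

    def __init__(self, job_params=None):
        self._job_params = job_params or {}
        self._values = {}

    def text(self, name, default, label=None):
        # Mimic a job parameter taking precedence over the widget default.
        self._values[name] = self._job_params.get(name, default)

    def get(self, name):
        return self._values[name]


polygons, start, end = read_params(FakeWidgets({"input_date_start": "2020-03-01"}))
print(repr(polygons), start, end)
```

Inside the notebook itself, the call would be `polygons, start, end = read_params(dbutils.widgets)`; declaring the widgets first means the same code works both from a job run and when the notebook is opened by hand.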
The REST Jobs endpoints work and the run completes successfully; however, none of the four approaches above delivers the arguments to the notebook's PySpark context.
I am sure I am doing something wrong in either the curl request or the argument retrieval, but I can't identify the problem. Can anyone suggest where the issue might be?