0 votes

I am trying to run a spark-submit from Azure Databricks. Currently I can create a job, with the jar uploaded to the Databricks workspace, and run it.

My queries are:

  1. Is there a way to access a jar residing on a Gen2 Data Lake storage account and do a spark-submit from the Databricks workspace, or even from Azure ADF? (The communication between the workspace and Gen2 storage is protected by "fs.azure.account.key".)

  2. Is there a way to do a spark-submit from a Databricks notebook?

2
I have already done that (as I wrote in the first line). I am looking for a way to do points 1 and 2 specifically. Could you suggest or comment on that? – partha_devArch

2 Answers

1 vote

Is there a way to access a jar residing on a Gen2 Data Lake storage account and do a spark-submit from the Databricks workspace, or even from Azure ADF? (The communication between the workspace and Gen2 storage is protected by "fs.azure.account.key".)

Unfortunately, you cannot directly reference a jar residing on an Azure Storage account such as ADLS Gen2/Gen1 in a spark-submit.

Note: The --jars, --py-files, --files arguments support DBFS and S3 paths.

Typically, jar libraries are stored under dbfs:/FileStore/jars.

You need to upload the libraries to DBFS and pass them as parameters in the Jar activity.

For more details, see "Transform data by running a Jar activity in Azure Databricks using ADF".
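As a minimal sketch of the flow above: the jar is first copied to DBFS (for example with the Databricks CLI), and the ADF Databricks Jar activity then references it in its libraries list. All names here (jar path, main class, parameters) are illustrative, not from the original answer:

```python
import json

# Hypothetical jar path in DBFS. A copy command with the Databricks CLI
# would look like: databricks fs cp ./app.jar dbfs:/FileStore/jars/app.jar
jar_path = "dbfs:/FileStore/jars/app.jar"

# Sketch of the relevant typeProperties of an ADF Databricks Jar activity:
# the jar is passed via "libraries", and runtime arguments via "parameters".
jar_activity = {
    "mainClassName": "com.example.Main",   # illustrative class name
    "parameters": ["--input", "dbfs:/data/in"],
    "libraries": [{"jar": jar_path}],
}

print(json.dumps(jar_activity, indent=2))
```

Because the jar lives in DBFS, the cluster ADF spins up (or attaches to) can load it without any ADLS account key.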

Is there a way to do a spark-submit from a Databricks notebook?

To answer the second question, refer to the job types below:

(Screenshot: the available Databricks job types.)

Reference: SparkSubmit and "Create a job"

Hope this helps.


If this answers your query, click "Mark as Answer" and "Up-Vote". And if you have any further questions, do let us know.

-1 votes

Finally I figured out how to run this:

  1. You can run a Databricks jar from ADF and attach it to an existing cluster, which will have the ADLS key configured in its cluster configuration.

  2. It is not possible to do a spark-submit from a notebook. But you can create a Spark job under Jobs, or use the Databricks Runs Submit API to do a spark-submit.
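The second point can be sketched as a payload for the Jobs Runs Submit endpoint (`POST /api/2.0/jobs/runs/submit`) with a `spark_submit_task`. The workspace URL, cluster sizing, jar path, and arguments are all illustrative assumptions, not values from the answer:

```python
import json

# Hedged sketch of a Runs Submit payload that performs a spark-submit.
# Cluster spec, class name, and jar path are illustrative.
payload = {
    "run_name": "spark-submit-from-api",
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    },
    "spark_submit_task": {
        "parameters": [
            "--class", "com.example.Main",
            "dbfs:/FileStore/jars/app.jar",
            "arg1",
        ]
    },
}

# You would send this with a personal access token, e.g.:
# curl -X POST https://<workspace>.azuredatabricks.net/api/2.0/jobs/runs/submit \
#      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#      -d @payload.json
print(json.dumps(payload, indent=2))
```

Note that a `spark_submit_task` requires a new cluster; the parameters list mirrors the arguments you would pass to `spark-submit` on the command line.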