
Is it possible to execute a Python wheel class/method (not a script) in Azure Data Factory using an Azure Databricks activity, the way you would execute a packaged Java method in a .jar? Unlike a script, this would be able to return a value (or values) without resorting to something like burying them in stdout.

I haven't been able to find anything on this. I tried using the Jar activity with no luck, which didn't surprise me, but it was worth a try.

If not, what I am looking for is a way to use Azure Databricks compute and return a small set of values from the Python job. I have successfully used the ADF activity for a Databricks Python script.

TIA!


1 Answer


Yes. Add the wheel as a library on the cluster, then create a .py file that imports the library and calls the method you need. Save the .py file to DBFS.
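A minimal sketch of such a wrapper script, assuming a hypothetical wheel named `mywheel` that exposes a `run` function (a stand-in is defined inline so the sketch is self-contained; on the cluster you would import it from the installed wheel instead):

```python
# entry.py -- a thin wrapper saved to DBFS that the ADF Python activity points at.
# `mywheel` and `run` are hypothetical names; the stand-in below takes the
# place of `from mywheel.pipeline import run` on a real cluster.
import json
import sys

def run(start_date, end_date):
    # stand-in for the function your wheel actually provides
    return {"rows_processed": 42, "start": start_date, "end": end_date}

def main(argv):
    # arguments passed from the ADF activity arrive in sys.argv
    start_date, end_date = argv[1], argv[2]
    result = run(start_date, end_date)
    # emit the result as a single JSON line so a caller can capture it
    print(json.dumps(result))
    return result

if __name__ == "__main__":
    main(sys.argv)
```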

Create a Data Factory pipeline that uses the Python activity and point it at your .py file. You can pass in arguments as well.
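As a sketch, the Databricks Python activity in the pipeline JSON might look like this (the linked service name, DBFS paths, parameters, and wheel filename are placeholders):

```json
{
  "name": "RunWheelEntryPoint",
  "type": "DatabricksSparkPython",
  "linkedServiceName": {
    "referenceName": "AzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "pythonFile": "dbfs:/scripts/entry.py",
    "parameters": ["2019-01-01", "2019-01-31"],
    "libraries": [
      { "whl": "dbfs:/libs/mywheel-0.1.0-py3-none-any.whl" }
    ]
  }
}
```

Listing the wheel under `libraries` installs it on the cluster for the run, so the entry script can import it.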

You could also do this with a notebook that imports the library.
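The notebook route also gives you a direct way to hand a small result back to ADF: `dbutils.notebook.exit` returns a string that the Notebook activity exposes as `runOutput`. A sketch, again with the hypothetical `mywheel` stand-in (the `dbutils` call is commented out because it exists only inside Databricks):

```python
# Databricks notebook cell sketch. On a cluster you would write:
#   from mywheel.pipeline import run  (hypothetical package name)
import json

result = {"rows_processed": 42}  # stand-in for run(...) from the wheel
payload = json.dumps(result)

# dbutils.notebook.exit returns a string to the caller; ADF surfaces it as
# activity('<activity name>').output.runOutput. Only available on Databricks:
# dbutils.notebook.exit(payload)
```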

This blog post (and the series it is part of) should help: https://datathirst.net/blog/2019/9/20/building-pyspark-applications-as-a-wheel