
I have created an Azure Databricks Cluster with Runtime version of "7.5 (includes Apache Spark 3.0.1, Scala 2.12)" on which I have created a Notebook (Python code).

I'm trying to execute this Notebook from a pipeline built on Azure Data Factory, but I get the following error:

Operation on target Notebook1 failed: Databricks execution failed with error state Terminated. For more details please check the run page url: https://PATH

As per the given path, the real error is:

ModuleNotFoundError: No module named 'pyodbc'

The problem is that I have already installed all the libraries on the cluster, as shown below:

libraries Installed

I can also import them successfully in the notebook (as shown below). As a matter of fact, the whole script executes successfully when launched directly from the notebook:

libraries imported

The problem is that I cannot execute the notebook from Azure Data Factory. The first error I get is that there is no module named pyodbc!

Should I add a pip install pyodbc to my notebook (is that reliable)? Or did I miss something?

Thanks,

Are you running this notebook on an existing cluster? - Alex Ott
Yes, a cluster that I have created with Runtime version of "7.5 (includes Apache Spark 3.0.1, Scala 2.12)" - DSEB
Hi @DSEB, have you made any progress? If the answer was helpful, please consider accepting it; this can benefit other community members. Thank you. - Leon Yue

1 Answer


I created a cluster with the same environment, and the code works well there.

I ran the pyodbc code in the notebook without errors.

Then I ran the notebook from Data Factory, and it also works well.

If you add a pip install pyodbc to your notebook, it should work, but that is probably not the recommended approach. Please try restarting the cluster or reinstalling the pyodbc library.
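To narrow down why the import fails only when the run is triggered from Data Factory, it may help to put a small diagnostic cell at the top of the notebook and compare its output between an interactive run and a pipeline run. The sketch below is only an illustration: the helper names `module_available` and `environment_report` are made up for this example, and it assumes that a difference in the reported interpreter path or module availability would indicate that the two runs are not using the same environment.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None


def environment_report(modules):
    """Collect the interpreter path, Python version, and per-module availability."""
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "modules": {m: module_available(m) for m in modules},
    }


# Hypothetical usage: run this cell once interactively and once via the pipeline,
# then compare the two printed reports.
report = environment_report(["pyodbc"])
print(report)
```

If the pipeline run reports that pyodbc is unavailable while the interactive run reports it as available, the two runs are most likely not using the same cluster or library set, and restarting the cluster or reinstalling the library as suggested above is the first thing to try.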

HTH.