2
votes

I am working on a project in Azure Data Factory, and I have a pipeline that runs a Databricks Python script. This script, which is stored in the Databricks file system (DBFS) and run by the ADF pipeline, imports a module from another Python script located in the same folder (both scripts are in dbfs:/FileStore/code).

The code below imports the Python module successfully in a Databricks notebook, but it doesn't work when used in a Python script.

import sys
sys.path.insert(0, 'dbfs:/FileStore/code/')
import conn_config as Connect

In the cluster logs, I get: ImportError: No module named conn_config

I guess the problem is that the Python script doesn't recognize the Databricks environment. Any help?


3 Answers

1
votes

You can't use a path with dbfs: in it, because Python doesn't know anything about this file system. You have two choices:

  1. Replace dbfs:/ with /dbfs/, the local mount point of DBFS on the cluster nodes (won't work on Community Edition)
  2. Copy the file(s) from DBFS to the local file system with dbutils.fs.cp("dbfs:/FileStore/code", "file:/tmp/code", True), and add that local directory (/tmp/code) to sys.path instead

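
Both choices can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names dbfs_uri_to_local_path and copy_dbfs_dir_to_local are made up for this example (they are not Databricks APIs), the /dbfs mount is assumed to be visible to the script, and a dbutils object is assumed to be available and is passed in explicitly.

```python
import os
import sys


def dbfs_uri_to_local_path(uri):
    """Choice 1: rewrite a 'dbfs:/...' URI into the '/dbfs/...' path plain Python can open.

    Hypothetical helper; it only rewrites the string and touches no files.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        # Already a local path (or another scheme): leave it untouched.
        return uri
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


def copy_dbfs_dir_to_local(dbutils, src_uri, local_dir):
    """Choice 2: copy a DBFS folder to the driver's local disk and make it importable.

    Hypothetical helper; `dbutils` is assumed to be the Databricks utilities object.
    """
    # The third argument asks dbutils.fs.cp to copy recursively.
    dbutils.fs.cp(src_uri, "file:" + local_dir, True)
    if local_dir not in sys.path:
        sys.path.insert(0, local_dir)
    return local_dir


code_dir = dbfs_uri_to_local_path("dbfs:/FileStore/code/")

# Only extend sys.path when the mount actually exists for this process.
if os.path.isdir(code_dir) and code_dir not in sys.path:
    sys.path.insert(0, code_dir)
    # import conn_config as Connect  # expected to resolve on a cluster with /dbfs
```

With the second choice, the call would look like copy_dbfs_dir_to_local(dbutils, "dbfs:/FileStore/code", "/tmp/code"), followed by the same import statement.
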
0
votes

I finally got it working with Spark. Once the Spark session is created (if your cluster already provides a Spark session, there is no need to create one):

spark.sparkContext.addPyFile("dbfs:/FileStore/code/conn_config.py")
import conn_config as C

This approach imports a Python module into a Python script that is run from Azure Data Factory.
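
For a script that has to work whether or not a session already exists, the same idea might be wrapped like this. It is a sketch, not a definitive implementation: import_from_dbfs is a hypothetical helper name, and the session is passed in so that the function itself does not depend on how the session was created.

```python
import importlib


def import_from_dbfs(spark, dbfs_file, module_name):
    """Register one .py file with Spark, then import it by module name.

    Hypothetical helper; `spark` is assumed to be an active SparkSession.
    """
    # addPyFile distributes the file so it can be imported by this application.
    spark.sparkContext.addPyFile(dbfs_file)
    # Refresh the import system's caches in case the file appeared after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Usage on a cluster (commented out because it needs pyspark and the DBFS file):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()  # reuses the existing session if any
# C = import_from_dbfs(spark, "dbfs:/FileStore/code/conn_config.py", "conn_config")
```

SparkSession.builder.getOrCreate() returns the existing session when the cluster already provides one, so the same script should run in both situations.
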

-1
votes

You can just use references to filestores:

(0,'dbfs:/FileStore/code')