1
votes

I am trying to run a Jupyter notebook from within another notebook in Databricks.

The code below fails with the error 'df3 is not defined'. But df3 is defined.

import pandas as pd

input_file = pd.read_csv("/dbfs/mnt/container_name/input_files/xxxxxx.csv")
df3 = input_file
%run ./NotebookB

The first line of code in NotebookB is below (all Markdown cells render in Databricks with no issues):

df3.iloc[:,1:] = df3.iloc[:,1:].clip(lower=0)
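For reference, that line clips every column except the first at zero. A minimal standalone pandas sketch of what it does (the column names and data here are made up):

```python
import pandas as pd

# Hypothetical data: first column is an ID, the rest are numeric values.
df3 = pd.DataFrame({"id": [1, 2, 3], "a": [-5, 0, 7], "b": [2, -1, -3]})

# Clip all columns except the first at zero (negatives become 0).
df3.iloc[:, 1:] = df3.iloc[:, 1:].clip(lower=0)

print(df3)
#    id  a  b
# 0   1  0  2
# 1   2  0  0
# 2   3  7  0
```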

I do not get this error in Jupyter Notebook itself; for example, the code below works:

import pandas as pd

input_file = pd.read_csv("xxxxxx.csv")
df3 = input_file
%run "NotebookB.ipynb"

Basically, it seems that when NotebookB runs in Databricks, the definition of df3 is not carried over or is forgotten, leading to the 'not defined' error.

Both Jupyter notebooks are in the same Workspace folder in Databricks.


2 Answers

0
votes

I see you want to pass structured data, like a DataFrame, from one Azure Databricks notebook to another by calling it.

Please refer to the official document Notebook Workflows to learn how to use the functions dbutils.notebook.run and dbutils.notebook.exit to do that.

Here is the sample Python code from the section Pass structured data of the official document above.

%python

# Example 1 - returning data through temporary tables.
# You can only return one string using dbutils.notebook.exit(), but since called notebooks reside in the same JVM, you can
# return a name referencing data stored in a temporary table.

## In callee notebook
sqlContext.range(5).toDF("value").createOrReplaceGlobalTempView("my_data")
dbutils.notebook.exit("my_data")

## In caller notebook
returned_table = dbutils.notebook.run("LOCATION_OF_CALLEE_NOTEBOOK", 60)
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(table(global_temp_db + "." + returned_table))

So to pass the pandas DataFrame in your code, you first need to convert it to a PySpark DataFrame using the spark.createDataFrame function, as below.

df3 = spark.createDataFrame(input_file)

Then pass it with the code below.

df3.createOrReplaceGlobalTempView("df3")
dbutils.notebook.exit("df3")

Meanwhile, you need to swap the roles of NotebookA and NotebookB: call NotebookA as the callee from NotebookB as the caller.
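Putting it together on the caller side, a minimal sketch (Databricks only, so it will not run elsewhere; the "./NotebookB" path and the 120-second timeout are assumptions, and the final toPandas() is only needed if you want a pandas DataFrame back):

```python
# Caller notebook (sketch): run the callee, then look up the global temp
# view whose name it returned via dbutils.notebook.exit.
# "./NotebookB" and the 120-second timeout are assumptions.
returned_view = dbutils.notebook.run("./NotebookB", 120)
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
df3_spark = spark.table(global_temp_db + "." + returned_view)
df3 = df3_spark.toPandas()  # convert back to pandas if needed
```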

0
votes

In notebook A, save the df to a CSV file and call notebook B, passing the path to the CSV as an argument. Notebook B reads from the path, performs some operation, and overwrites the CSV. Notebook A then reads from the same path, now with the desired result.

An example:


notebook A (caller)

# write df to /path/test-csv.csv
df = spark.range(5)
df.write.csv(path = '/path/test-csv.csv')
df.show()

# call notebook B with the csv path /path/test-csv.csv
nb = "/path/notebook-b"
dbutils.notebook.run(nb, 60, {'df_path': 'dbfs:/path/test-csv.csv'})

# now that the transformation has completed [err-handling-here], read again from the same path
spark.read.format("csv").load('dbfs:/path/test-csv.csv').show()

output:

+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+

+---+---+
|_c0|_c1|
+---+---+
|  0|0.0|
|  1|2.0|
|  2|4.0|
|  3|6.0|
|  4|8.0|
+---+---+

notebook B (callee)

# create a widget for the csv path argument
dbutils.widgets.text("df_path", '/', 'df-test')
df_path = dbutils.widgets.get("df_path")

# read from path
df = spark.read.format("csv").load(df_path)

# execute whatever operation
df = df.withColumn('2x', df['_c0'] * 2)

# overwrite the transformed dataset at the same path
df.write.csv(path = df_path, mode = "overwrite")

dbutils.notebook.exit(0)
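For what it's worth, the same write/transform/overwrite round trip can be sketched locally with plain pandas and a temp file (the file name and column names here are made up):

```python
import os
import tempfile

import pandas as pd

# "Notebook A": write a dataframe to CSV.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
pd.DataFrame({"id": range(5)}).to_csv(path, index=False)

# "Notebook B": read from the path, transform, overwrite the same path.
df = pd.read_csv(path)
df["2x"] = df["id"] * 2
df.to_csv(path, index=False)

# "Notebook A" again: read the transformed result from the same path.
print(pd.read_csv(path))
```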