2 votes

I've created a DataFrame which I would like to write / export to my Azure Data Lake Gen2 as a table (I need to create a new table for this).

In the future I will also need to update this Azure DL Gen2 Table with new DataFrames.

In Azure Databricks I've created a connection (Azure Databricks -> Azure Data Lake) so I can see my files.


I'd appreciate help with how to write this in Spark / PySpark.

Thank you!


2 Answers

2 votes

Steps to write a DataFrame from an Azure Databricks notebook to Azure Data Lake Gen2:

Step 1: Access the storage account directly using the storage account access key

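A minimal PySpark sketch of this step (the account name "mystorageaccount" and the key value are placeholders for your own):

# Set the storage account access key in the Spark config so Spark can
# reach the account directly. Replace the account name and key.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<storage-account-access-key>"
)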

Step 2: Use dbutils to list the files in the storage account

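For example, with the access configured above (the container name "mycontainer" is a placeholder):

# List the files in the container to confirm the connection works.
display(dbutils.fs.ls("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/"))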

Step 3: Use the previously established DBFS mount point to read the data and create the DataFrame.

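A sketch of this step, assuming the lake has already been mounted at a hypothetical "/mnt/flightdata" containing an "airlines.csv" file:

# Read the CSV from the DBFS mount point into a DataFrame.
df = spark.read.csv("/mnt/flightdata/airlines.csv", header=True, inferSchema=True)
df.show(5)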

Step 4: Write data into the Azure Data Lake Gen2 account

Read the airline CSV file and write the output in Parquet format for easy querying.

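A sketch under the same assumptions, reusing the DataFrame from the previous step (the output path is a placeholder):

# Write the DataFrame back to the lake in Parquet format for efficient querying.
df.write.mode("overwrite").parquet("/mnt/flightdata/parquet/airlines")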

For more details, refer to "Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark".

Hope this helps. Do let us know if you have any further queries.

1 vote

I would suggest that instead of writing the data in Parquet format, you go for Delta format, which internally uses Parquet but provides other features like ACID transactions. The syntax would be:

df.write.format("delta").save(path)
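Since the question also mentions updating the table with new DataFrames later, a minimal sketch of how that could look (new_df and path are placeholders):

# Append the rows of a new DataFrame to the existing Delta table;
# use mode("overwrite") instead to replace its contents.
new_df.write.format("delta").mode("append").save(path)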