
I'm creating a structured streaming job that stores its data in a Databricks Delta database. I have to choose where to store the checkpoint location and the data from the Delta database:

1. a normal DBFS location, like "/delta/mycheckpointlocation" and "/delta/mydatabase"
2. a mounted directory from a data lake, like "/mnt/mydatalake/delta/mycheckpointlocation" and "/mnt/mydatalake/delta/mydatabase"

If I understand correctly, the data in option 1 will be persisted in blob storage, while the data in option 2 will be stored in the data lake (assuming it's mounted at /mnt/mydatalake).

What considerations should guide the choice between option 1 and option 2 for things like the checkpoint location and the Delta database?


1 Answer


The DBFS location is part of your workspace, so if you drop the workspace you lose it. The lake is shared, so many things can connect to it, including other Databricks workspaces and other services (like ADF). Beyond that, there is no right or wrong here; it comes down to preference.
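Either way, the streaming code itself barely changes; only the two paths do. Here is a minimal sketch, assuming the folder layout from the question. The names `DeltaSink`, `delta_sink` and `start_delta_stream` are made up for illustration, and `stream_df` is assumed to be a streaming DataFrame you already have. The `writeStream` calls are the standard Structured Streaming ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaSink:
    """Where a streaming query keeps its table data and its checkpoint."""
    table_path: str
    checkpoint_path: str


def delta_sink(root: str, table: str = "mydatabase",
               checkpoint: str = "mycheckpointlocation") -> DeltaSink:
    # Hypothetical helper: place both folders under "<root>/delta/".
    trimmed = root.strip("/")
    base = "/" + trimmed if trimmed else ""
    return DeltaSink(f"{base}/delta/{table}", f"{base}/delta/{checkpoint}")


# Option 1: the DBFS root that belongs to the workspace.
DBFS_ROOT_SINK = delta_sink("")
# Option 2: a data lake mounted at /mnt/mydatalake (mount name from the question).
MOUNTED_LAKE_SINK = delta_sink("/mnt/mydatalake")


def start_delta_stream(stream_df, sink: DeltaSink):
    """Start an append-mode streaming write of stream_df to the given sink."""
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", sink.checkpoint_path)
        .start(sink.table_path)
    )
```

Switching between the two options is then just `start_delta_stream(stream_df, DBFS_ROOT_SINK)` versus `start_delta_stream(stream_df, MOUNTED_LAKE_SINK)`. Whichever you pick, keep each query's checkpoint folder dedicated to that one query.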