Our organisation runs Databricks on Azure, used primarily by data scientists and analysts for ad-hoc analysis and exploration in notebooks.
We also run Kubernetes clusters for ETL workflows that don't require Spark.
We would like to use Delta Lake as our storage layer, with both Databricks and Kubernetes able to read and write as first-class citizens.
Currently our Kubernetes jobs write Parquet files directly to blob storage, and an additional job then spins up a Databricks cluster to load that Parquet data into Databricks' Delta table format. This is slow and expensive.
What I would like to do is write to Delta Lake directly from Python on Kubernetes, rather than first dumping a Parquet file to blob storage and then triggering an additional Databricks job to convert it to Delta format.
Conversely, I'd also like to query those Delta Lake tables from Kubernetes.
In short: how do I set up my Python environment on Kubernetes so that it has equal access to the existing Databricks Delta Lake for both writes and queries?
Code would be appreciated.