To be clear about the format, this is how the DataFrame is saved in Databricks:
folderpath = "abfss://<container>@<storage-account>.dfs.core.windows.net/folder/path"
df.write.format("delta").mode("overwrite").save(folderpath)
This produces a set of Parquet files (often 2-4 part files) in the main folder, along with a _delta_log folder containing the transaction log files that describe each write. The transaction log dictates which of the Parquet files in the folder belong to the current version and should be read.
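To illustrate how the log works (a minimal sketch, not part of my pipeline): each JSON commit file in _delta_log records "add" and "remove" actions, and only Parquet files that were added and never removed belong to the current version. Assuming the table folder is reachable via a mounted path (the path below is hypothetical), something like this would list the live files, ignoring checkpoint files for simplicity:

import glob
import json
import os

folder_path = "/mnt/container/folder/path"  # hypothetical mounted path, not the real abfss URI
live_files = set()

# Replay the commits in order: an "add" action brings a Parquet file into
# the current version, a "remove" action takes it back out.
for commit in sorted(glob.glob(os.path.join(folder_path, "_delta_log", "*.json"))):
    with open(commit) as f:
        for line in f:
            action = json.loads(line)
            if "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])

print(live_files)  # only these Parquet files make up the latest snapshot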
In Databricks, I would read the latest dataset, for example, by doing the following:
df = spark.read.format("delta").load(folderpath)
How would I do this in Azure Data Factory? I have chosen an Azure Data Lake Storage Gen2 dataset with the Parquet format, but this doesn't work: the entire set of Parquet files is read (i.e. the data from every overwrite), not just the latest version.
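To illustrate what I mean (the row counts below are made up): in Spark, reading the same folder as plain Parquet ignores the _delta_log and so picks up the files from every overwrite, which seems to be what the ADF Parquet dataset is doing as well:

delta_df = spark.read.format("delta").load(folderpath)
raw_df = spark.read.format("parquet").load(folderpath)

print(delta_df.count())  # e.g. 1000 rows - the latest overwrite only
print(raw_df.count())    # e.g. 3000 rows - files left over from every overwrite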
How can I set this up properly?