0
votes

We have an files partitioned in the datalake and are using Azure Synapse SQL Serverless pool to query them using external tables before visualising in Power BI.

Files are stored in the following partition format {source}/{year}/{month}/{filename}_{date}.parquet

We then have an external table that loads all files for that source.

For all files that increment each day this is working great as we want all files to be included. However we have some integrations that we want to return only the latest file. (i.e. the latest file sent to us is the current state that we want to load into Power BI).

Is it possible in the external table statement to only return the latest file? Or do we have to add extra logic? We could load all the files in, and then filter for the latest filename and save that in a new location. Alternatively we could try to create an external table that changes every day.

Is there a better way to approach this?

1

1 Answers

0
votes

If you are using dedicated pools then I would alter the location of your table with the latest files folder.

Load every day into a new folder and then alter the LOCATION of the external table to look at the current/latest day, but you might need to add additional logic to track in a control table what the latest successful load date is.

Unfortunately I have not found a better way to do this myself.