I have a U-SQL script that uses a file pattern to find files in Azure Data Lake and extract some data from them:
DECLARE @input_file string = @"\data\{*}\{*}\{*}.avro";

@data =
    EXTRACT Column1 string,
            Column2 double
    FROM @input_file
    USING new MyExtractors.AvroExtractor();
The file paths follow this pattern:
data/{Namespace}-{EventHub}-{PartitionId}/{Year}-{Month}-{Day}/{Hour}-{Minute}-{Second}
Problem: the custom extractor runs very slowly. There are many files in the Data Lake, and a single run takes 15 hours and costs $600 USD. That is too slow and too expensive.
I only need to extract fresh data, from files that are no more than 90 days old. How can I filter out older files, using the file pattern, the file's modified date, or any other technique?
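One direction I have considered (sketch only, not verified against my setup): U-SQL file sets can bind path segments to virtual columns, and a predicate on such a column can be evaluated at compile time so that files outside the range are never opened (partition elimination). Assuming the folder layout above, and that the `{Year}-{Month}-{Day}` segment can be bound to a single `DateTime` virtual column via `{date:yyyy}-{date:MM}-{date:dd}`, something like this might work (`@cutoff` is an illustrative name):

```
// Sketch: bind the date folder to a virtual column and filter on it.
DECLARE @input_file string = @"/data/{*}/{date:yyyy}-{date:MM}-{date:dd}/{*}.avro";
DECLARE @cutoff DateTime = DateTime.Now.AddDays(-90);

@data =
    EXTRACT Column1 string,
            Column2 double,
            date DateTime          // virtual column taken from the path, not the file
    FROM @input_file
    USING new MyExtractors.AvroExtractor();

@fresh =
    SELECT Column1, Column2
    FROM @data
    WHERE date >= @cutoff;         // intended to be pushed to compile time so old files are skipped
```

I am not sure whether `DateTime.Now.AddDays(-90)` is constant-folded for partition elimination, or whether the cutoff would need to be passed in as a script parameter (e.g. from Data Factory). Is this the right approach, and would it actually prevent the extractor from reading the old files?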