0
votes

I have thousands of csv files containing spanning from January 2016 until today. I want to load all files from 25th of November 2016 until 02nd of January 2017.

I know I can use virtual path as below, but wont this load all my data from disk? I only want data from the period above. Will adding the @result query (modifying to my time period) ensure that only the files I am interested in are loaded into memory?

DECLARE @file_set_path2 string = @dir + "{date:yyyy}/{date:MM}/{date:dd}/{date:MM}{date:dd}{date:yyyy}.csv";

@data = 
    EXTRACT vala int, 
            valb long, 
            valc DateTime, 
            date DateTime // virtual file set column
    FROM @file_set_path2
    USING Extractors.Csv();

@result = 
SELECT *
FROM @data
WHERE date > DateTime.Parse("2016-11-24")
AND   date < DateTime.Parse("2017-01-03");
1

1 Answers

1
votes

If the predicate is comparing against values that the compiler can see (e.g., constants, constant foldable expressions or script parameters) and the predicate can be moved (e.g., you use AND and not && in the predicate for conjunction) then the optimizer will only touch the files inside the specified range. So the query above should be fine.

You should get a warning if the predicate is not one of the above.

If you do not get this behavior, please let me know.