1
votes

We have a requirement to process multiple files from different folders using Azure data factory and USQL.

Here is the folder structure we have

Year --> Month --> Day

We have a folder for every date, say 1,2,3...31. The requirement is to read files from specific folders and pass it to USQL to do analytics processing. We need to process data for multiple dates. Is there any way in data factory we can read data from multiple folders.

Example: I need to read data for the dates, 1,7 and 10 for a specific month. I do not want to read all the files for the month

Please let us know if you have come across a solution for the above scenario.

2

2 Answers

2
votes

yes there is. You should use Virtual columns. Example: Your file has only column1 and column2.

path1=/yourFolder/2018/11/1/file.csv

path2=/yourFolder/2018/10/25/file.csv

DECLARE date1 = new DateTime(2018,11,1);
DECLARE date2 = new DateTime(2018,10,25);
@inputData = EXTRACT column1 string,
column2 string, 
FileDate DateTime //this is virtual column
FROM "/yourFolder/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}/file.csv"
USING Extractors.Text(delimiter:';',skipFirstNRows:1);
@res = SELECT * FROM @inputData WHERE FileDate == date1 AND FileDate ==date2;

This way you can manage your folder structure using virtual columns. You can ofcourse use BETWEEN if you want to take files from some date range etc. HTH

P.S. You can send parameters from ADF to USQL stored procedure i just gave an example how to work with specific files.

0
votes

Below code worked. As suggested above.

DECLARE EXTERNAL @startDate string = "2018-08-01"; // value will be passed from ADF DECLARE EXTERNAL @endDate string = "2018-08-31"; // value will be passed from ADF

DECLARE @report_start_date DateTime = DateTime.ParseExact(@startDate,"yyyy-MM-dd",CultureInfo.InvariantCulture); DECLARE @report_end_date DateTime = DateTime.ParseExact(@endDate,"yyyy-MM-dd",CultureInfo.InvariantCulture);

DECLARE @INPUT_FILE=@valueFromADF+"/{date:yyyy}/{date:M}/{date:d}/{filename}.txt";

@dataAsString = EXTRACT value string,date DateTime FROM @INPUT_FILE USING Extractors.Text(delimiter:'\n'); @dataAsStrings = SELECT * FROM @dataAsString WHERE date BETWEEN @report_start_date AND @report_end_date;