I'm trying to use U-SQL scripts in Azure Data Lake Analytics (ADLA) to process two CSV files uploaded to Azure Data Lake Store (ADLS). Each CSV file has one row and three columns. I'm not sure how to write a U-SQL script that adds up the three elements of each file and writes the results to a new CSV file. Could anyone help me with this problem?
2 Answers
2
votes
If your files are in the same folder, you don't need to UNION anything. Simply use a file set with a virtual column to refer to them. Here is a simple example:
@input =
    EXTRACT colA int,
            colB string,
            colC DateTime?,
            filename string
    FROM "/input/{filename}.log"
    USING Extractors.Csv();

// Do some processing if you need to
@output =
    SELECT DISTINCT *
    FROM @input;

// Output results
OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv();
In this example, I have two files of the same structure, both of file type .log, in my input directory. When I run the script, the two files are effectively UNIONed together into one result set.
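For the per-file sum the question asks about, the same file-set approach could be extended with a computed column. This is only a minimal sketch: the column names a, b and c, the int types, the absence of a header row, the .csv extension and the /input/ and /output/ paths are all assumptions, not details given in the question.

```usql
// Assumed layout: every file under /input/ has one row with three integer columns.
// The column names a, b, c and all paths here are hypothetical.
@input =
    EXTRACT a int,
            b int,
            c int,
            filename string          // virtual column filled in from the file name
    FROM "/input/{filename}.csv"
    USING Extractors.Csv();

// Add the three elements of each row; one output row per input file.
@sums =
    SELECT filename,
           a + b + c AS total
    FROM @input;

OUTPUT @sums
TO "/output/sums.csv"
USING Outputters.Csv();
```

Keeping filename in the output makes it possible to tell which sum came from which file. If the values are not whole numbers, a type such as double or decimal would be the better choice for the three columns.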
1
votes
If I understand your question correctly, you need to output three rows from your CSV files, where each file has one row and three columns. One way to do that is to use the UNION operation in U-SQL, as described here:
// @f1, @f2 and @f3 are rowsets already extracted from the individual files
@result =
    SELECT * FROM @f1
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f2
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f3;

OUTPUT @result
TO "pathtoyourfile.csv"
USING Outputters.Csv();
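The UNION snippet relies on rowsets such as @f1 and @f2 that have to be extracted first. A possible end-to-end version for the two files in the question might look like the sketch below. The file paths, the column names a, b and c, and the int types are assumptions made for illustration only.

```usql
// Hypothetical paths and column names; adjust them to the real files.
@f1 =
    EXTRACT a int,
            b int,
            c int
    FROM "/input/file1.csv"
    USING Extractors.Csv();

@f2 =
    EXTRACT a int,
            b int,
            c int
    FROM "/input/file2.csv"
    USING Extractors.Csv();

// Stack the rows of both files; columns are matched by name.
@result =
    SELECT * FROM @f1
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f2;

OUTPUT @result
TO "/output/combined.csv"
USING Outputters.Csv();
```

A third file would follow the same pattern with one more EXTRACT and one more UNION ALL branch.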