1
votes

I'm trying to use U-SQL scripts in Azure Data Lake Analytics (ADLA) to process two CSV files uploaded to Azure Data Lake Store (ADLS). Each CSV file has one row and three columns. I'm not clear on how to use a U-SQL script to add the three elements of each file and write the results to a new CSV file. Could anyone help me with this?

2
Please provide some sample data and expected results. – wBob

2 Answers

2
votes

If your files are in the same folder, you don't need to UNION anything. Use a file set with a virtual column to read them all in a single EXTRACT. Here is a simple example:

@input =
    EXTRACT colA int,
            colB string,
            colC DateTime?,
            filename string
    FROM "/input/{filename}.log"
    USING Extractors.Csv();


// Do some processing if you need
@output =
    SELECT DISTINCT *
    FROM @input;


// Output results
OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv();

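In this example, I have two files of the same structure in my input directory, both of file type .log. When I run the script, the two files are effectively UNIONed together into one result set. The virtual column filename holds the part of each path matched by {filename}, so every row records which file it came from.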
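If "add the three elements" means summing the three values in each file, the same file-set pattern can be extended as in the sketch below. It assumes the three columns are integers, that the files have a .csv extension and sit in /input, and that the output path /output/sums.csv is acceptable; all of these names are placeholders rather than details from the question.

```usql
// Read every .csv file in /input; "filename" is a virtual column
// filled from the {filename} part of each matched path.
// Assumption: all three columns are integers.
@input =
    EXTRACT colA int,
            colB int,
            colC int,
            filename string
    FROM "/input/{filename}.csv"
    USING Extractors.Csv();

// One output row per input row: the source file and the sum of its three values.
@sums =
    SELECT filename,
           colA + colB + colC AS total
    FROM @input;

// Write the per-file totals to a new CSV file (hypothetical path).
OUTPUT @sums
TO "/output/sums.csv"
USING Outputters.Csv();
```

With one row per file, this produces one line per file containing the file name and its total.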

1
votes

If I understand your question correctly, you need to output 3 rows from your CSV files, where each file has 1 row and 3 columns. One way to do that is with the UNION operation in U-SQL, like this:

@result =
    SELECT * FROM @f1
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f2
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f3;

OUTPUT @result
TO "pathtoyourfile.csv"
USING Outputters.Csv();
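The snippet above assumes that the rowsets @f1, @f2 and @f3 have already been defined. As a rough sketch for the two files in the question, each rowset could come from its own EXTRACT; the file paths, column names and int types here are hypothetical.

```usql
// Assumption: both files have three integer columns and no header row.
@f1 =
    EXTRACT colA int,
            colB int,
            colC int
    FROM "/input/file1.csv"
    USING Extractors.Csv();

@f2 =
    EXTRACT colA int,
            colB int,
            colC int
    FROM "/input/file2.csv"
    USING Extractors.Csv();

// Stack the two rowsets, matching columns by name.
@result =
    SELECT * FROM @f1
    UNION ALL BY NAME ON (*)
    SELECT * FROM @f2;

OUTPUT @result
TO "/output/result.csv"
USING Outputters.Csv();
```

Because both EXTRACT statements declare the same column names, BY NAME ON (*) lines the columns up regardless of their order.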