0
votes

I have a large file with rows for each day per ID. There can be more then one record per ID per day but only the newest value is valid.

DailyValues:

ID int,
date datetime,
version datetime,
value1 float,
value2 float,
value3 float,
value4 float,

I T-SQL i would select MAX(version) and group by ID, Date, then join in the values with cross apply.

select
  B.*
from
(
   Select
     ID,
     Date,
     MAX(Version)
     From DailyValues D1
   group by
     ID,
     Date
) as A
CROSS APPLY (
   select top 1 *
   from DailyValues D2
   where D1.ID = D2.ID
   and D1.Date = D2.Date
   and D1.Version = D2.version
   order Version desc
) as B

The file is to big for me to do it in T-sql.

How can i do this in U-sql

1

1 Answers

3
votes

You can first extract the CSV file into a rowset. After, extracting, you can select the latest version, as given below:

@DailyValues = 
      EXTRACT ID int,
    date datetime,
    version datetime,
    value1 float,
    value2 float,
    value3 float,
    value4 float
       FROM "/Samples/Data/DailyValues.csv"
       USING Extractors.Csv(encoding: Encoding.[ASCII]);

SELECT ID, Date, Version
FROM
(
SELECT ID, Date,Version, ROW_Number() OVER(PARTITION BY ID, Date ORDER BY version DESC) AS rn
FROM @DailyValues) AS t
WHERE t.rn == 1;