0
votes

I have a data request on a dataset with more than 50MM records to add some data fields based on date and timestamps

  Dataset that I have

  ID    CODE    ISSUEDATE       ISSUETIME     ORDERDATE    ORDERTIME       QTY
  101   A51     2020-08-24      11:24:00      2020-08-21   09:25:00       100
  101   777     2020-08-21      08:30:00      2020-08-21   08:30:00       125
  101   888     2020-08-21      09:30:00      2020-08-21   09:30:00       145
  102   A51     2020-08-23      11:24:00      2020-08-21   09:25:00       100
  102   256     2020-08-20      08:30:00      2020-08-20   08:30:00       125
  102   256     2020-08-20      11:24:00      2020-08-20   11:24:00       145

I need to pull the data for CODE='A51'

 I want the following Dataset

  ID    CODE    ISSUEDATE       ISSUETIME     ORDERDATE    QTY   ISSUEDATE2  ISSUUTIME2   QTY2
  101   A51     2020-08-24      11:24:00      2020-08-21   100    2020-08-21  08:30:00    125
  102   A51     2020-08-23      11:24:00      2020-08-21   100    2020-08-20  08:30:00    125
  1. I need to create ORDERDATE2, ISSUETIME2, and QTY2 variables based on ORDERDATE and ISSUETIME of code value 'A51' for each ID.

  2. ORDERDATE2 and ORDERTIME2 will look for ORDERDATE and ORDERTIME and pull the closest ISSUEDATE and ISSUETIME and QTY details from non 'A51' rows

In the above example - ID 101 has ORDERDATE as '2020-08-21' and ORDERTIME as 09:25 - therefore most recent non 'A51' record on 2020-08-21 for this ID is 08:30:00 with QTY 125

  1. If there is no entry for ORDERDATE then the most recent to ISSUEDATE should be captured In the above example - ID 102 has ORDERDATE as '2020-08-21' but there is no ISSUEDATE row for '2020-08-21' hence the closest to this date is ('2020-08-20', 08:30:00, and 125) captured.

I am a novice to Teradata Qualify statements and all that I am doing is create separate datasets for A51 and not A51 and join them on ID and time

    Create table _A51_ as Select ID, ISSUEDATE, ISSUETIME, ORDERDATE, ORDERTIME, QTY from
    Have where CODE='A51'

    Create table _Non_A51_ as Select ID, ISSUEDATE, ISSUETIME, ORDERDATE, ORDERTIME, QTY from
    Have where CODE ne 'A51'

    Create table _A51B_ as Select 
     A.*,B.ISSUEDATE as ISSUEDATE2, B.ISSUETIME as ISSUETIME2
     from _A51_ as A
     inner join _Non_A51_  as B
     where A.ID=B.ID and 
     A.ORDERDATE=B.ISSUEDATE
     and A.ORDERTIME le B.ISSUETIME
    QUALIFY ROW_NUMBER() OVER (PARTITION BY A.ID, A.ISSUEDATE ORDER BY B.ISSUETIME)=1

This wouldn't give me the rows for ID 102 - I hope there should be an easy way without me creating two datasets and join them back.

Any help here is much appreciated.

I don't think you need QUALIFY, it seems more like LAG. But for ID 102, why wouldn't the 2020-08-20 11:24:00 row be chosen as "closest"? For non-A51 rows, are Issue Date/Time always the same as Order Date/Time? - Fred
Also, what if you had two A51 in a row? Should OrderDate2, OrderTime2, Qty2 be NULL? - Fred
Hi Fred.. My bad. you are right. For ID 102, 2020-08-20 11:24:00 row be chosen as "closest". Issue Date/Time can be different and they are not always the same. If I have two A51 in a row with the same issue date and time then both carry the same values. Thanks - ckp