I have a data request on a dataset with more than 50MM records to add some data fields based on date and timestamps
Dataset that I have
ID CODE ISSUEDATE ISSUETIME ORDERDATE ORDERTIME QTY
101 A51 2020-08-24 11:24:00 2020-08-21 09:25:00 100
101 777 2020-08-21 08:30:00 2020-08-21 08:30:00 125
101 888 2020-08-21 09:30:00 2020-08-21 09:30:00 145
102 A51 2020-08-23 11:24:00 2020-08-21 09:25:00 100
102 256 2020-08-20 08:30:00 2020-08-20 08:30:00 125
102 256 2020-08-20 11:24:00 2020-08-20 11:24:00 145
I need to pull the data for CODE='A51'
I want the following Dataset
ID CODE ISSUEDATE ISSUETIME ORDERDATE QTY ISSUEDATE2 ISSUUTIME2 QTY2
101 A51 2020-08-24 11:24:00 2020-08-21 100 2020-08-21 08:30:00 125
102 A51 2020-08-23 11:24:00 2020-08-21 100 2020-08-20 08:30:00 125
I need to create ORDERDATE2, ISSUETIME2, and QTY2 variables based on ORDERDATE and ISSUETIME of code value 'A51' for each ID.
ORDERDATE2 and ORDERTIME2 will look for ORDERDATE and ORDERTIME and pull the closest ISSUEDATE and ISSUETIME and QTY details from non 'A51' rows
In the above example - ID 101 has ORDERDATE as '2020-08-21' and ORDERTIME as 09:25 - therefore most recent non 'A51' record on 2020-08-21 for this ID is 08:30:00 with QTY 125
- If there is no entry for ORDERDATE then the most recent to ISSUEDATE should be captured In the above example - ID 102 has ORDERDATE as '2020-08-21' but there is no ISSUEDATE row for '2020-08-21' hence the closest to this date is ('2020-08-20', 08:30:00, and 125) captured.
I am a novice to Teradata Qualify
statements and all that I am doing is create separate datasets for A51 and not A51 and join them on ID and time
Create table _A51_ as Select ID, ISSUEDATE, ISSUETIME, ORDERDATE, ORDERTIME, QTY from
Have where CODE='A51'
Create table _Non_A51_ as Select ID, ISSUEDATE, ISSUETIME, ORDERDATE, ORDERTIME, QTY from
Have where CODE ne 'A51'
Create table _A51B_ as Select
A.*,B.ISSUEDATE as ISSUEDATE2, B.ISSUETIME as ISSUETIME2
from _A51_ as A
inner join _Non_A51_ as B
where A.ID=B.ID and
A.ORDERDATE=B.ISSUEDATE
and A.ORDERTIME le B.ISSUETIME
QUALIFY ROW_NUMBER() OVER (PARTITION BY A.ID, A.ISSUEDATE ORDER BY B.ISSUETIME)=1
This wouldn't give me the rows for ID 102 - I hope there should be an easy way without me creating two datasets and join them back.
Any help here is much appreciated.
QUALIFY
, it seems more likeLAG
. But for ID 102, why wouldn't the 2020-08-20 11:24:00 row be chosen as "closest"? For non-A51 rows, are Issue Date/Time always the same as Order Date/Time? - Fred