2
votes

I am struggling to join two table without creating duplicate rows using proc sql ( not sure if any other method is more efficient).

Inner join is on: datepart(table1.date)=datepart(table2.date) AND tag=tag AND ID=ID

I think the problem is date and different names in table 1. By just looking that the table its clear that table1's row 1 should be joined with table 2's row 1 because the transaction started at 00:04 in table one and finished at 00:06 in table 2. I issue I am having is I cant join on dates with the timestamp so I am removing timestamps and because of that its creating duplicates.

Table1:

id tag    date            amount   name_x
1 23      01JUL2018:00:04  12          smith ltd
1 23      01JUL2018:00:09  12          anna smith



table 2:



id tag  ref   amount   date
1 23   19   12          01JUL2018:00:06:00
1 23   20   12          01JUL2018:00:10:00



Desired output:

id tag    date            amount   name_x       ref
1 23      01JUL2018  12          smith ltd       19
1 23      01JUL2018  12          anna smith      20

Appreciate your help. Thanks!

2

2 Answers

1
votes

You need to set a boundary for that datetime join. You are correct in why you are getting duplicates. I would guess the lower bound is the previous datetime, if it exists and the upper bound is this record's datetime.

As an aside, this is poor database design on someone's part...

Let's first sort table2 by id, tag, and date

proc sort data=table2 out=temp;
by id tag date;
run;

Now write a data step to add the previous date for unique id/tag combinations.

data temp;
set temp;
format low_date datetime20.
by id tag;
retain p_date;

if first.tag then
   p_date = 0;

low_date = p_date;
p_date = date;
run;

Now update your join to use the date range.

proc sql noprint;
create table want as
select a.id, a.tag, a.date, a.amount, a.name_x, b.ref
from table1 as a
  inner join
     temp as b
  on a.id = b.id
  and a.tag = b.tag
  and b.low_date < a.date <= b.date;
quit;
0
votes

If my understanding is correct, you want to merge by ID, tag and the closest two date, it means that 01JUL2018:00:04 in table1 is the closest with 01JUL2018:00:06:00 in talbe2, and 01JUL2018:00:09 is with 01JUL2018:00:10:00, you could try this:

data table1;
input id tag date:datetime21.   amount   name_x $15.;
format date datetime21.;
cards;
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
;

data table2;
input id tag  ref   amount   date: datetime21.;
format date datetime21.;
cards;
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
;


proc sql;
   select a.*,b.ref from table1 a inner join table2 b
   on a.id=b.id and a.tag=b.tag
   group by a.id,a.tag,a.date
   having abs(a.date-b.date)=min(abs(a.date-b.date));
quit;