0
votes

I have two SAS datasets of following type

Dataset1, D1 is as follows

ID        Date             Amount
x1        10/12/2015       100
x2                         200  
x2                         150 
x3        10/10/2014       90 
x4                         60 

Dataset, D2 is of the form

ID        Date
x2        10/12/2016
x4        1/1/2016

Dataset D1 can have duplicate values of ID. Dataset D2 has only unique values of ID. Further, D2 consists of only IDs in Dataset D1 which has missing values of variable date (x2 and x4 have date missing in D1). I want to merge D1 with D2 such that the output is as follows

ID        Date             Amount
x1        10/12/2015       100
x2        10/12/2016       200  
x2        10/12/2016       150 
x3        10/10/2014       90 
x4        1/1/2016         60

Is this doable without using proc sql in SAS. Can I use merge?

I tried using the following but to no use (and it should not work either because D1 has duplicate IDs)

data x;
 merge D1 (in=in1) D2(in=in2);
 by ID;
 if in1;
run;
2

2 Answers

1
votes

Is this doable without using proc sql in SAS. Can I use merge?

Yes, any SQL step can be done in a Data step, but may take up more or less code space.

Here is a potential solution:

data DateN DateY;
    set D1;
    if date=. then output step1;
    else output step2;
run;

data merge;
    DateN(keep=ID Amount) D2;
    by id;
run;

data x;
    set merge DateY;
run;

proc sort data=x; 
    by ID;
run;

This assumes that the missing values from D1 have unique ID.

1
votes

DATA step merge of two datasets works fine when only one of the datasets has a unique ID. The problem with your merge is that the DATE variables from each dataset will collide:

231  options msglevel=i;
232  data x;
233   merge D1 D2;
234   by ID;
235   put (ID Date Amount)(=);
236  run;

INFO: The variable Date on data set WORK.D1 will be overwritten by data set WORK.D2.
ID=x1 Date=10/12/2015 Amount=100
ID=x2 Date=10/12/2016 Amount=200
ID=x2 Date=. Amount=150
ID=x3 Date=10/10/2014 Amount=90
ID=x4 Date=01/01/2016 Amount=60

The MSGLEVEL=i option generates the INFO: line in the log which alerts you to the collision. In this case you almost get the results you want, despite the collision. The problem is the third record, where DATE is missing. This is a side-effect of having a collision in a one-to-many merge.

I would suggest you avoid the collision by renaming the DATE variable in each dataset. You can then compute a new DATE variable by using the COALESCE() function, which returns the first value that is not missing:

237  data want;
238    merge d1 (keep=ID Date Amount rename=(Date=Date1))
239          d2 (keep=ID Date rename=(Date=Date2))
240    ;
241    by ID;
242    Date=coalesce(Date1,Date2);
243    put (ID Date1 Date2 Date Amount)(=);
244    format Date mmddyy10.;
245  run;

ID=x1 Date1=10/12/2015 Date2=. Date=10/12/2015 Amount=100
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=200
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=150
ID=x3 Date1=10/10/2014 Date2=. Date=10/10/2014 Amount=90
ID=x4 Date1=. Date2=01/01/2016 Date=01/01/2016 Amount=60