SAS merge with duplicate observations

Question

I have two datasets (d1 and d2) that I would like to merge based on an individual specific identifier (ind_ID). The issue is that there are some cases where the same ind_ID appears twice in one or both data sets (with the other fields different).

I am choosing one duplicate in d1 to merge with both duplicates in d2, but I'm not sure how to tell SAS to select just one of the observations in d1 and merge it to both in d2. I am doing it this way because I haven't been told yet which observation is the correct one, and I would like to move forward with my coding until that information comes in.

So far, I have tried merging just based on loan number (first example code) and by using a merge identifier (second example code). It seems that both have the same result, and that SAS is just randomly choosing which ind_ID from the duplicates to merge with which.

Short of manually going through the data and deleting the observations that I don't want to merge, is there a way to do this?

data merge_test;
merge d1 d2;
by ind_ID;
run;

data merge12;
merge d1(in=a) d2(in=b);
by ind_ID;
if a and b;
mergecheck = compress(a||b);
run;

rambles rambles · Accepted Answer · 2014-05-04T16:49:50

If you want to identify the 'correctly' paired rows after the merge, then doing a many-to-many merge now is perhaps your best bet:-

proc sql;

  create table out as
  select d1.*, d2.*
  from d1 full outer join d2
  on d1.ind_id = d2.ind_id;

quit;

I believe that should give all combinations of matching rows.

SAS merge with duplicate observations

3 Answers