0
votes

I have two datasets (d1 and d2) that I would like to merge based on an individual specific identifier (ind_ID). The issue is that there are some cases where the same ind_ID appears twice in one or both data sets (with the other fields different).

I am choosing one duplicate in d1 to merge with both duplicates in d2, but I'm not sure how to tell SAS to select just one of the observations in d1 and merge it to both in d2. I am doing it this way because I haven't been told yet which observation is the correct one, and I would like to move forward with my coding until that information comes in.

So far, I have tried merging just based on loan number (first example code) and by using a merge identifier (second example code). It seems that both have the same result, and that SAS is just randomly choosing which ind_ID from the duplicates to merge with which.

Short of manually going through the data and deleting the observations that I don't want to merge, is there a way to do this?

data merge_test;
merge d1 d2;
by ind_ID;
run;

data merge12;
merge d1(in=a) d2(in=b);
by ind_ID;
if a and b;
mergecheck = compress(a||b);
run;
3

3 Answers

0
votes

If you want to identify the 'correctly' paired rows after the merge, then doing a many-to-many merge now is perhaps your best bet:-

proc sql;

  create table out as
  select d1.*, d2.*
  from d1 full outer join d2
  on d1.ind_id = d2.ind_id;

quit;

I believe that should give all combinations of matching rows.

0
votes

I would add a step prior to the merge that just selects the 'correct' rows from one (or both) of the tables. Then perform the merge as you normally would.

Once you get the correct information, just update this step so that the 'correct' rows are indeed the ones you want to keep.

0
votes

Before you merge the two data sets, you can remove observations with duplicate values for ind_id from d1 using proc sort:

proc sort data=dl out=d1a nodupkey;
    by id;
run;

This outputs table d1a which has no duplicate values for ind_id. Then just merge d1a with d2.