0
votes

I have two SAS datasets that I am trying to merge together with a common by variable (in the mcve below the by variable is record). There is a condition for this merge though and that's what I am having trouble with. I need to merge fileone with filetwo by record, but the rows from filetwo that match with fileone has to meet the following condition: the intervention_date has to be after that records' case_start (and before the next case_start if a record has more than one case_start).

Here's an example of fileone:

data fileone;
    input record case_start :date9. age;
    format case_start date9.;
    datalines;
1 01jun2015 10
1 15jan2016 11
1 19jul2016 11
2 11aug2016 15
;
run;

Here's an example of filetwo:

data filetwo;
    input record intervention_date :date9. county :$10.;
    format intervention_date date9.;
    datalines;
1 24mar2016 Denver 
1 10sep2015 Denver
1 20oct2016 Denver
2 15nov2016 Boulder
2 12dec2016 Boulder
;
run;

To point out a few things, a record from fileone can have multiple case_start dates (see record=1) and for each record and case_start date, a record from filetwo can have multiple intervention_date (see record=2).

In more concrete terms, this is what I am trying to produce:

data what_merge_should_look_like;
    input record case_start :date9. age intervention_date :date9. county :$10.;
    format case_start intervention_date date9.;
    datalines;
1 01jun2015 10 10sep2015 Denver
1 15jan2016 11 24mar2016 Denver
1 19jul2016 11 20oct2016 Denver
2 11aug2016 15 15nov2016 Boulder
2 11aug2016 15 12dec2016 Boulder
; 
run;

Now, something like a simple by merge doesn't work since it's only matching by record:

/* data already sorted on by variable */
data fileone_two;
    merge fileone
          filetwo;
    by record;
run;

Are there any suggestions for merging these two datasets with this date condition? Is this possible from a simple SAS data step, as I am not very familiar with proc sql? I've seen this done with proc sql, but would like to know about the data step.

1

1 Answers

0
votes

You might be able to use SET to interleave the files by date. Then you can retain the values from FILEONE onto the records from FILETWO. Clear the retained values when starting a new value of RECORD.

proc sort data=filetwo; by record intervention_date ; run;

data want ;
  set fileone (in=in1 rename=(case_start=intervention_date age=agex))
      filetwo (in=in2)
  ;
  by record intervention_date ;
  if first.record then call missing(case_start,age);
  format case_start date9.;
  retain case_start age ;
  if in1 then case_start = intervention_date ;
  if in1 then age=agex ;
  if in2 then output;
  drop date agex ;
run;