This is my first post, so please let me know if I'm not clear enough. Here's what I'm trying to do - this is my dataset. My approach for this is a do loop with a lag but the result is rubbish.
data a;
input @1 obs @4 mindate mmddyy10. @15 maxdate mmddyy10.;
format mindate maxdate date9.;
datalines;
1 01/02/2013 01/05/2013
2 01/02/2013 01/05/2013
3 01/02/2013 01/05/2013
4 01/03/2013 01/06/2013
5 02/02/2013 02/08/2013
6 02/02/2013 02/08/2013
7 02/02/2013 02/08/2013
8 03/10/2013 03/11/2013
9 04/02/2013 04/22/2013
10 04/10/2013 04/22/2013
11 05/04/2013 05/07/2013
12 06/10/2013 06/20/2013
;
run;
Now, I'm trying to produce a new column - "Replacement" based on the following logic:
- If a record's mindate occurs before its lag's maxdate, it cannot be a replacement for it. If it cannot be a replacement, skip forward (so- 2,3,4 cannot replace 1, but 5 can).
- Otherwise... if the mindate is less than 30 days, Replacement = Y. If not, replacement = N. Once a record replaces another (so, in this case, 5 does replace 1, because 02/02/2013 is <30 than 01/05/2013, it cannot duplicate as a replacement for another record. But if it's an N for one record above, it can still be a Y for some other record. So, 6 is now evaluated against 2, 7 against 3,etc. Since those two combos are both "Y", 8 is now evaluated versus 4, but because its mindate >30 relative to 4's maxdate, it's a N. But, it's then evaluated against against
- And so on...
I should that in a 100 record dataset, this would imply that the 100th record could technically replace the 1st, so I've been trying lags within loops. Any tips/help is greatly appreciated! Expected output:
obs mindate maxdate Replacement
1 02JAN2013 05JAN2013
2 02JAN2013 05JAN2013
3 02JAN2013 05JAN2013
4 03JAN2013 06JAN2013
5 02FEB2013 08FEB2013 Y
6 02FEB2013 08FEB2013 Y
7 02FEB2013 08FEB2013 Y
8 10MAR2013 11MAR2013 Y
9 02APR2013 22APR2013 Y
10 10APR2013 22APR2013 N
11 04MAY2013 07MAY2013 Y
12 10JUN2013 20JUN2013 Y