I have a dataset that has one row per patient, and it contains information on what date (formatted as a SAS date) a patient took 11 doses of medication. In the dataset, there is at most one dose of medication per day. A patient can have dates populated for anywhere between 1 to 11 doses, and there are no middle doses that are missing information (e.g., if Dose5 is populated, by definition Dose1-Dose4 are populated). I'm interested in getting the maximum number of consecutive days a patient took a dose of medication. Here are 5 sample rows of data.
data have;
input PATIENT_ID Dose1 :ddmmyy10. Dose2 :ddmmyy10. Dose3 :ddmmyy10. Dose4 :ddmmyy10. Dose5 :ddmmyy10. Dose6 :ddmmyy10. Dose7 :ddmmyy10. Dose8 :ddmmyy10. Dose9 :ddmmyy10. Dose10 :ddmmyy10. Dose11;
format Dose1 Dose2 Dose3 Dose4 Dose5 Dose6 Dose7 Dose8 Dose9 Dose10 Dose11 ddmmyy10.;
cards;
1 01/01/2020 01/02/2020 01/03/2020 01/04/2020 01/05/2020 01/06/2020 01/07/20 1/08/2020 01/09/2020 01/10/2020 01/11/2020
2 01/01/2020 01/02/2020 01/03/2020 01/05/2020 01/06/2020
3 01/02/2020 01/04/2020 01/06/2020 01/08/2020 01/10/2020 01/12/2020 01/14/2020 01/16/2020 01/18/2020
4 01/03/2020 01/04/2020 01/05/2020 01/06/2020 01/07/2020 01/08/2020 01/09/2020 1/10/2020 01/12/2020 01/13/2020
5 01/01/2020 01/07/2020 01/08/2020 01/10/2020
;
run;
I would like to get the variable MAX_CONSECUTIVE_DAYS:
data want;
input PATIENT_ID MAX_CONSECUTIVE_DAYS Dose1 :ddmmyy10. Dose2 :ddmmyy10. Dose3 :ddmmyy10. Dose4 :ddmmyy10. Dose5 :ddmmyy10. Dose6 :ddmmyy10. Dose7 :ddmmyy10. Dose8 :ddmmyy10. Dose9 :ddmmyy10. Dose10 :ddmmyy10. Dose11;
format Dose1 Dose2 Dose3 Dose4 Dose5 Dose6 Dose7 Dose8 Dose9 Dose10 Dose11 ddmmyy10.;
cards;
1 11 01/01/2020 01/02/2020 01/03/2020 01/04/2020 01/05/2020 01/06/2020 01/07/20 1/08/2020 01/09/2020 01/10/2020 01/11/2020
2 3 01/01/2020 01/02/2020 01/03/2020 01/05/2020 01/06/2020
3 1 01/02/2020 01/04/2020 01/06/2020 01/08/2020 01/10/2020 01/12/2020 01/14/2020 01/16/2020 01/18/2020
4 8 01/03/2020 01/04/2020 01/05/2020 01/06/2020 01/07/2020 01/08/2020 01/09/2020 1/10/2020 01/12/2020 01/13/2020
5 2 01/01/2020 01/07/2020 01/08/2020 01/10/2020
run;
So far, I've only been able to figure out how to do it by brute force piecemeal.
data bruteforce;
set have;
if Dose2 =. then MAX_CONSECUTIVE_DAYS=1;
else if Dose3=. then
do;
if Dose2-Dose1=1 then MAX_CONSECUTIVE_DAYS=2;
else MAX_CONSECUTIVE_DAYS=1;
end;
else if Dose4=. then
do;
if Dose3-Dose1=2 then MAX_CONSECUTIVE_DAYS=3;
else if (Dose2-Dose1=1) or (Dose3-Dose2=1) then MAX_CONSECUTIVE_DAYS=2;
else MAX_CONSECUTIVE_DAYS=1;
end;
else if Dose5=. then
do;
if Dose4-Dose1=3 then MAX_CONSECUTIVE_DAYS=4;
else if (Dose3-Dose1=2) or (Dose4-Dose2=2) then MAX_CONSECUTIVE_DAYS=3;
else if (Dose2-Dose1=1) or (Dose3-Dose2=1) or (Dose4-Dose3=1) then MAX_CONSECUTIVE_DAYS=2;
else MAX_CONSECUTIVE_DAYS=1;
end;
/*And so on and so forth until accounting for rows where Dose10 is populated*/
run;
However, in my actual work, there are over 200 doses of medication, so doing a series of do loops with if-then-else statements doesn't make sense. If I had to guess, the solution might have something to do with arrays, but I'm not sure where or how to start.