2
votes

I have a data set in which I must pull out the observations that have been enrolled for 12 continuous months. Each month a subject was enrolled creates a row for that subject. The month variable, monthid looks like this 200901, 200902, 200903, which translates to Jan. 2009, Feb. 2009, March 2009 respectively. See an example of what the data looks like below.

The main question: how do I get SAS to read monthid and pull out all the observations that have 12 continuous months of enrollment, but discard the observations that do not (that either have less than 12, or the months are not continuous)? Notice that in some instances a subject is enrolled over several years and so the numbers will jump from something like 201112 to 201201 - December 2011 to January 2012.

memberid  monthid
1234      200901
1234      200902
1234      200903
5678      201110
5678      201111
5678      201112
5678      201201
1
using proc data it will iterate over a dataset. If you use local variables (e.g., a count, lastmemberid, startmonthid), you can put conditionals if observation the memberid equals last memberid add a count and generate a new set (memberid,monthid_start,monthid_end).vol7ron

1 Answers

3
votes

There are quite a few ways to do this. In SQL, it's a bit complicated, because you have to consider order. However, if you make some assumptions, you should be able to do this fairly easily.

Given the assumption that you only have one row per person (memberid), you can simply identify the 12 month-from-then date, and test how many are in that group.

So for example, you could do something like this to identify for a particular month how many consecutive months a person had:

proc sql;
        select memberID, count(1) from have H
         where monthID ge 200905 and monthID le 201004
         group by memberid
;
quit;

For the overall, you would want to either precreate this table for all possible months, or add this as a subquery to a larger query including all months. This isn't terribly fast, though; likely non-SQL solutions will be faster in SAS, since SAS doesn't have some of the more advanced SQL tools you have in other implementations.

proc sql;
  create table want as 
    select memberID, monthID, count(1)
      from have V
      group by memberID, monthID
      having 12 = (
        select count(1) from have H
        where H.memberID=V.memberID
        and H.monthID le V.monthID
        and H.monthID gt V.monthID-100
        group by H.memberID
    )
;
quit;

If I were to do this in SAS datastep, I would just iterate through the datastep and output if there hadn't been a gap in the last 12 periods.

data want;
 set have;
 by memberID monthID;
 if first.memberID then counter=0;
 if dif(monthID)>1 and mod(monthID,100) ne 1 then counter=0;
 if mod(monthID,100) eq 1 and dif(monthID) ne 89 then counter=0;
 counter+1;
 if counter ge 12 then output;
run;

That takes 0.01 seconds versus 35 on my laptop for the SQL solution. While I suspect there's a much better SAS-SQL solution, the fact that the datastep iterates through rows in a consistent basis means it'll be hard to beat it in SQL, since the datastep can do this in exactly one pass through the data.