2
votes

I have panel data that looks something like this:

ID      year    dummy
1234    2007    0
1234    2008    0
1234    2009    0
1234    2010    1
1234    2011    1
2345    2008    0
2345    2009    1
2345    2010    1
2345    2011    1
3456    2008    0
3456    2009    0
3456    2010    1
3456    2011    1

With more observations following the same pattern and many more variables that aren't relevant to this problem.

I want to establish a treatment sample of IDs where the dummy variable "switches" at 2010 (is 0 when year<2010 and 1 when year>=2010). In the example data above, 1234 and 3456 would be in the sample and 2345 would not.

I'm fairly new to SAS and I guess I'm not familiar enough with CLASS and BY statements to figure out how to do this.

So far I've done this:

data c_temp;
    set c_data_full;
    if year < 2010 and dummy=0
        then trtmt_grp=1;
    else pre_grp=0;
    if year >=2010 and dummy=1
        then trtmt_grp=1;
run;

But that doesn't do anything about the panel aspect of the data. I can't figure out how to do the last step of selecting only the IDs where trtmt_grp is 1 for every year.

All help is appreciated! Thanks!

3
Do you just want the list of patient IDs or all of the data for those patients? - Tom
Right now I just want a count of the number of IDs in the treatment group. A way to identify them and print them out in the future would be useful, though! - MsTiggy

3 Answers

3
votes

Don't think you need double DoW loop, unless you need to append the data to the other rows. Simple single pass should suffice if you just need a single row per ID that matches.

data want;
  set have;
  by id;
  retain grpcheck;   *keep its value for multiple passes;
  if first.id and year < 2010 then grpcheck=1;  *reset for each ID to 1 (kept);
  else if first.id and year ge 2010 then grpcheck=0;
  if (year<2010) and (dummy=1) then grpcheck=0;  *if a non-zero is found before 2010, set to 0;
  if (year >= 2010) and (dummy=0) then grpcheck=0; *if a 0 is found at/after 2010, set to 0;
  if last.id and year >= 2010 and grpcheck=1;  *if still 1 by last.id and it hits at least 2010 then output;
run;

Any time you want to do some logic for each ID (or, each logically grouped set of rows by some variable's value), you start by setting your flag/etc. in an if first.id statement group. Then, modify your flag as appropriate for each row. Then, add an if last.id group which checks to see if the flag is still set when you've hit the last row.

1
votes

I think you probably want a double DOW loop. First loop to calculate your TRTMT_GRP flag at the ID level and the second to select the detailed records.

data want ;
  do until (last.id);
    set c_data_full;
    by id dummy ;
    if first.dummy and dummy=1 and year=2010 then trtmt_grp=1;
  end;
  do until (last.id);
    set c_data_full;
    by id ;
    if trtmt_grp=1 then output;
  end;
run;
1
votes

It seems to me that Proc SQL can deliver a pretty straightforward approach,

proc sql;
select distinct id from have
group by id
having sum(year<=2009 and dummy = 1)=0 and sum(year>=2010 and dummy=0) = 0
;
quit;