0
votes

I do have the following data in SAS. The variable strdat shows some values such as 99979797 for the second or higher order observations for the same id. I want to replaced these observations using the variable enddat from the previous observations + 7 days. For example, for id=2, the second observation strdat must change to 20050808.

Original Data

data data_b;
input id obs strdat enddat
;
datalines;
1 1 20061130 20081215 
2 2 99979797 20070905 
2 1 20010215 20050801 
3 3 20100115 20101231 
3 2 20060203 20070601
3 1 20041115 20050202
4 3 99979797 20101231
4 2 20060101 20060205
4 1 20050810 20051010
5 1 20041201 20050614
5 2 20060105 20060301
5 3 20061015 20090606
run;

Desired Output

1 1 20061130 20081215 
2 2 20050808 20070905 
2 1 20010215 20050801 
3 3 20100115 20101231 
3 2 20060203 20070601
3 1 20041115 20050202
4 3 20060212 20101231
4 2 20060101 20060205
4 1 20050810 20051010
5 1 20041201 20050614
5 2 20060105 20060301
5 3 20061015 20090606
2
Please post what you've tried along with your expected output. If you have extraneous variables not relevant to your questions please remove them as well.Reeza

2 Answers

1
votes

First change the invalid dates to missing, by reading them as SAS date, and order by ID, OBS. Then you can use UPDATE to fill the missing start date with previous end date + 7. I will leave it to you consult the documentation on how UPDATE treats missing values and why this works.

data b;
   input id obs (strdat enddat)(:??yymmdd.);
   format strdat enddat yymmddd10.;
   datalines;
1 1 20061130 20081215 
2 2 99979797 20070905 
2 1 20010215 20050801 
3 3 20100115 20101231 
3 2 20060203 20070601
3 1 20041115 20050202
4 3 99979797 20101231
4 2 20060101 20060205
4 1 20050810 20051010
5 1 20041201 20050614
5 2 20060105 20060301
5 3 20061015 20090606
   run;
proc sort;
   by id obs;
   run;
data c;
   update b(obs=0) b;
   by id;
   output;
   strdat = enddat + 7;
   run;

enter image description here

0
votes

Save the second-latest enddat for each id:

proc sql;
    create table data_b2 as select distinct
        id, max(enddat) + 7 as new_enddat
        from data_b (where = (strdat ~= 99979797))
        group by id;
quit;

Reattach them to the main table, replacing strdat where appropriate:

proc sql;
    create table data_b3 as select distinct
        a.id, a.obs,
        case when a.strdat = 99979797 then b.new_enddat else a.strdat end as strdat,
        a.enddat
        from data_b as a
        left join data_b2 as b
        on a.id = b.id
        order by a.id, a.obs desc;
quit;