0
votes

I have a SAS dataset that looks like this:

id | Date | ...

1    17 Jun

1    19 Jun

2    17 Jun

2    19 Jun

2    21 Jun

3    12 May

each id represents a unique person. I want to keep only 1 row for each unique person, however, still keep the date in dataset. TO achieve this, I need to transform the table into format such as:

id | Date1   |   Date2 | Date 3

1    17 Jun      19 Jun     

2    17 Jun     19 Jun    21 Jun  

3    12 May        

If only 1 date has been assigned to that person, then keep the date2 and date3 as missing value.

The full dataset I'm using contains thousands of observations with over 180 different days. However, a unique person will at most be assigned to 5 different days.

Any help appreciated

2

2 Answers

3
votes

PROC SUMMARY has functionality to do this, using the IDGROUP statement. The code below will transpose the data and create 5 date columns (specified by out[5]), in date order (specified by min(date)). If you want more information on how this works then check the IDGROUP statement in the PROC MEANS / SUMMARY documentation.

data have;
input id Date :date9.;
format date date9.; 
datalines;
1 17Jun2012
1 19Jun2012
2 17Jun2012
2 19Jun2012
2 21Jun2012
3 12May2012
;
run;

proc summary data=have nway;
class id;
output out=want (drop=_:)
        idgroup(min(date) out[5] (date)=);
run;
0
votes

Using Proc Transpose, then using a Data Step (and borrowing Keith's data).

Both ways need the data sorted by ID.

data have;
 input id Date :date9.;
 format date date9.; 
datalines;
1 17Jun2012
1 19Jun2012
2 17Jun2012
2 19Jun2012
2 21Jun2012
3 12May2012
4 01JAN2013
4 02JAN2013
4 03JAN2013
4 04JAN2013
4 05JAN2013
;
run;

proc sort data=have;
 by id;
run;

Proc transpose data=have out=transpose(drop=_name_) prefix=DATE;
 by id;
run;

data ds(drop=cnt date);
 retain date1 date2 date3 date4 date5;
 format date1 date2 date3 date4 date5 mmddyy10.;
 set have;
 by id;
 if first.id then cnt=1;

 select(cnt);
  when(1) date1=date;
  when(2) date2=date;
  when(3) date3=date;
  when(4) date4=date;
  when(5) date5=date;
  otherwise;
 end;

 cnt+1;

 if last.id then do;
  output;
  call missing(of date1-date5);
 end;
run;