1
votes

I have a dataset with repeated observation (five observations in example)

There are also multiple rows per individual (not same for everyone)

ID val1 val2 val3 val4 val5
1  0    0    1    1    1
1  .    .    2    2    2
2  1    1    1    .    .
2  2    2    2    .    .
3  1    1    .    .    .

Step 1 : I apply transpose procedure

proc transpose data=original out=tranposed;
by ID; var val1-val&_valcount; /* in this example &_valcount = 5 */
run;    

ID val  col1 col2
1  val1 0    .
1  val2 0    .
1  val3 1    2
1  val4 1    2
1  val5 1    2
2  val1 1    2
2  val2 1    2
2  val3 1    2
2  val4 .    .
2  val5 .    . 
3  val1 1    .
3  val2 1    .
3  val3 .    .
3  val4 .    .
3  val5 .    .

Step 2 : nmiss() to delete rows with missing values only

data transposed; set transposed;
if nmiss(of _numeric_) EQ &_repeatcount then delete; 
/* in this example &_repeatcount = 2 */
run; 

ID val  col1 col2 
1  val1 0    .
1  val2 0    .
1  val3 1    2
1  val4 1    2
1  val5 1    2
2  val1 1    2
2  val2 1    2
2  val3 1    2
3  val1 1    .
3  val2 1    .

Because original dataset is large, size of transposed dataset is very big and it takes long computation time to get transposed dataset.

Is there any methods to combine Step 1 & Step 2 into single step like deleting observations while transpose dataset to save storage and time?

1
Do you know the "repeatcount" in advance? Is it the same for every group?Tom
Yes I know "repeatcount" in advance. It isn't same for every group. I just know 'max' repeatcountjaerung
Shouldn't there be a way to do this using a data set option - WHERE?Reeza
@Reeza Is it possible to do this with 'where' option within single step? In the very first dataset (wide format), how can it be applied to delete missing observations? Thank you for your sincere opinion.jaerung
You can use a WHERE=() dsn option on the output dataset from PROC TRANSPOSE, but you need to know how many columns are going to be generated since WHERE clause does not support variable lists. out=want(where=(N(col1,col2)))Tom

1 Answers

0
votes

So if you know the maximum number of observations (rows) per ID group and number of variables (columns) your want to transpose

%let maxrows=2;
%let maxcols=5;
%let incols=val1-val5 ;
%let outcols=col1-col&maxrows;

Then you can use a temporary array to store all of the values for an ID group and then output it.

data want ;
  array storage (&maxrows,&maxcols) _temporary_;
  do n=1 by 1 until (last.id);
    set have;
    by id ;
    array inx &incols;
    do j=1 to dim(inx);
       storage(n,j)=inx(j);
    end;
  end;
  length _name_ $32 ;
  array out &outcols;
  do row=1 to dim(inx);
    _name_ = vname(inx(row));
    do col=1 to n;
      out(col) = storage(col,row);
    end;
    if n(of out(*)) then output;
  end;
  keep id _name_ &outcols;
run;