I have a dataset with repeated observation (five observations in example)
There are also multiple rows per individual (not same for everyone)
ID val1 val2 val3 val4 val5
1 0 0 1 1 1
1 . . 2 2 2
2 1 1 1 . .
2 2 2 2 . .
3 1 1 . . .
Step 1 : I apply transpose procedure
proc transpose data=original out=tranposed;
by ID; var val1-val&_valcount; /* in this example &_valcount = 5 */
run;
ID val col1 col2
1 val1 0 .
1 val2 0 .
1 val3 1 2
1 val4 1 2
1 val5 1 2
2 val1 1 2
2 val2 1 2
2 val3 1 2
2 val4 . .
2 val5 . .
3 val1 1 .
3 val2 1 .
3 val3 . .
3 val4 . .
3 val5 . .
Step 2 : nmiss() to delete rows with missing values only
data transposed; set transposed;
if nmiss(of _numeric_) EQ &_repeatcount then delete;
/* in this example &_repeatcount = 2 */
run;
ID val col1 col2
1 val1 0 .
1 val2 0 .
1 val3 1 2
1 val4 1 2
1 val5 1 2
2 val1 1 2
2 val2 1 2
2 val3 1 2
3 val1 1 .
3 val2 1 .
Because original dataset is large, size of transposed dataset is very big and it takes long computation time to get transposed dataset.
Is there any methods to combine Step 1 & Step 2 into single step like deleting observations while transpose dataset to save storage and time?
WHERE=()
dsn option on the output dataset from PROC TRANSPOSE, but you need to know how many columns are going to be generated since WHERE clause does not support variable lists.out=want(where=(N(col1,col2)))
– Tom