4
votes

i would like to know if it is possible to transpose efficiently from wide to long using proc sql in sas.

I'm aware that proc transpose is much quicker that the method i suggest below. But one of my objective would be to avoid storing the transposed table.

Let's say for example, that i have table1 as

Id|   A|   B|   C|  D    
_____________________
 1|  100|3500|6900| 10300
 2|  200| 250| 300| 350
 3|  150|  32| 400| 204
 4|  200| 800|1400| 2000

and i want to turn it into

id|col1|  col2|
______________
 1|   A|   100|
 1|   B|  3500|
 1|   C|  6900|
 1|   D| 10300|
 2|   A|   200|
 2|   B|   250|
 2|   C|   300|
 2|   D|   350|
 3|   A|   150|
 3|   B|    32|
 3|   C|   400|
 3|   D|   204|
 4|   A|   200|
 4|   B|   800|
 4|   C|  1400|
 4|   D|  2000|

I could do this;

select id, 'A' as col1, A as col2
from table1
where A ~=""
union select id, 'B' as col1, B as col2
from table1
where B ~=""
etc

but it is highly inefficient.

Any idea? Thanks.

2
The UNION will remove any duplicates, if you don't care if there are duplicates, then you could use UNION ALL which may help with performance.Taryn♦
Looks very much like a SQL unpivot, but wiring that into PROC SQL might be difficult.Philip Kelley

2 Answers

8
votes

If you're in SAS, use PROC TRANSPOSE for this option. There is no particularly good way to do this in PROC SQL; while many SQL variants have their own way to pivot data, SAS has PROC TRANSPOSE and expects you to use it.

The SAS datastep also does this very efficiently, perhaps even better than PROC TRANSPOSE. Here's an example, including creating a view as noted in the comments.

data want/view=want;
set have;
array vars a b c d;                  *array of your columns to transpose;
do _t = 1 to dim(vars);              *iterate over the array (dim(vars) gives # of elements);
  if not missing(vars[_t]) then do;  *if the current array element's value is nonmissing;
    col1=vname(vars[_t]);            *then store the variable name from that array element in a var;
    col2=vars[_t];                   *and store the value from that array element in another var;
    output;                          *and finally output that as a new row;
  end;
end;
drop a b c d _t;                     *Drop the old vars (cols) and the dummy variable _t;
run;
1
votes

I actually did something just like this today. Try doing this,

proc transpose data = ORIGINAL_DATA;
        out = NEW_DATA;
    by id;
    VAR A-D;
run;

I think this should work.