0
votes

I have a very large data set (18 million observations) that I would like to transpose by subsetting based on one variable and creating 900 new variables out of those sub/ets. Example code and desired output format below: Example data:

data long1 ; 
  input famid year faminc ; 
cards ; 
var1 96 40000 
var1 97 40500 
var1 98 41000 
var2 96 45000 
var2 97 45400 
var2 98 45800 
var3 96 75000 
var3 97 76000 
var3 98 77000 
; 
run;

Code:

proc transpose data = data; by famid; var faminc; run;

Desired output format: enter image description here

Because of the size of my data set, I'm wondering if I have use a do loop to find first and last observations of each var and then iteratively subset. Does anyone know a sql or proc to achieve the desired output format?

3
Is your question how to make the proc transpose work or how to do something more efficiently than sorting and proc transposing?Joe
I will comment that going to a wide format may not help, almost always further processing is easier in a long format.Reeza

3 Answers

2
votes

I'm not sure what your subsetting would be based on, but if it's only the id, it is rather straightforward.

Using the example from this ucla's page on proc sort, your example would work fine using slightly modified input data:

data long; 
  input id year faminc ; 
  datalines ; 
1 96 40000 
1 97 40500 
1 98 41000 
2 96 45000 
2 97 45400 
2 98 45800 
3 96 75000 
3 97 76000 
3 98 77000 
; 

proc sort data=long;
  by year;
run;

proc transpose data=long out=wide(drop=_name_) prefix=var; 
  by year; 
  var faminc; 
run;

Results

year var1  var2  var3 
96   40000 45000 75000 
97   40500 45400 76000 
98   41000 45800 77000 

UCLA's SAS help pages are much clearer than SAS's own as regards to using proc transpose in both directions. Here are 4 valuable links...

Long to Wide with Proc Transpose
Wide to Long with Proc Transpose

Long to Wide with Data Step
Wide to Long with Data Step

0
votes

Transpose steps like this are easily done with the array statement. Example for your code would be something like:

data new (keep = year var1-var3);
  set data;
  by year;
  array vars {3} var1-var3;
  retain var1-var3;
  if first.year then i=1;
  else i+1;
  vars{i} = faminc;
  if last.year then output;
run;

But I don't have SAS to double check this. Also see here.

There are no good transpose procedures in PROC SQL.Too limited.

0
votes

Here are links to well tested in battle macros from the book "Data Preparation for Analytics"

From the sas community wiki you can download macro definitions.

Goto http://support.sas.com/kb/32/121.html for transposing from WIDE to LONG and http://support.sas.com/kb/32/122.html for transposing from LONG to WIDE.

More Details on "Data Preparation for Analytics" can be found under http://www.sascommunity.org/wiki/Data_Preparation_for_Analytics, where you can download all macros, programs and datasets for the book.

Hope this help