0
votes

I am trying to transpose data in SAS from a long format to a wide format. The problem I'm having is that I have multiple columns that I'm trying to transpose. I have a few example datasets below to demonstrate what I'm trying to do. The actual dataset I'm doing this on is going to be very large, I think one way to handle this could be to tranpose individual columns and then merge at the end, but the dataset I'm going to be doing this on is going to be significantly larger (tens of thousands of columns), so this will be pretty unfeasible.

Below is the data I'm starting with:

data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

Below is what I would want the outcome of the transpose to be:

data desired_state;
input id $ d145 $ d213 $ d231 $ d234 $ d269 $ d360 $ d409 $ d435 $ d463 $ d609 $ d678 $ d690;
datalines;
1 0 50 0 50 0 0 0 50 0 0 50 0
2 25 25 0 0 25 25 0 0 0 0 0 25
3 0 0 10 0 0 0 10 0 10 10 0 10
;

I have attempted the following, which isn't giving me the desired output.

proc transpose data=current_state out=test1;
by id;
id att_1 att_2 att_3 att_4 att_5;
var Dollars;
run;
2
You'll need a double transpose here. Once to turn it wide and then a second to transpose to the format desired. Another option is to use methods for dummy variables, that's probably the easiest. Do the values in the matrix have a limit, ie is it D100 to D999? Do you know that list ahead of time?Reeza

2 Answers

0
votes

Here's a fully dynamic way, but it's probably not the most efficient.

  1. Transpose to a long format
  2. Use GLMMOD to create a 0/1 matrix for each value of D
  3. Rename the variables from GLMMOD by the D
  4. Convert 1 to dollar amount - left for you to do.

        data current_state;
    input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
    datalines;
    1 d234 d463 d213 d678 d435 50
    2 d213 d690 d360 d145 d269 25
    3 d409 d231 d463 d690 d609 10
    ;
    
    *transpose to long;
    proc transpose data=current_state out=long;
    by id dollars;
    var att_1-att_5;
    run;
    
    /*Run model within PROC GLMMOD for it to create design matrix
    Include all variables that might be in the model*/
    proc glmmod data=long outdesign=want outparm=p;
    class col1;
    model dollars = col1;
    run;
    
    /*Create rename statement automatically
    THIS WILL NOT WORK IF YOUR VARIABLE NAMES WILL END UP OVER 32 CHARS*/
    data p;
    set p;
    if _n_=1 and effname='Intercept' then var='Col1=Intercept';
    else var= catt("Col", _colnum_, "=", vvaluex(effname));
    run;
    
    proc sql;
    select var into :rename_list separated by " "
    from p;
    quit;
    
    
    /*Rename variables*/
    proc datasets library=work nodetails nolist;
    modify want;
    rename &rename_list;
    run;quit;
    
    
    proc print data=want;
    run;
    

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Variables/ta-p/308484

0
votes

Proc TRANSPOSE does not have syntax for 'parallel pivoting' into a single row -- multiple ids are concatenated to become the resultant id for column name.

A parallel pivot result can be achieved done using serial pivoting - first pivot to a an even taller vector layout, and then to the wide layout.

proc transpose data=current_state out=stage1_vector;
by id Dollars;
var att_1 att_2 att_3 att_4 att_5;
run;

proc transpose data=stage1_vector out=want;
by id;
id col1;
run;

The column order in want is based on the col1 value appearance in the data in the vector layout. You can force a sorted column order by creating a dummy record where the col1 values are in the appearance order you want.

proc transpose data=current_state out=stage1_vector(drop=_name_);
by id Dollars;
var att_1 att_2 att_3 att_4 att_5;
run;

proc sort nodupkey data=stage1_vector(keep=col1) out=ids;
  by col1;
run;

data vector_view / view=vector_view;
  set ids stage1_vector;
run;

proc transpose data=vector_view out=want;
  by id;
  id col1;
run;

The requirement for replacing missing values with zeroes should be considered carefully, especially if you want to count the number of ids with a d* value (Proc MEANS etc). Regardless, the replacement can be done in a post-transpose step.

data want;
  set want;
  array zerome d:;

  /* re-purpose _n_ temporarily as a loop index variable
   * that does not have to be explicitly dropped
   */
  do _n_ = 1 to dim(zerome);
    if missing(zerome(_n_)) then zerome(_n_) = 0;
  end;
run;