0
votes

I have a dataset with X number of categorical variables for a given record. I would like to somehow turn this dataset into a new dataset with dummy variables, but I want to have one command / macro that will take the dataset and make the dummy variables for all variables in the dataset.

I also dont want to specify the name of each variable, because I could have a dataset with 50 variables so it would be too cumbersome to have to specify each variable name.

Lets say I have a table like this, and I want the resulting table, with the above conditions that I want a single command or single macro without specifying each individual variable:

enter image description here

3

3 Answers

4
votes

You can use PROC GLMSELECT to generate the design matrix, which is what you are asking for.

data test;
   input id v1 $ v2 $ v3 $ ;
   datalines;
1 A A A
2 B B B
3 C C C
4 A B C
5 B A A
6 C B A
;

proc glmselect data=test outdesign(fullmodel)=test_design noprint ;
   class v1 -- v3;
   model id = v1 -- v3 /selection=none noint;
run;

You can use the -- to specify all variables between the first and last. Notice I don't have to type v2. So if you know first and the last, you can get want you want easily.

0
votes

I prefer GLMMOD myself. One note, if you can, CLASS variables are usually a better way to go, but not supported by all PROCS.

/*Run model within PROC GLMMOD for it to create design matrix
Include all variables that might be in the model*/
proc glmmod data=sashelp.class outdesign=want outparm=p;
    class sex age;
    model weight=sex age height;
run;

/*Create rename statement automatically
THIS WILL NOT WORK IF YOUR VARIABLE NAMES WILL END UP OVER 32 CHARS*/
data p;
    set p;

    if _n_=1 and effname='Intercept' then
        var='Col1=Intercept';
    else
        var=catt("Col", _colnum_, "=", catx("_", effname, vvaluex(effname)));
run;

proc sql ;
    select var into :rename_list separated by " " from p;
quit;

/*Rename variables*/
proc datasets library=work nodetails nolist;
    modify want;
    rename &rename_list;
    run;
quit;

proc print data=want;
run;

Originally from here and the post has links to several other methods. https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Variables/ta-p/308484

0
votes

Here is a worked example using your simple three observation dataset and a modified version of the PROC GLMMOD method posted by @Reeza

First let's make a sample dataset with a long character ID variable. We will introduce a numeric ROW variable that we can later use to merge the design matrix back with the input data.

data have;
   input id :$21. education_lvl $ income_lvl $ ;
   row+1;
datalines;
1 A A
2 B B
3 C C
;

You could set the list of variables into a macro variable since we will need to use it in multiple places.

%let varlist=education_lvl income_lvl;

Use PROC GLMMOD to generate the design matrix and the parameter list that we will later use to generate user friendly variable names.

proc glmmod data=have outdesign=design outparm=parm noprint;
    class &varlist;
    model row=&varlist / noint ;
run;

Now let's use the parameter list to generate rename statement to a temporary text file.

filename code temp;
data _null_;
  set parm end=eof;
  length rename $65 ;
  rename = catx('=',cats('col',_colnum_),catx('_',effname,of &varlist));
  file code ;
  if _n_=1 then put 'rename ' ;
  put @3 rename ;
  if eof then put ';' ;
run;

Now let's merge back with the input data and rename the variables in the design matrix.

data want;
  merge have design;
  by row ;
%inc code / source2;
run;