1
votes

I'm trying to write a SAS macro for work that will take in:

  1. a data set
  2. a list of quantitative independent variables
  3. a list of categorical (some binary) independent variables; and
  4. one dependent variable

The macro will then fit a simple univariate model for each of the variables listed in ##2-3 and report certain information about each model, as well as the number of missing values.

I've got the macro working except for one detail: SAS seems to be deleting parts of the strings when I merge two tables into a final table at the end.

The macro fits models for the quantitative independent variables first, store the necessary model information in a table, and then do the same for the categorical variables.

The last step is to stack these two tables on top of each other. Unfortunately, when I do this, part of the string for the list of parameters is disappearing. Here are some pics to help illustrate:

This is what part of the output looks like for the table with just the models based on quantitative independent variables

enter image description here

This is what part of the output looks like for the table with just the models based on categorical independent variables. Notice, under "Parameter" IT'S LISTING THE FULL NAME OF THE DUMMY VARIABLES

enter image description here

But when I go to merge/stack the tables, SAS is deleting everything after "Value" in the "Parameter" column for the dummy variables for the categorical variables. The information about the dummy variables is necessary, otherwise the portion of the table summarizing the univariate models based on categorical variables is mostly meaningless.

enter image description here

Here is a portion of my code. I'm sorry, but I can't include any data:

ODS output
NObs=Num_Obs_cat /*Output data on missing observations*/
parameterestimates=parameter_est_cat /*Output each model's parameter estimates*/
OverallANOVA=ANOVA_cat /*Output each model's ANOVA table for the p-value for the model's F-test*/;
proc glm data=long_format_cat;
class Value;
by X_ind;
model &depend=Value/SOLUTION EFFECTSIZE CLPARM;
run;
ODS output close;

/*Create a data set of all of the parameter estimates, with only the necessary columns*/
data parameter_est_cat1 (keep=X_ind parameter estimate probT LowerCL UpperCL rename=probT=p_value);
set parameter_est_cat;
run;

/*Create a data set of each model's F-test p-value*/
data ANOVA_cat1 (keep=X_ind ProbF);
set ANOVA_cat;
where not missing(ProbF); 
run;

/*Create a data set of of each model's missing observations*/
data Num_Obs_cat1 (keep=X_ind NObsUsed);
set Num_Obs_cat;
where Label='Number of Observations Used';
run;

/*Merge the three preceding data sets horizontally*/
data univ_summary_stats_cat;
merge parameter_est_cat1 Num_Obs_cat1 long_format_cat5 ANOVA_cat1;
by X_ind;
run;

/*Merge all of the tables together.  THIS IS THE STEP WHERE I BELIEVE IT'S GOING WRONG.
'univ_summary_stats_quant' has the same columns as 'univ_summary_stats_cat' so they
should be able to be merged/set one on top of each other.  But when I do this, parts of those
strings disappear*/

data univ_all;
set univ_summary_stats_quant univ_summary_stats_cat;
run ;

%mend

Please help!

1

1 Answers

3
votes

First, check your log. You may have this warning for your character variables:

WARNING: Multiple lengths were specified for the variable ...

Check the length of your character strings in univ_summary_stats_cat. Your set statement has univ_summary_stats_quant first, which may have shorter variable lengths. This means when the PDV is initialized, SAS will grab the length of all variables in that dataset first. If univ_summary_stats_cat has same-named variables, they will be truncated to the length in univ_summary_stats_quant. To illustrate this, consider the two datasets below:

data one;
    var = 'Hello';
run;

data two;
    var = 'Hi';
run;

data three;
    set two
        one
    ;
run;

var in three will be truncated to "He" since var in two has a length of two and was specified as the first dataset. Note that the log also has a warning.

To solve this, you can manually specify lengths with the length statement prior to your set statement, or you can set the table with the longest string length first.

data three;
    set one 
        two
    ;
run;

If the issue is not at the set statement but rather is at the merge statement, the same logic applies.