0
votes

EDIT!!!! GO TO BOTTOM FOR BETTER REPRODUCABLE CODE!

I have a data set with a quantitative variable that's missing 65 values that I need to impute. I used the ODS output and proc glm to simultaneously fit a model for this variable and predict values:

ODS output
predictedvalues=pred_val; 
proc glm data=Six_min_miss;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;

However, I am missing 21 predicted values because 21 of my observations are missing either of the two independent predictors.

If SAS can't make a prediction because of this missingness, it leaves an underscore (not a period) to show that it didn't make a prediction.

For some reason, if it can't make a prediction, SAS also puts an underscore for the 'observed' value--even if an observed value is present (the value in the highlighted cell under 'observed' should be 181.0512):

enter image description here

The following code merges the ODS output data set with the observed and predicted values, and the original data. The second data step attempts to create a new 'imputed' version of the variable that will use the original observation if it's not missing, but uses the predicted value if it is missing:

data PT_INFO_6MIN_IMP_temp;
merge PT_INFO pred_val;
drop dependent observation biased residual;
run;

data PT_INFO_6MIN_IMP_temp2;
set PT_INFO_6MIN_IMP_temp;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;

However, as you can see, SAS is putting an underscore in the imputed column, when there was an original value that should have been used:

enter image description here

In other words, because the original variable values is not missing (it's 181.0512) SAS should have taken that value and copied it to the imputed value column. Instead, it put an underscore.

I've also tried if SIX_MIN_WALK_z =. then observed=predicted

Please let me know what I'm doing wrong and/or how to fix. I hope this all makes sense.

Thanks

EDIT!!!!! EDIT!!!!! EDIT!!!!!

See below for a truncated data set so that one can reproduce what's in the pictures. I took only the first 30 rows of my data set. There are three missing observations for the dependent variable that I'm trying to impute (obs 8, 11, 26). There are one of each of the independent variables missing, such that it can't make a prediction (obs 8 & 24). You'll notice that the "_IMP" version of the dependent variable mirrors the original. When it gets to missing obs #8, it doesn't impute a value because it wasn't able to predict a value. When it gets to #11 and #26, it WAS able to predict a value, so it added the predicted value to "_IMP." HOWEVER, for obs #24, it was NOT able to predict a value, but I didn't need it to, because we already have an observed value in the original variable (181.0512). I expected SAS to put this value in the "_IMP" column, but instead, it put an underscore.

data test;
input Study_ID nyha_4_enroll kccq12sf_both_base SIX_MIN_WALK_z;
cards;
01-001 3 87.5 399.288
01-002 4 83.333333333 411.48
01-003 2 87.5 365.76
01-005 4 14.583333333 0
01-006 3 52.083333333 362.1024
01-008 3 52.083333333 160.3248
01-009 2 56.25 426.72
01-010 4 75 .
01-011 3 79.166666667 156.3624
01-012 3 27.083333333 0
01-013 4 45.833333333 0
01-014 4 54.166666667 .
01-015 2 68.75 317.2968
01-017 3 29.166666667 196.2912
01-019 4 100 141.732
01-020 4 33.333333333 0
01-021 2 83.333333333 222.504
01-022 4 20.833333333 389.8392
01-025 4 0 0
01-029 4 43.75 0
01-030 3 83.333333333 236.22
01-031 2 35.416666667 302.0568
01-032 4 64.583333333 0
01-033 4 33.333333333 0
01-034 . 100 181.0512
01-035 4 12.5 0
01-036 4 66.666666667 .
01-041 4 75 0
01-042 4 43.75 0
01-043 4 72.916666667 0
;
run;

data test2;
set test;
drop Study_ID;
run;

ODS output
predictedvalues=pred_val; 
proc glm data=test2;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;


data combine;
merge test2 pred_val;
drop dependent observation biased residual;
run;

data combine_imp;
set combine;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
1
If the value is ._ then testing if it is . will obviously fail.Tom
Why are you testing one variable (SIX_MIN_WALK_z) for missing and then changing a different variable (observed)?Tom
Sorry for any confusion. SIX_MIN_WALK_z is the variable with 65 missing observations. The ODS output for the model to predict that variable has the observed and predicted values. The SIX_MIN_WALK_z and the 'observed' values from the model to predict that variable should be identical. I'm using that 'observed' column as a starting point for more complete version of SIX_MIN_WALK_z. Every time this variables is missing, it's also missing in the 'observed' column. But every time it is missing, I want the observed column to have the predicted value, and then I change the column name.immaprogrammingnoob
You are modeling SIX_MIN_WALK_z using the value of nyha_4_enroll and kccq12sf_both_base. If either of those are missing then your model cannot predict a value.Tom
I understand that. The problem is that there are some instances where SAS should be taking the existing value for SIX_MIN_WALK_z and putting it in the imputed column but it's putting an underscore instead (see the last picture in the post). For whatever reason, if it cannot make a prediction, it is claiming that there isn't an observed value, when there actually is one (see first picture). When you compare the two pictures, you can see that observation #25 has an observed value, but SAS seems to be ignoring it and putting an underscoreimmaprogrammingnoob

1 Answers

0
votes

The special missing values (._) mark the observations excluded from the model because of missing values of the independent variables.

Try a simple example:

data class;
  set sashelp.class(obs=10) ;
  keep name sex age height;
  if _n_=3 then age=.;
  if _n_=4 then height=.;
run;

ods output predictedvalues=pred_val; 
proc glm data=class;
  class sex;
  model height = sex age /p solution;
run; quit;
proc print data=pred_val; run;

Since for observation #3 the value of the independent variable AGE was missing in the predicted result dataset the values of observed, predicted and residual are set to ._.

Obs    Dependent    Observation    Biased     Observed       Predicted      Residual

  1     Height           1           0       69.00000000    64.77538462     4.22461538
  2     Height           2           0       56.50000000    58.76153846    -2.26153846
  3     Height           3           1         _              _              _
  4     Height           4           1         .            61.27692308      .
  5     Height           5           0       63.50000000    64.77538462    -1.27538462
  6     Height           6           0       57.30000000    59.74461538    -2.44461538
  7     Height           7           0       59.80000000    56.24615385     3.55384615
  8     Height           8           0       62.50000000    63.79230769    -1.29230769
  9     Height           9           0       62.50000000    62.26000000     0.24000000
 10     Height          10           0       59.00000000    59.74461538    -0.74461538

If you really want to just replace the values of OBSERVED or PREDICTED in the output with the values of the original variable that is pretty easy to do. Just re-combine with the source dataset. You can use the ID statement of PROC GLM to have it include any variables you want into the output. Like

id name sex age height;

Now you can use a dataset step to make any adjustments. For example to make a new height variable that is either the original or predicted value you could use:

data want ;
  set pred_val ;
  NEW_HEIGHT = coalesce(height,predicted); 
run;

proc print data=want width=min;
  var name height age predicted new_height ;
run;

Results:

                                                    NEW_
Obs    Name       Height    Age     Predicted      HEIGHT

  1    Alfred      69.0      14    64.77538462    69.0000
  2    Alice       56.5      13    58.76153846    56.5000
  3    Barbara     65.3       .      _            65.3000
  4    Carol         .       14    61.27692308    61.2769
  5    Henry       63.5      14    64.77538462    63.5000
  6    James       57.3      12    59.74461538    57.3000
  7    Jane        59.8      12    56.24615385    59.8000
  8    Janet       62.5      15    63.79230769    62.5000
  9    Jeffrey     62.5      13    62.26000000    62.5000
 10    John        59.0      12    59.74461538    59.0000