Intro: I want to take a model fitted on one data set and apply it to another data set to find an RMSE.
Say I have a dataset "data100"
and run the following selection procedure to determine the significant variables:
PROC REG DATA =data100;
model y= x0-x999 / selection=forward SLENTRY=.01;
run;quit;
It reports that x0 x10 x20 x30 x40 x50 x60 x70 x80 x90 are significant at p < .0001. Now I want to apply this model to another data set, "data1000".
Why couldn't I then just use:
PROC REG DATA =data1000;
model y= x0 x10 x20 x30 x40 x50 x60 x70 x80 x90;
run;quit;
To determine the RMSE of the data1000 set?
The reason this came up is that a mentor told me to use:
proc reg data=data100 outest=data100est;
model y= x0-x999;
run;quit;
proc score data=data1000 score=data100est out=data1000p residual type=parms;
var y x0-x999;
run;
proc univariate data=data1000P;
var model1;
output out=data1000stat uss=ss1;
run;
data data1000stat;
set data1000stat;
rmse=sqrt(ss1/1000);
run;
proc print data=data1000stat;
run;quit;
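To check that I am reading the mentor's code correctly, here is what I believe it computes. This is only my interpretation: I am assuming that MODEL1 holds, for each row of data1000, the difference r_i between the observed y and the value predicted from the coefficients estimated on data100, and that the 1000 in the divisor is the number of observations in data1000.

```latex
\hat{y}_i = \hat{\beta}_0 + \sum_{j} \hat{\beta}_j \, x_{ij},
\qquad
r_i = y_i - \hat{y}_i,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{1000} \sum_{i=1}^{1000} r_i^{2}}
```

Here the \hat{\beta} values come from the fit on data100 and are not re-estimated, and the sum of r_i^2 is the uncorrected sum of squares (uss) that PROC UNIVARIATE writes to ss1. If I understand correctly, my own PROC REG on data1000 would instead re-estimate the coefficients on data1000 and report a Root MSE that divides by the error degrees of freedom, so the two numbers need not match.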
I'm very confused about this point. If anyone can clarify why the mentor's approach is preferable, or whether PROC SCORE is even appropriate here, that would be great.