I'm trying to run a very simple linear regression on a few variables in my dataset, and I'm finding that R and SAS output very different values for their model fits. I am attempting to regress
spending ~ tenure (in months)
In SAS, my code looks like
proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdi=stdi_metric;
title 'SAS model';
run; quit;
In R, I am using the following code:
modelobject <- lm(spending ~ tenure, data = df)
predictions <- predict(modelobject, interval = "prediction", se.fit = TRUE, level = 1 - alpha)
However, what I see is that the residuals in R (and therefore the fitted slope and intercept terms) are very different from those in SAS. I am not including them here since the data are confidential, but suffice it to say they don't match. They DO match, though, when I change my SAS code to
proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdp=stdp_metric; /* <-- this is the only change! */
title 'SAS model';
run; quit;
I get the same residuals and coefficients here. Why is this the case? From my understanding, stdp and stdi are the standard errors associated with confidence and prediction intervals (see these lecture notes). However, switching between a confidence and prediction interval shouldn't theoretically change your model's fit (this is especially true in R since you're passing in the same modelobject into your predict() function).
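To make that reasoning concrete, here is a minimal R sketch using simulated stand-in data (the `df` values, the seed, and `alpha = 0.05` are made up for illustration, since the real data are confidential). It assumes that SAS's stdp corresponds to the standard error of the fitted mean and stdi to the standard error of an individual prediction, which is my reading of the definitions rather than something I have confirmed against SAS output.

```r
# Simulated stand-in data; these values are invented for illustration only.
set.seed(1)
df <- data.frame(tenure = 1:50)
df$spending <- 20 + 3 * df$tenure + rnorm(50, sd = 10)

alpha <- 0.05  # assumed significance level
modelobject <- lm(spending ~ tenure, data = df)

# Same fitted model, two interval types.
conf <- predict(modelobject, interval = "confidence",
                se.fit = TRUE, level = 1 - alpha)
# predict() warns that prediction intervals on the fitting data refer to
# future responses; that warning is silenced here.
pred <- suppressWarnings(
  predict(modelobject, interval = "prediction",
          se.fit = TRUE, level = 1 - alpha)
)

# Assumed analogues of the SAS output keywords:
stdp  <- conf$se.fit                                  # SE of the fitted mean
stdi  <- sqrt(conf$se.fit^2 + conf$residual.scale^2)  # SE of an individual prediction
resid <- residuals(modelobject)                       # observed minus fitted

# Fitted values (and hence residuals) do not depend on the interval type.
same_fit <- isTRUE(all.equal(conf$fit[, "fit"], pred$fit[, "fit"]))
```

If that mapping is right, the residuals and coefficients come from the single lm() fit in both cases, and only the interval width differs between stdp and stdi.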
So why do the SAS residuals change when the stdi metric is switched to stdp? Moreover, this question is being asked in the broader context of a project where I am attempting to convert old SAS macros into R. How can I replicate in R the model fit that SAS's PROC REG produces when using stdi?
I have also consulted the SAS manuals for the definitions of these metrics and of PROC REG, and cannot find anything explaining why the model fit would change when stdi is changed to stdp.