I'm trying to run a very simple linear regression on a few variables in my dataset, and I'm finding that R and SAS output very different values for the model fit. I am attempting to regress

spending ~ tenure (in months)

In SAS, my code looks like

proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdi=stdi_metric;
title 'SAS model';
run; quit;

In R, I am using the following code:

modelobject <- lm(spending ~ tenure, data = df)
predictions <- predict(modelobject, interval = "prediction", se.fit = TRUE, level = 1 - alpha) 

However, the residuals in R (and therefore the fitted slope and intercept) are very different from those in SAS. I am not including them here since the data is confidential, but suffice it to say they don't match. They DO match, though, when I change my SAS code to

proc reg data=model_data;
model spending = tenure;
output out=&outfile r=resid stdp=stdp_metric; /* <-- this is the only change! */
title 'SAS model';
run; quit;

With this change I get the same residuals and coefficients as in R. Why is this the case? From my understanding, stdp and stdi are the standard errors associated with confidence and prediction intervals, respectively (see these lecture notes). However, switching between a confidence and a prediction interval shouldn't change the model's fit, even in theory (this is especially clear in R, since you pass the same modelobject to predict() either way).

So why do the SAS residuals change when stdi is switched to stdp? Moreover, this question comes up in the broader context of a project where I am converting old SAS macros to R. How can I replicate in R the model fit that SAS' PROC REG produces with stdi?

I have also consulted the SAS manuals for the definitions of these metrics and of PROC REG, and cannot find anything explaining why the model fit would change when stdi is changed to stdp.

2 Answers

1
votes

STDI is the standard error of the individual predicted value, whereas STDP is the standard error of the mean predicted value.

Note that se.fit only controls what predict() returns; it does not change the model fit. With se.fit = TRUE, as in your R code, predict() also returns the standard error of the predicted means, which is the equivalent of the STDP option in SAS. The equivalent of STDI is not returned directly, but you can compute it as sqrt(se.fit^2 + residual.scale^2) from the se.fit and residual.scale components of the returned list. Hope this helps!
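
Here is a minimal sketch of how the two SAS quantities map onto predict() output in R. It uses the built-in cars dataset (dist ~ speed) purely as a stand-in for your confidential data, so the variable names are an assumption for illustration, not your model. It also shows that the residuals come from lm() alone and are the same regardless of what you ask predict() to return.

```r
# Stand-in data: the built-in `cars` dataset replaces the confidential data.
fit <- lm(dist ~ speed, data = cars)

# Residuals (and coefficients) are fixed once lm() has run;
# predict() never refits the model.
res <- residuals(fit)

# se.fit = TRUE adds the standard error of the mean prediction (se.fit),
# the residual standard deviation (residual.scale) and the residual
# degrees of freedom (df) to the returned list.
pred <- predict(fit, newdata = cars, interval = "prediction",
                se.fit = TRUE, level = 0.95)

stdp <- pred$se.fit                                   # counterpart of SAS STDP
stdi <- sqrt(pred$se.fit^2 + pred$residual.scale^2)   # counterpart of SAS STDI

# Sanity check: the prediction-interval half-width should equal t * stdi.
tcrit <- qt(0.975, df = pred$df)
half_width <- unname(pred$fit[, "upr"] - pred$fit[, "fit"])
all.equal(half_width, tcrit * unname(stdi))
```

If the residuals still differ between SAS and R after this, the difference is most likely coming from the data or the model statement rather than from the STDI/STDP option.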


Don't forget to let us know if it solved your problem :)

0
votes

Figured out what the issue was for me. The latest results are further down the SAS regression output window, so you actually have to scroll down to see them. A good rule of thumb I learned for SAS: always check whether there is additional output and whether you are looking at the latest results. This, combined with a syntax error in my macro parameters that led me to fit two y targets at the same time, was causing my error.