I would like to run many regression models automatically and save the fitted values and residuals from each one to the original data frame.

By that I mean fitting every possible regression model over the data, with each variable in turn as the response and the remaining variables as predictors.

For example, X1 ~ X2 + X3, X2 ~ X1 + X3, and X3 ~ X1 + X2.

Then I want to add the fitted values and residuals of each model to the data.

I have data like this:

test<-data.frame(X1=rnorm(50,mean=50,sd=10),
                 X2=rnorm(50,mean=5,sd=1.5),
                 X3=rnorm(50,mean=200,sd=25))
test$X1[10]<-5
test$X2[10]<-5
test$X3[10]<-530

I fit all of the regression models:

varlist <- names(test)

models <- lapply(varlist, function(x) {
    # Build the formula x ~ . so each variable is regressed on all the others
    lm(substitute(i ~ ., list(i = as.name(x))), data = test)
})

I can get the fitted values and residuals from each regression model:

lapply(models,residuals)
lapply(models, fitted)

However, I would like to save all residuals and fitted values to the original data. Is it possible to make the final data look like this?

X1 X2 X3 Residual1 Residual2 Residual3 Fitted1 Fitted2 Fitted3

Here Residual1 comes from model 1, Residual2 from model 2, and so on.

2 Answers

1
votes

Assuming models is built as in your example (with the lm() call closed properly and given data = test), how about saving lapply(models, residuals) and lapply(models, fitted) as variables and attaching them to the original dataset as new columns? Then loop over the models, adding one residual column and one fitted column per model:

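models_residuals <- lapply(models, residuals)
models_fitted <- lapply(models, fitted)
# lapply() returns lists, so index with [[i]] and assign each vector to test
for (i in seq_along(models)) {
    test[[paste0("Residual", i)]] <- models_residuals[[i]]
    test[[paste0("Fitted", i)]] <- models_fitted[[i]]
}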
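
The same columns can also be attached without a loop. This is only a minimal sketch, assuming models is a list of fitted lm objects like the one in the question; it is rebuilt here so the snippet runs on its own, the seed is arbitrary, and as.formula(paste(...)) is swapped in for the substitute() call just to keep it short. sapply() collapses each list of equal-length vectors into a matrix, so the residual columns can be placed before the fitted columns, as in the requested layout.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))

# One model per column: that column as response, the others as predictors
models <- lapply(names(test), function(x) {
    lm(as.formula(paste(x, "~ .")), data = test)
})

# sapply() simplifies the per-model vectors into matrices, one column per model
res_mat <- sapply(models, residuals)
fit_mat <- sapply(models, fitted)
colnames(res_mat) <- paste0("Residual", seq_along(models))
colnames(fit_mat) <- paste0("Fitted", seq_along(models))

# Residual columns first, then fitted columns
final <- cbind(test, res_mat, fit_mat)
names(final)
```

A quick sanity check is that final$Fitted1 + final$Residual1 reproduces final$X1 up to floating-point error.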

Let me know if my idea of what you want is correct!

1
votes

I'm sure more compact code is possible, but you can try something like this:

set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))
test$X1[10] <- 5
test$X2[10] <- 5
test$X3[10] <- 530


fitted_list <- lapply(names(test), function(x)
                      fitted(lm(as.formula(paste(x, ".", sep = "~")),
                                                data = test)))

resid_list <- lapply(names(test), function(x)
                     resid(lm(as.formula(paste(x, ".", sep = "~")),
                              data = test)))


res <- do.call(cbind, c(fitted_list, resid_list))
res <- cbind(test, res)
names(res) <- paste0(rep(c("X", "Fitted", "Resid"), each = 3), rep(1:3, 3))
str(res)
## 'data.frame':    50 obs. of  9 variables:
##  $ X1     : num  43.7 51.8 41.6 66 53.3 ...
##  $ X2     : num  5.6 4.08 5.51 3.31 7.15 ...
##  $ X3     : num  184 201 177 204 184 ...
##  $ Fitted1: num  52 50.5 52.8 50.3 51.8 ...
##  $ Fitted2: num  5.23 5.17 5.25 5.09 5.18 ...
##  $ Fitted3: num  219 198 225 161 192 ...
##  $ Resid1 : num  -8.28 1.35 -11.2 15.64 1.49 ...
##  $ Resid2 : num  0.367 -1.09 0.264 -1.788 1.97 ...
##  $ Resid3 : num  -34.47 2.75 -47.44 43.11 -8.33 ...
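
On the point about more compact code: one possible tightening is to fit each model only once and to derive the column names from the number of variables instead of hard-coding 3. The sketch below wraps that in a helper called add_fits(), which is a hypothetical name and not part of any package. It assumes every column is numeric, has a syntactically valid name, and contains no missing values (with NAs the fitted vectors could be shorter than the data).

```r
# add_fits() is a hypothetical helper, shown only as an illustration
add_fits <- function(df) {
    # Fit each model once: one column as response, all others as predictors
    models <- lapply(names(df), function(x) {
        lm(as.formula(paste(x, "~ .")), data = df)
    })
    # One matrix column per model
    fit <- sapply(models, fitted)
    res <- sapply(models, resid)
    colnames(fit) <- paste0("Fitted", seq_along(models))
    colnames(res) <- paste0("Resid", seq_along(models))
    # Original columns, then fitted values, then residuals
    cbind(df, fit, res)
}

set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))
out <- add_fits(test)
names(out)
```

Because the names are built with seq_along(models), the same function should work unchanged for a data frame with more than three variables.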