I'm trying to take all pairs of variables in the mtcars data set and make a linear model using the lm function. But my approach is causing me to lose the formulas when I go to summarize or plot the models. Here is the code that I am using.

library(tidyverse)
my_vars <- names(mtcars)
pairs <- t(combn(my_vars, 2)) # Get all possible pairs of variables

# Create formulas for the lm model
fmls <- 
  as_tibble(pairs) %>%
  mutate(fml = paste(V1, V2, sep = "~")) %>%
  select(fml) %>%
  .[[1]] %>%
  sapply(as.formula)

# Create a linear model for each pair of variables
mods <- lapply(fmls, function(v) lm(data = mtcars, formula = v))

# Print the summary of every model
for (i in 1:length(mods)) {
  print(summary(mods[[i]]))
}

(I snagged the idea of using strings to make formulas from the question "Pass a vector of variables into lm() formula".) Here is the output of the summary for the first model, summary(mods[[1]]):

Call:
lm(formula = v, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

I'm searching for a (perhaps metaprogramming) technique so that the call line looks something like lm(formula = var1 ~ var2, data = mtcars) as opposed to formula = v.

1 Answer

I made pairs into a data frame, to make life easier:

library(tidyverse)
my_vars <- names(mtcars) 
# Get all possible pairs of variables, as a data frame
pairs <- t(combn(my_vars, 2)) %>% 
  as.data.frame()

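You can do this with eval(parse(text = ...)): build the whole lm() call as a string, parse it into an expression, and evaluate it. Because the formula is written out literally in that call, it is what appears in the Call: line of the summary.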
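As a minimal sketch of why this works (the names v, fit_var and fit_txt are only for illustration): lm() stores its call unevaluated, so a formula passed through a variable is recorded under the variable's name, while a call built from text carries the formula itself.

```r
# Passing the formula through a variable: the stored call keeps the name `v`
v <- as.formula("mpg ~ cyl")
fit_var <- lm(formula = v, data = mtcars)
deparse(fit_var$call)
# "lm(formula = v, data = mtcars)"

# Building the call from text: the stored call contains the literal formula
fit_txt <- eval(parse(text = "lm(mpg ~ cyl, data = mtcars)"))
deparse(fit_txt$call)
# "lm(formula = mpg ~ cyl, data = mtcars)"
```

The same idea, applied to every row of pairs: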

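listOfRegs <- apply(pairs, 1, function(pair) {
  # apply() passes each row as a character vector: c(response, predictor)
  V1 <- as.character(pair[[1]])
  V2 <- as.character(pair[[2]])
  # Build the lm() call as text, then parse and evaluate it, so the stored
  # call contains the literal formula (e.g. mpg ~ cyl) rather than a variable
  fit <- eval(parse(text = paste0("lm(", V1, " ~ ", V2, ", data = mtcars)")))
  return(fit)
})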

lapply(listOfRegs, summary)

Then:

> lapply(listOfRegs, summary)
[[1]]

Call:
lm(formula = mpg ~ cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

 ... etc
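
If you would rather not paste code together as text, one possible alternative is to splice a formula object into the call with bquote(). This is only a sketch: fit_pair and var_pairs are hypothetical names, not part of the code above, and the data set is hard-coded to mtcars. It also names each model after its formula, which may help when summarizing or plotting.

```r
# Hypothetical helper: fit y ~ x on mtcars and keep the literal formula
# in the stored call.
fit_pair <- function(y, x) {
  fml <- reformulate(x, response = y)      # builds the formula y ~ x
  eval(bquote(lm(.(fml), data = mtcars)))  # splice the formula into the call
}

# All pairs of variables, one pair per row
var_pairs <- t(combn(names(mtcars), 2))
mods <- apply(var_pairs, 1, function(p) fit_pair(p[[1]], p[[2]]))

# Label each model with its own formula, e.g. "mpg ~ cyl"
names(mods) <- sapply(mods, function(m) deparse(formula(m)))

# The call of a single model can then be looked up by formula
mods[["mpg ~ cyl"]]$call
```

Here .(fml) is replaced by the formula itself before lm() is evaluated, so the Call: line should show mpg ~ cyl rather than fml.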