
I have a problem that I have been trying to solve for a couple of hours now, but I simply can't figure it out (I'm new to R, by the way).

Basically, what I'm trying to do (using mtcars to illustrate) is to make R test different independent variables against the same dependent variable ("mpg"), while adjusting for "cyl" and "disp". The best solution I have been able to come up with is:

models <- lapply(mtcars[, 4:6], function(x) lm(mpg ~ cyl + disp + x, data = mtcars))
summaries <- lapply(models, summary)

... where 4:6 corresponds to columns "hp", "drat" and "wt".

This actually works OK, but the problem is that the summary shows "x" instead of, for instance, "hp":

$hp

Call:
lm(formula = mpg ~ cyl + disp + x, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0889 -2.0845 -0.7745  1.3972  6.9183 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
cyl         -1.22742    0.79728  -1.540   0.1349    
disp        -0.01884    0.01040  -1.811   0.0809 .  
x           -0.01468    0.01465  -1.002   0.3250    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared:  0.7679,    Adjusted R-squared:  0.743 
F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09

Questions:

Is there a way to fix this? And have I done this in the smartest way using lapply, or would it be better to use, for instance, for loops or other options?

Ideally, I would also very much like to make a table showing, for instance, only the estimate and p-value for each independent variable. Can this somehow be done?

Best regards

2 Answers

One way to get the variable's name displayed in the summary is to loop over the names of the variables and build the formula with paste0 and as.formula:

lm <- lapply(names(mtcars)[4:6], function(x) { 
  formula <- as.formula(paste0("mpg ~ cyl + disp + ", x))
  lm(formula, data = mtcars)
})
summary <- lapply(lm, summary)
summary
#> [[1]]
#> 
#> Call:
#> lm(formula = formula, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.0889 -2.0845 -0.7745  1.3972  6.9183 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
#> cyl         -1.22742    0.79728  -1.540   0.1349    
#> disp        -0.01884    0.01040  -1.811   0.0809 .  
#> hp          -0.01468    0.01465  -1.002   0.3250    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.055 on 28 degrees of freedom
#> Multiple R-squared:  0.7679, Adjusted R-squared:  0.743 
#> F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09
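One small addition, assuming you also want to keep the $hp, $drat and $wt labels that your original version had (the code above returns an unnamed list, so results print as [[1]], [[2]], ...): lapply() keeps the names of a named input vector, so naming the predictor vector first labels each model. This is a sketch built on the same paste0/as.formula approach; the names predictors, models and summaries are just illustrative.

```r
# Column names of the predictors to test; cyl and disp are always included
predictors <- names(mtcars)[4:6]   # "hp", "drat", "wt"

# lapply() keeps the names of a named input vector, so naming the
# predictors first labels each fitted model in the resulting list
models <- lapply(setNames(predictors, predictors), function(x) {
  fmla <- as.formula(paste0("mpg ~ cyl + disp + ", x))
  lm(fmla, data = mtcars)
})

summaries <- lapply(models, summary)
names(summaries)   # returns c("hp", "drat", "wt")
summaries$hp
```

Avoiding the names lm and summary for the result lists also keeps them from shadowing the functions of the same name, which makes the code easier to read.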

Concerning the second part of your question: one way to achieve this is to use tidy() from the broom package, which returns the regression results as a tidy data frame:

lapply(lm, broom::tidy)
#> [[1]]
#> # A tibble: 4 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)  34.2       2.59       13.2  1.54e-13
#> 2 cyl          -1.23      0.797      -1.54 1.35e- 1
#> 3 disp         -0.0188    0.0104     -1.81 8.09e- 2
#> 4 hp           -0.0147    0.0147     -1.00 3.25e- 1
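If you'd rather have a single table with only the estimate and p-value of the tested variable, and avoid the broom dependency, a base-R sketch could pull the matching row out of each coefficient matrix and stack the rows with rbind. The names predictors, models and results are illustrative, not from broom or the answer above.

```r
# Predictors to test one at a time, always adjusting for cyl and disp
predictors <- names(mtcars)[4:6]   # "hp", "drat", "wt"

models <- lapply(setNames(predictors, predictors), function(x) {
  lm(reformulate(c("cyl", "disp", x), response = "mpg"), data = mtcars)
})

# coef(summary(fit)) is a matrix with columns "Estimate", "Std. Error",
# "t value" and "Pr(>|t|)"; keep only the row of the tested predictor
results <- do.call(rbind, lapply(predictors, function(x) {
  ct <- coef(summary(models[[x]]))
  data.frame(term = x,
             estimate = ct[x, "Estimate"],
             p.value = ct[x, "Pr(>|t|)"])
}))
results
```

With broom, a similar table could be obtained by binding the tidy() outputs together and keeping only the rows whose term is the tested predictor.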

We could use reformulate to create the formula for lm:

lst1 <- lapply(names(mtcars)[4:6], function(x) {
  fmla <- reformulate(c("cyl", "disp", x), response = "mpg")
  model <- lm(fmla, data = mtcars)
  model$call <- deparse(fmla)
  model
})

Then get the summaries:

summary1 <- lapply(lst1, summary)
summary1[[1]]

#Call:
#"mpg ~ cyl + disp + hp"

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-4.0889 -2.0845 -0.7745  1.3972  6.9183 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
#cyl         -1.22742    0.79728  -1.540   0.1349    
#disp        -0.01884    0.01040  -1.811   0.0809 .  
#hp          -0.01468    0.01465  -1.002   0.3250    
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 3.055 on 28 degrees of freedom
#Multiple R-squared:  0.7679,   Adjusted R-squared:  0.743 
#F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09
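Because deparse() stores a character string in model$call, the Call line prints in quotes, and the fitted object no longer holds a real call, which functions such as update() expect to find. A possible variant, sketched here as an assumption rather than a tested drop-in replacement, is to swap only the formula argument inside the call that lm() already recorded, so the call stays a language object.

```r
models <- lapply(names(mtcars)[4:6], function(x) {
  fmla <- reformulate(c("cyl", "disp", x), response = "mpg")
  model <- lm(fmla, data = mtcars)
  # lm() records the call as lm(formula = fmla, data = mtcars); replace
  # only its formula argument so the stored call remains a call object
  # and prints with the actual formula and without quotes
  model$call$formula <- fmla
  model
})

summary(models[[1]])
```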