How to iterate over columns with vectorization together with group_by function from dplyr

Question

As explained in "Fitting several regression models with dplyr", we can use the tidy function from the broom package to run a regression within each group. Demo code for the iris dataset is listed below. But what if we also want to loop over multiple columns and run the regression with different dependent variables (Sepal.Length, Sepal.Width, Petal.Length) together with this group_by step? How can I integrate an (s)apply call into this situation and get the results for all 3 * 3 = 9 regression models?

library(dplyr)
library(broom)

# Fit one model per species, then tidy the fitted models
res1 <- iris %>%
  group_by(Species) %>%
  do(res = lm(Sepal.Length ~ Petal.Width, data = .))

tidy(res1, res) %>%
  filter(term != "(Intercept)")

Andrew Andrew · Accepted Answer · 2019-07-26T12:22:26

You can do this using lme4::lmList and broom.mixed::tidy. You may be able to adapt it to a pipe, but this should get you started. Here, lmList plays the same role as group_by in the dplyr pipe: it fits a separate lm for each level of the factor after the | in the formula. I find it easier to loop over several dependent variables (DVs) with lapply this way. Good luck!

library(lme4)
library(broom.mixed)

# Selecting DVs
dvs <- names(iris)[1:3]

# Making formula objects
formula_text <- paste0(dvs, "~ Petal.Width | Species")
formulas <- lapply(formula_text, formula)

# Running grouped analyses and looping through DVs
results <- lapply(formulas, function(x) {
  res <- broom.mixed::tidy(lmList(x, iris))
  res[res$terms != "(Intercept)",]
})

# Renaming and viewing results
names(results) <- formula_text

And, viewing the results:

results
$`Sepal.Length~ Petal.Width | Species`
# A tibble: 3 x 6
  group      terms       estimate   p.value std.error statistic
  <chr>      <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 setosa     Petal.Width    0.930 0.154         0.649      1.43
2 versicolor Petal.Width    1.43  0.0000629     0.346      4.12
3 virginica  Petal.Width    0.651 0.00993       0.249      2.61

$`Sepal.Width~ Petal.Width | Species`
# A tibble: 3 x 6
  group      terms       estimate    p.value std.error statistic
  <chr>      <chr>          <dbl>      <dbl>     <dbl>     <dbl>
1 setosa     Petal.Width    0.837 0.0415         0.407      2.06
2 versicolor Petal.Width    1.05  0.00000306     0.217      4.86
3 virginica  Petal.Width    0.631 0.0000855      0.156      4.04

$`Petal.Length~ Petal.Width | Species`
# A tibble: 3 x 6
  group      terms       estimate  p.value std.error statistic
  <chr>      <chr>          <dbl>    <dbl>     <dbl>     <dbl>
1 setosa     Petal.Width    0.546 2.67e- 1     0.490      1.12
2 versicolor Petal.Width    1.87  3.84e-11     0.261      7.16
3 virginica  Petal.Width    0.647 7.55e- 4     0.188      3.44
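
If you would rather stay inside a dplyr pipe, one possible adaptation is sketched below. It is not part of the tested answer above. It reshapes the three dependent variables into long format and groups by both Species and the variable name, so each of the 9 Species/DV combinations gets its own lm. The column names dv and value and the object name res_long are illustrative choices, and the sketch assumes dplyr >= 0.8.1 (for group_modify) and tidyr >= 1.0.0 (for pivot_longer).

```r
library(dplyr)
library(tidyr)
library(broom)

res_long <- iris %>%
  # Stack the three DVs into one column ("value"), labelled by "dv"
  pivot_longer(
    cols = c(Sepal.Length, Sepal.Width, Petal.Length),
    names_to = "dv",
    values_to = "value"
  ) %>%
  # One group per Species x DV combination (3 * 3 = 9 groups)
  group_by(Species, dv) %>%
  # Fit a model in each group and return its tidy coefficient table
  group_modify(~ tidy(lm(value ~ Petal.Width, data = .x))) %>%
  ungroup() %>%
  # Keep only the slope rows
  filter(term != "(Intercept)")

res_long
```

This should return one row per model, with Species and dv identifying which regression each Petal.Width slope belongs to.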
