1
votes

I'm trying to run simple linear regressions on a grouped data frame and create a summarised data frame containing the intercept, gradient and R^2 value for each regression. I know I can collect the intercept and gradient for a single regression using lm(formula = var1 ~ var2, data = df)$coefficients[["(Intercept)"]] and lm(formula = var1 ~ var2, data = df)$coefficients[["y"]] respectively, however when I try to combine this with summarise I get the following error:

Error in lm(formula = var1 ~ var2)$coefficients[["y"]] : subscript out of bounds

(The R^2 value calculates just fine). Here is a small reproducible example:

library(dplyr)

## Create dummy data frame
df <- tibble(treatment = factor(c(rep("A", 5), rep("B", 5))),
             var1 = c(1, 4, 5, 7, 2, 8, 9, 1, 4, 7),
             var2 = c(2, 8, 11, 13, 4, 10, 11, 2, 6, 10)) %>%
      group_by(treatment)


reg <- df %>% 
              ## Intercept of linear model
  summarise(intercept = lm(formula = var1 ~ var2)$coefficients[["(Intercept)"]],
              ## Gradient of linear model
            gradient = lm(formula = var1 ~ var2)$coefficients[["y"]],
              ## R^2 value of linear model
            r2 = cor(x = var1, y = var2, use = "complete.obs"))

How do I need to change my code to extract these values successfully for each linear model? Do I need to try a different approach entirely than using summarise?

2
Your model is not does not have a "y" term, it has a "var2" term instead. Substitute in coefficients[["var2"] and it should work. - Dave2e
Thank you @Dave2e - I can't believe I spent such a long time puzzling over this when it was such a simple mistake! - Lucy Wheeler
you can also check out broom to keep your models and get the relevant summary statistics cran.r-project.org/web/packages/broom/vignettes/broom.html - StupidWolf

2 Answers

2
votes

There is some queston regarding exactly what you want but we will assume that since the question refers to "grouped" that you want to run a separate regression for each treatment and that the regression being referred to is var1 vs. var2. If that is not what you want please clarify the question.

  1. within group_by dot does not refer to just the group but rather the entire data frame unless we refer to it within do(...)

  2. always use the coef(...) function rather than $coefficients. Often they give the same result but not always so it is better practice to use coef.

  3. The correlation needs to be squared to get R^2.

1) do Here is the pipeline:

df %>%
  group_by(treatment) %>%
  do({ co <- coef(lm(var1 ~ var2, .))
       summarize(., intercept = co[1], 
                    grad = co[2], 
                    r2 = cor(var1, var2, use = "complete.obs")^2)
  }) %>%
  ungroup

2) do/summary We could alternately use summary(...)$r.squared to get R^2.

df %>%
  group_by(treatment) %>%
  do({ fm <- lm(var1 ~ var2, .)
       co <- coef(fm)
       summarize(., intercept = co[1], 
                    grad = co[2], 
                    r2 = summary(fm)$r.squared)
  }) %>%
  ungroup

3) mulitiple lm or we can just run the lm multiple times:

df %>%
  group_by(treatment) %>%
  summarize(intercept = coef(lm(var1 ~ var2))[1],
            grad = coef(lm(var1 ~ var2))[2],
            r2 = summary(lm(var1 ~ var2))$r.squared) %>%
  ungroup

4) nlme Also note that nlme package (which is a "recommended" package which means it is pre-installed when you install R -- you don't have to install nlme itself but only need to load it using library) supports lmList class:

library(nlme)

fm <- lmList(var1 ~ var2 | treatment, df)
cbind(coef(fm), r.squared = summary(fm)$r.squared)
1
votes

You cannot access the coefficients the way you did by name. Instead try this:

reg <- df %>% 
              ## Intercept of linear model
  summarise(intercept = lm(formula = var1 ~ var2)$coefficients[[1]],
              ## Gradient of linear model
            gradient = lm(formula = var1 ~ var2)$coefficients[[2]],
              ## R^2 value of linear model
            r2 = cor(x = var1, y = var2, use = "complete.obs"))

Update:

It seems you simply misnamed the gradient coefficient. So while my above solution works, you could also simply substitute var2 for y in your code and it works:

reg <- df %>% 
              ## Intercept of linear model
  summarise(intercept = lm(formula = var1 ~ var2)$coefficients[["(Intercept)"]],
              ## Gradient of linear model
            gradient = lm(formula = var1 ~ var2)$coefficients[["var2"]],
              ## R^2 value of linear model
            r2 = cor(x = var1, y = var2, use = "complete.obs"))