4
votes

I am trying to output some results, including confidence intervals, from many linear models in a tidy tibble, using broom::tidy, but the output only seems to include the confidence interval from the first model.

The linear models have the same predictor but different responses.

Consider the following example:

library(tidyverse)
library(broom)

# Create toy dataframe.

df <- tibble(
  x = sample(100, replace = TRUE),
  y1 = runif(100),
  y2 = rnorm(100)
)


# Fit linear models, each with x as predictor and y1 and y2 respectively as responses.

my_models <- lm(
  cbind(y1, y2) ~ x,
  data = df
)


# Output results as a tidy tibble.

tidy(my_models, conf.int = TRUE)


# Check confidence intervals with other function.

confint(my_models)

The function tidy(my_models, conf.int = TRUE)returns the following:

> tidy(my_models, conf.int = TRUE)
# A tibble: 4 x 8
  response term          estimate std.error statistic       p.value  conf.low conf.high
  <chr>    <chr>            <dbl>     <dbl>     <dbl>         <dbl>     <dbl>     <dbl>
1 y1       (Intercept)  0.370      0.0572      6.47   0.00000000392  0.256      0.483  
2 y1       x            0.00176    0.000949    1.86   0.0663        -0.000121   0.00365
3 y2       (Intercept) -0.0252     0.215      -0.117  0.907          0.256      0.483  
4 y2       x            0.0000574  0.00357     0.0161 0.987         -0.000121   0.00365

Notice that the boundaries for the confidence intervals of both the intercept and x are the two models (or responses). I would expect them to differ.

Compare to the output of the function confint(my_models):

> confint(my_models)
                       2.5 %      97.5 %
y1:(Intercept)  0.2562157921 0.483051716
y1:x           -0.0001209424 0.003646348
y2:(Intercept) -0.4520961653 0.401713738
y2:x           -0.0070326154 0.007147456

Here, the boundaries differ, as expected. And this is the result I expected from tidy(my_models, conf.int = TRUE) as well. Since the boundaries for the model including y1 as response are the same in both functions, I assume tidy only outputs the confidence interval from the first model. So I'm wondering what I am doing wrong here?

1

1 Answers

2
votes

This is a reported issue for multi response linear model in older versions of broom:

library(broom)
packageVersion("broom")
[1] ‘0.5.4’

mod <- lm(cbind(mpg, disp) ~ wt, mtcars)
tidy(mod, conf.int = TRUE)
# A tibble: 4 x 8
  response term        estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 mpg      (Intercept)    37.3      1.88      19.9  8.24e-19    33.5      41.1 
2 mpg      wt             -5.34     0.559     -9.56 1.29e-10    -6.49     -4.20
3 disp     (Intercept)  -131.      35.7       -3.67 9.33e- 4    33.5      41.1 
4 disp     wt            112.      10.6       10.6  1.22e-11    -6.49     -4.20

Once you upgrade to the latest version it is ok:

library(broom)
packageVersion("broom")
[1] ‘0.7.0’

mod <- lm(cbind(mpg, disp) ~ wt, mtcars)
tidy(mod, conf.int = TRUE)
# A tibble: 4 x 8
  response term        estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 mpg      (Intercept)    37.3      1.88      19.9  8.24e-19    33.5      41.1
2 mpg      wt             -5.34     0.559     -9.56 1.29e-10    -6.49     -4.20
3 disp     (Intercept)  -131.      35.7       -3.67 9.33e- 4  -204.      -58.2
4 disp     wt            112.      10.6       10.6  1.22e-11    90.8     134.