I have what may seem like a strange question (but details of why I'm asking will become clear soon).

Consider fitting a linear model in R, as follows:

lm_fit <- lm(mpg ~ cyl + disp, data = mtcars)

Now suppose we produce a tidy tibble (named out_summ) of the summary of our fit using the amazing broom package as follows:

out_summ <- broom::tidy(lm_fit)
out_summ
#> # A tibble: 3 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)  34.7       2.55       13.6  4.02e-14
#> 2 cyl          -1.59      0.712      -2.23 3.37e- 2
#> 3 disp         -0.0206    0.0103     -2.01 5.42e- 2

Created on 2021-02-02 by the reprex package (v0.3.0)

Now consider just printing the summary of the lm_fit object to the console as follows:

summary(lm_fit)
#> 
#> Call:
#> lm(formula = mpg ~ cyl + disp, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.4213 -2.1722 -0.6362  1.1899  7.0516 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 34.66099    2.54700  13.609 4.02e-14 ***
#> cyl         -1.58728    0.71184  -2.230   0.0337 *  
#> disp        -0.02058    0.01026  -2.007   0.0542 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.055 on 29 degrees of freedom
#> Multiple R-squared:  0.7596, Adjusted R-squared:  0.743 
#> F-statistic: 45.81 on 2 and 29 DF,  p-value: 1.058e-09

Now, my question is: given the lm_fit object and the tidy out_summ tibble as inputs, is it possible to write a function that easily reproduces the printed summary(lm_fit) table shown above?

I looked at getAnywhere(print.summary.lm) for the lm summary method, and it is quite complicated, in the sense that it sources individual columns and manually adjusts spacing using cat() to pretty-print. Given the tidy tibble, can this be easily reverse-engineered?

Motivation: I'm developing a statistical modeling object similar to lm, but we are using tidy tibble formats from the start. We would still like to add a summary method to our object. It would take our tidy tibble and print it in the style of the summary(lm_fit) output above. That is, we don't want to print the tibble as-is; we first want to make it look like the classical summary table.

Any help on achieving this, with hopefully minimal code (given the tidy input), would be appreciated.

The source code for summary.lm (line 261: github.com/SurajGupta/r-source/blob/master/src/library/stats/R/…) shows you can generate all required output for your summary table from just the lm.fit object (i.e. you don't need the tibble to create your output). Is there a reason you can't adapt this code to suit your 'lm-like' object? - jared_mamrot
@jared_mamrot - thanks. As mentioned in my question above, what you have linked to is the same code as the getAnywhere(print.summary.lm) output I referred to. I'm using lm as an analogy here. In the new statistical modeling object that I've created (like lm, but different), all of my output is already in tidy tibble format. I now also want a summary-style table for my new object, but I already have a tibble. So the situation is like reverse-engineering the lm summary from the tidy tibble output of an lm object. I would then apply this code to my setting. Does that clarify? - user4687531
So my question is: how can we easily go from a tibble to what the print.summary.lm source code you pointed to produces with its manual, complicated, nested cat() statements? As you can see, that script is very manually constructed. I'm asking whether there is an elegant way to get this summary table directly from the tibble output in a much cleaner way (with correct column spacing etc.). If it helps, as a thought experiment, just assume lm was written with broom::tidy summary attributes. If someone asked you to produce the summary output for lm from this tidy output, how would you do it? - user4687531
Ahh - that makes more sense - I'll look into it - jared_mamrot
Thanks for your interest, @jared_mamrot. I should clarify - I'm most interested in pretty-printing the coefficients part of the summary output (which is what broom::tidy gives you), but the rest of the table would be nice to see constructed as well :) - user4687531

1 Answer

Not sure how to fill in the rest of the summary table without using the lm_fit object (or equivalent), but maybe these 'first steps' will help.

library(tidyverse)

lm_fit <- lm(mpg ~ cyl+disp, data = mtcars)

summarise_lm_like_object <- function(lm_fit){
  out_summ <- broom::tidy(lm_fit) %>%
    # Same cut points as the "Signif. codes" legend; p > 0.1 gets a blank
    mutate(sig = case_when(p.value <= 0.001 ~ "***",
                           p.value <= 0.01  ~ "**",
                           p.value <= 0.05  ~ "*",
                           p.value <= 0.1   ~ ".",
                           TRUE             ~ " ")) %>% 
    rename("Estimate" = estimate,
           "Std. Error" = std.error,
           "t value" = statistic,
           "Pr(>|t|)" = p.value,
           "Significance" = sig)
  
  # Print as a plain data frame so the tibble header and column types are dropped
  print.data.frame(out_summ, row.names = FALSE)
  cat("---\n")
  cat("Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1\n")
}

summarise_lm_like_object(lm_fit)
#>         term    Estimate Std. Error   t value     Pr(>|t|) Significance
#>  (Intercept) 34.66099474 2.54700388 13.608536 4.022869e-14          ***
#>          cyl -1.58727681 0.71184427 -2.229809 3.366495e-02            *
#>         disp -0.02058363 0.01025748 -2.006696 5.418572e-02            .
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Created on 2021-02-03 by the reprex package (v1.0.0)
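
A possible next step, offered as a sketch rather than a definitive solution: the coefficient block that print.summary.lm produces comes from stats::printCoefmat(), which only needs a numeric matrix with term names as row names. So the tidy tibble can be turned back into that matrix and handed to printCoefmat(), which takes care of rounding, column alignment, significance stars and the legend. The footer lines can be built from broom::glance(). The function name print_tidy_summary is made up for illustration, and the F-statistic line assumes broom >= 0.7.0, where glance()'s df column holds the numerator degrees of freedom.

```r
# Sketch: rebuild a summary.lm-style printout from tidy tibbles.
# `print_tidy_summary` is a hypothetical helper, not part of broom or stats.
print_tidy_summary <- function(tidy_tbl, glance_tbl,
                               digits = max(3L, getOption("digits") - 3L)) {
  # Rebuild the numeric matrix printCoefmat() expects: terms as row names,
  # classic column headers (the "Pr(" prefix marks the p-value column).
  coef_mat <- as.matrix(
    tidy_tbl[, c("estimate", "std.error", "statistic", "p.value")]
  )
  dimnames(coef_mat) <- list(
    tidy_tbl$term,
    c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
  )

  cat("\nCoefficients:\n")
  # Handles rounding, alignment, significance stars and the legend
  stats::printCoefmat(coef_mat, digits = digits, signif.stars = TRUE)

  # Footer lines, loosely following the layout of print.summary.lm
  cat("\nResidual standard error:", format(signif(glance_tbl$sigma, digits)),
      "on", glance_tbl$df.residual, "degrees of freedom\n")
  cat("Multiple R-squared: ", formatC(glance_tbl$r.squared, digits = digits),
      ",\tAdjusted R-squared: ",
      formatC(glance_tbl$adj.r.squared, digits = digits), "\n", sep = "")
  # Assumption: glance_tbl$df is the numerator df (broom >= 0.7.0)
  cat("F-statistic:", formatC(glance_tbl$statistic, digits = digits),
      "on", glance_tbl$df, "and", glance_tbl$df.residual,
      "DF,  p-value:", format.pval(glance_tbl$p.value, digits = digits), "\n")

  invisible(tidy_tbl)
}

lm_fit <- lm(mpg ~ cyl + disp, data = mtcars)
print_tidy_summary(broom::tidy(lm_fit), broom::glance(lm_fit))
```

For an lm-like object that already stores its results as tibbles, only columns with the same names as those returned by broom::tidy() and broom::glance() are needed; the fitted object itself is not touched here.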