1
votes

Short version: Is it possible (aside from reverse processing the formatted cell contents) to extract the coefficients from a table reporting the results of regression models generated using the finalfit package?

Background: Using the (amazing) finalfit package I can produce tables of results from regression models. I want to report some of these same results in the text of an Rmarkdown document. I do not want to run the regression models twice, once in finalfit for the tables and once to generate output to be used in text. Also, finalfit processes the coefficients (e.g. exponentiates coefficients from logistic regression models to generate odds ratios, consistently formats decimal points) and I do not want to duplicate these steps.

The code below produces a formatted table of results from a logistic regression model (note: the model is deliberately fitted with base R code):

library(finalfit)
library(dplyr)
explanatory = c("age", "sex.factor")
dependent = "mort_5yr"
colon_s %>%
  ## Crosstable
  summary_factorlist(dependent, explanatory, fit_id=TRUE)  %>% 
  ff_merge(
    glm(
      mort_5yr ~ age + sex.factor, family="binomial", data = colon_s
    ) %>% 
      fit2df(estimate_suffix=" (multivariable)")
  ) %>% 
  select(-c(fit_id, index)) %>% 
  dependent_label(colon_s, dependent)

I cannot see how I can extract the odds ratio for Sex:Male from this table (or the pipeline to produce it), without running and processing the glm model separately.

Extracting the cell contents directly (as suggested by @LyzandeR) results in this string: "0.98 (0.76-1.27, p=0.888)". The odds ratio, confidence interval and p-value would then need to be parsed back out of it. This nearly achieves the result, but is not ideal given that these values have all already been calculated and then concatenated into this string.
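For reference, the string-parsing workaround can be sketched in base R as follows. This is only an illustration of the approach I would like to avoid: the regular expression assumes the cell always has the shape "estimate (lower-upper, p=value)" (or "p<value"), and the variable names are made up for the example.

```r
# The formatted cell as returned by the table
cell <- "0.98 (0.76-1.27, p=0.888)"

# Capture groups: estimate, lower CI, upper CI, p-value operator, p-value
pattern <- "^([0-9.]+) \\(([0-9.]+)-([0-9.]+), p([=<])([0-9.]+)\\)$"
parts <- regmatches(cell, regexec(pattern, cell))[[1]]

# parts[1] is the full match; the capture groups follow it
or_male <- as.numeric(parts[2])
ci_low  <- as.numeric(parts[3])
ci_high <- as.numeric(parts[4])
p_value <- as.numeric(parts[6])
```

Note that this only recovers the rounded values, and the pattern would need changing for any cell with a different layout (for example, the summary columns).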

Note: I am happy to achieve the desired result using a different package.

2
If I understand correctly you can add %>% select("OR (multivariable)") %>% slice(3) – LyzandeR

The finalfit package processes the coefficients and adds in brackets etc. to make the table look nice. If I process cells from the formatted table I will then need to strip off the formatting again, which is relatively easy to do but seems a bit cumbersome and not generalisable to other regression models or cell contents (i.e. the summary columns in the above example). – ChrisP

2 Answers

1
votes

Thanks for your interest in this package.

It's a good thought and not something we have implemented as you describe. You can use the internal functions (which are exported) to do this.

It's not that pretty, but this will give you a table of coefficients, 95% CIs and p-values from within your pipeline.

library(finalfit)
library(dplyr)
explanatory = c("age", "sex.factor")
dependent = "mort_5yr"
colon_s %>%
    summary_factorlist(dependent, explanatory, fit_id=TRUE)  %>% 
    ff_merge(
        glmmulti(colon_s, dependent, explanatory)[[1]] %>%  # glmmulti/glm etc. will work
            extract_fit() %>% 
            {coef_table <<- .} %>%  # save an extra table in the pipeline
            condense_fit(estimate_suffix = " (multivariable)") %>% 
            remove_intercept()
    ) %>% 
    select(-c(fit_id, index)) %>% 
    dependent_label(colon_s, dependent)
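
Once coef_table exists, a value can be pulled out of it for the text. The following is a rough sketch rather than anything built into the package: report_or() is a made-up helper, and it assumes the saved table has columns called explanatory, OR, L95, U95 and p and a row labelled "sex.factorMale" (check names(coef_table) and the row labels in your own session). So that it runs on its own, it uses a one-row stand-in table filled with the rounded values quoted in the question.

```r
# Stand-in for the table saved by {coef_table <<- .} in the pipeline above.
# Column names and the row label are assumptions; the numbers are the
# rounded values quoted in the question.
coef_table <- data.frame(
  explanatory = "sex.factorMale",
  OR  = 0.98,
  L95 = 0.76,
  U95 = 1.27,
  p   = 0.888,
  stringsAsFactors = FALSE
)

# Hypothetical helper: format one row of the coefficient table for prose
report_or <- function(tbl, term, digits = 2, p_digits = 3) {
  row <- tbl[tbl$explanatory == term, ]
  stopifnot(nrow(row) == 1)  # fail loudly if the term is missing or duplicated
  sprintf(
    "OR %s (95%% CI %s to %s, p = %s)",
    formatC(row$OR,  format = "f", digits = digits),
    formatC(row$L95, format = "f", digits = digits),
    formatC(row$U95, format = "f", digits = digits),
    formatC(row$p,   format = "f", digits = p_digits)
  )
}

sex_male_text <- report_or(coef_table, "sex.factorMale")
```

In an Rmarkdown document the result can then be dropped into a sentence with inline code, e.g. `r report_or(coef_table, "sex.factorMale")`.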

Just in case you hadn't seen it, the bare models without formatting can be generated thus:

explanatory = c("age", "sex.factor")
dependent = "mort_5yr"
colon_s %>% 
    finalfit(dependent, explanatory, condense = FALSE)
1
votes

Tweaking @Ewen's answer for my own preferences:

library(finalfit)
library(dplyr)
explanatory = c("age", "sex.factor")
dependent = "mort_5yr"
colon_s %>%
  ## Crosstable
  summary_factorlist(dependent, explanatory, fit_id=TRUE)  %>% 
  ff_merge(
    glm(
      mort_5yr ~ age + sex.factor, family="binomial", data = colon_s
    ) %>% 
      fit2df(condense = FALSE) %>%  
      {coef_multi <<- .} %>% # generate a table of raw coefficients here
      condense_fit(estimate_suffix=" (multivariable)")
  ) %>% 
  select(-c(fit_id, index)) %>% 
  dependent_label(colon_s, dependent)

Note: this approach can also be used for the coefficients from univariate models.

The only sticking point is getting the raw summary stats, but really that's not tricky!
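
For anyone unfamiliar with the {name <<- .} step used in both answers, here is a minimal sketch of the idiom on its own, using only magrittr and made-up names (saved, squares). As I understand it, a braced expression on the right-hand side of %>% is evaluated with . bound to the incoming value, and an assignment returns the assigned value, so the pipeline carries on unchanged after the copy is made.

```r
library(magrittr)

squares <- c(1, 2, 3) %>%
  { saved <<- . } %>%              # side effect: keep a copy of the intermediate value
  sapply(function(x) x^2)          # the pipeline continues with the same value

saved    # the intermediate vector captured mid-pipeline
squares  # the final result of the pipeline
```

Because <<- writes outside the pipeline (to the global environment when run at the top level), it will silently overwrite any existing object with the same name, so it is worth choosing a distinctive name.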