
The stargazer package in R is fantastic for displaying multiple regression models as side-by-side columns, the standard style for many social science disciplines. However, the package doesn't play well with knitr+pandoc, since it generates output as either HTML or TeX, but not Markdown.

As a solution, I'm creating a function that can generate tables similar to those created with stargazer, but that saves the output as a simple data frame, which I can then render in knitted documents with tools like knitr::kable() and pander. Doing this is trivial with functions like broom::tidy.
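To make the data-frame idea concrete, here is a rough base-R sketch that stacks the coefficient tables of several models and spreads the estimates into one column per model. The helper name `tidy_one` and the column names are assumptions made for illustration (they mirror the columns broom::tidy() returns, but the sketch itself does not use broom):

```r
# Illustrative sketch: a stargazer-like data frame built with base R only.
# `tidy_one` is a hypothetical helper, not part of any package.
models <- list(
  lm0 = lm(hp ~ wt + qsec + cyl + gear + carb, mtcars),
  lm1 = lm(qsec ~ hp + cyl + gear + carb, mtcars),
  lm2 = lm(qsec ~ wt + hp + gear + drat, mtcars)
)

# One row per coefficient, with the model label attached
tidy_one <- function(model, label) {
  cf <- coef(summary(model))
  data.frame(
    model = label,
    term = rownames(cf),
    estimate = unname(cf[, "Estimate"]),
    std.error = unname(cf[, "Std. Error"]),
    stringsAsFactors = FALSE
  )
}

# Stack all models into one long data frame
long <- do.call(rbind, Map(tidy_one, models, names(models)))
rownames(long) <- NULL

# One row per term, one estimate column per model (NA where a term is absent)
wide <- reshape(
  long[, c("model", "term", "estimate")],
  idvar = "term", timevar = "model", direction = "wide"
)
```

Building the table this way is the easy part; the row order of `wide` still has to be fixed separately.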

I'm stuck, however, on how to order the coefficients displayed in the model. Take these three models, for example. When displayed with stargazer, the final coefficient order is c("wt", "qsec", "hp", "cyl", "gear", "carb", "drat"). The order of all the coefficients is based primarily on the coefficients in the first model (wt, qsec, cyl, gear, carb). When the second model is appended as a new column, the hp row is inserted after qsec and before cyl.

lm0 <- lm(hp ~ wt + qsec + cyl + gear + carb, mtcars)
lm1 <- lm(qsec ~ hp + cyl + gear + carb, mtcars)
lm2 <- lm(qsec ~ wt + hp + gear + drat, mtcars)

library(stargazer)
stargazer(lm0, lm1, lm2, type = "text")
====================================================
              (1)           (2)             (3)          
----------------------------------------------------
wt           16.879                       0.827**        
            (12.113)                      (0.383)        
qsec         -8.124                                      
            (6.109)                                      
hp                         -0.005        -0.026***       
                          (0.007)         (0.004)        
cyl         18.210**     -0.811***                       
            (8.785)       (0.280)                        
gear         13.342      -1.597***         -0.232        
            (15.115)      (0.441)         (0.439)        
carb         9.277         0.098                         
            (6.345)       (0.222)                        
drat                                       0.099         
                                          (0.636)        
Constant     49.424      29.181***       19.530***       
           (171.876)      (2.398)         (2.766)        
====================================================
Note:                    *p<0.1; **p<0.05; ***p<0.01

In the end, I hope to generate a character vector of the coefficient names that I can then use with dplyr::arrange() to correctly sort a data frame of multiple model coefficients.
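For reference, the sorting step itself might look like the sketch below once the vector exists. The data frame and its `term` column are assumptions for illustration; the estimates are copied from column (3) of the table above. It uses base R, with the dplyr equivalent noted in a comment:

```r
# Target order, as read off the stargazer table above
coef_order <- c("wt", "qsec", "hp", "cyl", "gear", "carb", "drat")

# Small stand-in for a data frame of model coefficients (deliberately unsorted)
coefs <- data.frame(
  term = c("drat", "hp", "wt", "gear"),
  estimate = c(0.099, -0.026, 0.827, -0.232),
  stringsAsFactors = FALSE
)

# match() gives each term's position in coef_order; order() sorts by it
sorted <- coefs[order(match(coefs$term, coef_order)), ]

# dplyr equivalent: dplyr::arrange(coefs, match(term, coef_order))
```

Sorting on `match()` avoids converting `term` to a factor, so the column stays a plain character vector.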

The sorting seems to follow this pseudo algorithm:

  1. Save the first list (list_1) of coefficient names
  2. Go through the second list of names. If element_1 of list_2 doesn't match element_1 of list_1, check the next element of list_1 until there's a match, then insert before the match
  3. Put element_2 of list_2 after element_1 if it doesn't match anything else in list_1
  4. Repeat with list_3, and so on
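One way these steps could be sketched in base R is shown below. It follows the pseudo-algorithm as written here, not stargazer's actual source, which may well differ. `merge_coef_order` is a hypothetical helper name, and the rule "insert a new name before the next name from the same model that is already placed, otherwise append" is my reading of steps 2 and 3:

```r
# Sketch of the pseudo-algorithm above; `merge_coef_order` is a
# hypothetical helper, not a stargazer function.
merge_coef_order <- function(name_lists) {
  Reduce(function(result, new_names) {
    for (i in seq_along(new_names)) {
      nm <- new_names[i]
      if (nm %in% result) next
      # Names that come after `nm` in its own model and are already placed
      later <- new_names[-seq_len(i)]
      placed <- later[later %in% result]
      if (length(placed) == 0) {
        # Nothing to anchor on: append at the end
        result <- c(result, nm)
      } else {
        # Insert just before the first already-placed later name
        result <- append(result, nm, after = match(placed[1], result) - 1)
      }
    }
    result
  }, name_lists)
}

models <- list(
  lm(hp ~ wt + qsec + cyl + gear + carb, mtcars),
  lm(qsec ~ hp + cyl + gear + carb, mtcars),
  lm(qsec ~ wt + hp + gear + drat, mtcars)
)

# Coefficient names per model, without the intercept
name_lists <- lapply(models, function(m) setdiff(names(coef(m)), "(Intercept)"))

coef_order <- merge_coef_order(name_lists)
# [1] "wt"   "qsec" "hp"   "cyl"  "gear" "carb" "drat"
```

The loop only runs over coefficient names, which are short vectors, so its cost is negligible next to fitting the models.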

Writing simple R code to generate this order, however, has proven more difficult than I thought. Simply concatenating all the coefficient names in a vector and then keeping only unique values doesn't produce the correct order, since new variables (like hp) are just added to the end of existing variable names instead of getting inserted in the middle:

library(tidyverse)
names1 <- names(lm0$coefficients) %>% discard(~ .x == "(Intercept)")
names2 <- names(lm1$coefficients) %>% discard(~ .x == "(Intercept)")
names3 <- names(lm2$coefficients) %>% discard(~ .x == "(Intercept)")

# New variables just appended
unique(c(names1, names2, names3))

# [1] "wt"   "qsec" "cyl"  "gear" "carb" "hp"   "drat"

Additionally, it seems like the only way to implement something like this is to use a ton of loops, which feels wildly inefficient.

So, in the end, how can I sort or reorder a character vector of coefficient names by order of appearance in a list of models, prioritizing the order of the first model in the list? That is, ultimately this is the character vector I'd like to get: c("wt", "qsec", "hp", "cyl", "gear", "carb", "drat")


Update: memisc::mtable(lm0, lm1, lm2) is a neat alternative to stargazer that actually returns a data frame (and not just text), but it doesn't insert new coefficients in the already existing order and instead appends them to the list (with hp and drat at the end). It seems to just concatenate all the coefficient names and use their unique values.

===================================================
                     lm0        lm1        lm2     
---------------------------------------------------
  (Intercept)       49.424   29.181***  19.530***  
                  (171.876)  (2.398)    (2.766)    
  wt                16.879               0.827*    
                   (12.113)             (0.383)    
  qsec              -8.124                         
                    (6.109)                        
  cyl               18.210*  -0.811**              
                    (8.785)  (0.280)               
  gear              13.342   -1.597**   -0.232     
                   (15.115)  (0.441)    (0.439)    
  carb               9.277    0.098                
                    (6.345)  (0.222)               
  hp                         -0.005     -0.026***  
                             (0.007)    (0.004)    
  drat                                   0.099     
                                        (0.636)    
---------------------------------------------------
1 Answer


To answer the OP's question:

So, in the end, how can I sort or reorder a character vector of coefficient names by order of appearance in a list of models, prioritizing the order of the first model in the list?

Here is a one-liner that should work for an arbitrary number of models. It returns the names in order of first appearance, so hp and drat land at the end:

unique(names(unlist(lapply(list(lm0, lm1, lm2), coef))))[-1]
#[1] "wt"   "qsec" "cyl"  "gear" "carb" "hp"   "drat"

Note that the code implicitly assumes that the first model always has "(Intercept)" as its first coefficient, which the negative index [-1] then removes from the result vector.

If this can't be guaranteed it might be safer to use

setdiff(unique(names(unlist(lapply(list(lm0, lm1, lm2), coef)))), "(Intercept)")

to remove "(Intercept)" from the result vector wherever it appears, if it is present at all. The remaining coefficient names keep their order:

#[1] "wt"   "qsec" "cyl"  "gear" "carb" "hp"   "drat"
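To illustrate why the position matters, here is a small sketch in which the first model is fitted without an intercept. The two models are made up for this example and are not the OP's:

```r
# Hypothetical models: the first one has no intercept (note the `0 +`)
models <- list(
  lm(hp ~ 0 + wt + qsec, mtcars),
  lm(qsec ~ hp + cyl, mtcars)
)

all_names <- unique(names(unlist(lapply(models, coef))))

# Negative indexing drops whatever comes first, here a real coefficient
by_index <- all_names[-1]
# [1] "qsec"        "(Intercept)" "hp"          "cyl"

# setdiff() drops "(Intercept)" by value, wherever it sits
by_value <- setdiff(all_names, "(Intercept)")
# [1] "wt"   "qsec" "hp"   "cyl"
```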

Edit

It's not clear what logic stargazer uses to order the coefficients. However, according to the help page, stargazer also returns its output invisibly as a character vector. In addition, the table.layout parameter can be used to return only the coefficient section. Together, these make it possible to extract the coefficient names in the same order as stargazer:

sgt <- capture.output(stargazer::stargazer(lm0, lm1, lm2, type="text", table.layout = "t"))
setdiff(stringr::str_extract(sgt, "^\\w*"), c("", "Constant"))
#[1] "wt"   "qsec" "hp"   "cyl"  "gear" "carb" "drat"

As stargazer uses cat() to output, capture.output() keeps the console output clean (Thanks to @Andrew for suggesting this).

The regular expression in str_extract() returns the first "word" at the beginning of each string. The result vector is again cleaned up using setdiff().
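The effect of the pattern can be checked without running stargazer. The sketch below applies a base-R analogue of the str_extract() call (sub() with a capture group, which is my substitution, not part of the original code) to a few lines copied from the table in the question, plus one made-up line to show a limitation:

```r
# Lines copied from the stargazer text table; the last one is hypothetical
tbl_lines <- c(
  "wt           16.879                       0.827**",
  "            (12.113)                      (0.383)",
  "Constant     49.424      29.181***       19.530***",
  "factor(cyl)6  1.234"  # made-up line, not from the original output
)

# Keep only the leading run of word characters (letters, digits, underscore);
# lines that start with a space yield an empty string
first_word <- sub("^(\\w*).*$", "\\1", tbl_lines)
# [1] "wt"       ""         "Constant" "factor"

coef_names <- setdiff(first_word, c("", "Constant"))
# [1] "wt"     "factor"
```

As the last line shows, \w stops at the first character that is not a letter, digit, or underscore, so names such as factor(cyl)6 or interaction terms like wt:hp would be cut short. For models with such terms the pattern would need adjusting.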