
I am trying to create a simple summary statistics table (min, max, mean, n, etc.) that handles both factor variables and continuous variables, even when there is more than one factor variable. I'd like good-looking HTML output, e.g. from stargazer or huxtable.

For a simple reproducible example, I'll use mtcars but change two of the variables to factors, and simplify to three variables.


library(dplyr)      # for %>%, mutate() and select()
library(stargazer)  # for the summary tables below

mtcars_df <- mtcars
mtcars_df <- mtcars_df %>% 
  mutate(vs = factor(vs),
         am = factor(am)) %>% 
  select(mpg, vs, am)

So the data has two factor variables, vs and am. mpg is left as a double:

#>    mpg vs am
#>  <dbl> <fctr> <fctr>
#> 1 21.0  0  1
#> 2 21.0  0  1
#> 3 22.8  1  1
#> 4 21.4  1  0
#> 5 18.7  0  0
#> 6 18.1  1  0

My desired output would look something like this (format only, the numbers aren't all correct for am0):

Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs0       32 0.562   0.504    0     0        1      1 
vs1       32 0.438   0.504    0     0        1      1 
am0       32 0.594   0.499    0     0        1      1 
am1       32 0.406   0.499    0     0        1      1 

A straight call to stargazer does not handle factors (although there is a solution for summarising a single factor, shown below).

# this doesn't give factors
stargazer(mtcars_df, type = "text")

Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
mpg       32 20.091  6.027   10    15.4     22.8   34 

This previous answer from @jake-fisher works very well to summarise one factor variable. https://stackoverflow.com/a/26935270/8742237

The code below, taken from that previous answer, gives both values of the first factor, vs (i.e. vs0 and vs1), but for the second factor, am, it only lists summary statistics for one value:

  • am0 is missing.

I do realise that this is because we want to avoid the dummy variable trap when modeling, but my issue is not about modeling, it's about creating a summary table with all values of all factor variables.

options(na.action = "na.pass")  # so that we keep missing values in the data
X <- model.matrix(~ . - 1, data = mtcars_df)
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
#names(X) <- colnames(X)
stargazer(X.df, type = "text")

Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs0       32 0.562   0.504    0     0        1      1 
vs1       32 0.438   0.504    0     0        1      1 
am1       32 0.406   0.499    0     0        1      1 

While use of stargazer or huxtable would be preferred, if there's an easier way to produce this sort of summary table with a different library, that would still be very helpful.

How will you calculate summary stats for factor variables? - Ronak Shah
@RonakShah I'm hoping to expand and one-hot encode all of the factors, as in the examples above, e.g. vs0, vs1, so that the mean shows what proportion of vs is ==0 and ==1. For factors with more values, I'd create more dummies, e.g. from mtcars: cyl4, cyl6, cyl8. - Jeremy K.
Have you tried skimr? - Bruno
Not HTML format, but epiDisplay::codebook(mtcars_df) gives appropriate summaries of numeric variables and factors. - Edward
gtsummary might be helpful and has both gt and huxtable output. - dash2

1 Answer


In the end, instead of using model.matrix(), which by default drops the base level of every factor after the first when creating dummy variables, a simple fix is to use mlr::createDummyFeatures(), which creates a dummy for every level, including the base level.


library(dplyr)      # for %>%, mutate() and select()
library(stargazer)  # for the summary table

mtcars_df <- mtcars
mtcars_df <- mtcars_df %>% 
  mutate(vs = factor(vs),
         am = factor(am)) %>% 
  select(mpg, vs, am)

# one 0/1 column per factor level, including the base level
X <- mlr::createDummyFeatures(obj = mtcars_df)
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
stargazer(X.df, type = "text")

which does give the desired output:

Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs.0      32 0.562   0.504    0     0        1      1 
vs.1      32 0.438   0.504    0     0        1      1 
am.0      32 0.594   0.499    0     0        1      1 
am.1      32 0.406   0.499    0     0        1      1 
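
If adding mlr as a dependency is not desirable, the same full set of dummies can probably be obtained from model.matrix() itself by passing a "no contrasts" coding for each factor through its contrasts.arg argument. The following is only a sketch under that assumption, not part of the original solution; the helper names is_fac and full_contrasts are made up for illustration, and the column names come out as vs0/vs1 rather than vs.0/vs.1.

```r
# Same three-column example as above, built with base R only.
df <- transform(mtcars[, c("mpg", "vs", "am")],
                vs = factor(vs), am = factor(am))

# For each factor, ask for an identity ("no contrasts") coding so that
# every level gets its own 0/1 column, including the base level.
is_fac <- vapply(df, is.factor, logical(1))
full_contrasts <- lapply(df[is_fac], contrasts, contrasts = FALSE)

# "- 1" drops the intercept; contrasts.arg overrides the default
# treatment coding that would otherwise drop am0.
X <- model.matrix(~ . - 1, data = df, contrasts.arg = full_contrasts)
X.df <- data.frame(X)

colnames(X.df)            # inspect which dummy columns were created
round(colMeans(X.df), 3)  # the mean of each dummy is the share of that level
```

X.df can then be passed to stargazer(X.df, type = "text") exactly as above. Since the question asked for HTML output, note that stargazer also accepts type = "html", optionally with out = "some_file.html" to write the table to disk.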