Using dplyr to create summary proportion table with several categorical/factor variables

Question

I am trying to create one table that summarizes several categorical variables (using frequencies and proportions) by another variable. I would like to do this using the dplyr package.

These previous Stack Overflow discussions cover part of what I am looking for: Relative frequencies / proportions with dplyr and Calculate relative frequency for a certain group.

Using the mtcars dataset, this is what the output would look like if I just wanted to look at the proportion of gear by am category:

    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))

    #   am gear  n      freq
    # 1  0    3 15 0.7894737
    # 2  0    4  4 0.2105263
    # 3  1    4  8 0.6153846
    # 4  1    5  5 0.3846154

However, I actually want to look not only at gear by am, but also at carb by am and cyl by am, separately, in the same table. If I amend the code to:

    mtcars %>%
      group_by(am, gear, carb, cyl) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))

I get the frequencies for each combination of am, gear, carb, and cyl, which is not what I want. Is there any way to do this with dplyr?

EDIT

Also, it would be a bonus if anyone knew of a way to produce the table I want, but with the categories of am as the columns (as in a classic 2x2 table format). Here is an example of what I'm referring to. It is from one of my previous publications. I want to produce this table in R so that I can output it directly to a Word document using RMarkdown.

Is there a reason it has to be done in dplyr? And is one of the 'groups' always the same? (Here it's am) — Heroka
And can you give an example of the table you want? It's certainly possible with some reshaping, but I'm not sure what you're after. — Heroka
There is no especially important reason it has to be done in dplyr, except that I'm trying to learn the package well so that I have a consistent method for producing my tables. Another reason for using dplyr is that it produces a data frame as output, which lets me use the stargazer package to produce publication-worthy tables that I can then output to a Word document using RMarkdown. I am, of course, open to alternative methods that others think are better for this. — RNB
I am having a similar problem. How did you construct your table in the end @RNB? — Frederick

Gopala Gopala · Accepted Answer · 2016-01-04T13:34:02

With a combination of tidyr and dplyr, here is how you would do it:

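    library(tidyr)
    library(dplyr)

    mtcars %>%
      # Stack gear, carb and cyl into one variable/value pair per row
      gather(variable, value, gear, carb, cyl) %>%
      group_by(am, variable, value) %>%
      summarise(n = n()) %>%
      # summarise() drops the last grouping level (value), so sum(n)
      # is taken within each am/variable combination
      mutate(freq = n / sum(n))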
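
For the layout asked about in the EDIT, with the categories of am as columns, the same idea can be extended with a reshaping step. The following is only a sketch, assuming tidyr 1.0.0 or later, where `pivot_longer()` and `pivot_wider()` supersede `gather()` and `spread()`. The object names `long` and `wide` are illustrative and not part of the original answer.

    library(dplyr)
    library(tidyr)

    # Long format: one row per am/variable/value combination,
    # with the count and the share within each am/variable group.
    long <- mtcars %>%
      select(am, gear, carb, cyl) %>%
      pivot_longer(c(gear, carb, cyl),
                   names_to = "variable", values_to = "value") %>%
      count(am, variable, value) %>%
      group_by(am, variable) %>%
      mutate(freq = n / sum(n)) %>%
      ungroup()

    # Wide format: the levels of am become columns
    # (n_0, n_1, freq_0, freq_1); combinations that never occur
    # are filled with 0 instead of NA.
    wide <- long %>%
      pivot_wider(names_from = am,
                  values_from = c(n, freq),
                  values_fill = 0) %>%
      arrange(variable, value)

Because `wide` is an ordinary data frame (tibble), it should be possible to pass it on to a table-formatting package in an RMarkdown document, although the exact formatting step depends on the package used.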
