0
votes

I have a data frame with groups of different sizes. I want to compute the share of yes/no outcomes within each group. For example, given:

df <- data.frame(group = rep(1:4,times=c(20,10,17,8)),
                 outcome = rep(c("yes","yes","no","yes","no"),times = 11))

I want to summarize this so that I can see the frequency of yes and the frequency of no in each group. Something like:

df %>% group_by(group) %>%
  summarise(freqyes = (. %>% filter(outcome=="yes") %>% n()) / n(),
            freqno = (. %>% filter(outcome=="no") %>% n()) / n())

Except that doesn't work.

The yes and no percentages within each group should add up to 100.

Thanks.

1 Answer

2
votes

We can count the rows for each group/outcome combination and then convert the counts to percentages within each group.

library(dplyr)

df %>%
  count(group, outcome) %>%     # rows per group/outcome pair, stored in n
  group_by(group) %>%
  mutate(n = n / sum(n) * 100)  # overwrite n with the within-group percentage

#  group outcome   n
#  <int> <fct>   <dbl>
#1     1 no       40  
#2     1 yes      60  
#3     2 no       40  
#4     2 yes      60  
#5     3 no       35.3
#6     3 yes      64.7
#7     4 no       50  
#8     4 yes      50  
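
If one row per group with separate yes/no columns is preferred, which is the shape the attempt in the question seems to be aiming for, a minimal sketch is to take the mean of a logical comparison inside summarise. The column names freqyes and freqno are borrowed from the question, and the object name res is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(group = rep(1:4, times = c(20, 10, 17, 8)),
                 outcome = rep(c("yes", "yes", "no", "yes", "no"), times = 11))

# outcome == "yes" is a logical vector; its mean within a group is the
# share of "yes" rows, so multiplying by 100 gives a percentage
res <- df %>%
  group_by(group) %>%
  summarise(freqyes = mean(outcome == "yes") * 100,
            freqno  = mean(outcome == "no") * 100)

res
```

Inside summarise, column names refer to the current group's values, so no filter step is needed. This sketch assumes outcome only ever takes the values "yes" and "no"; with other values or NAs the two columns would no longer add to 100.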

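In base R, we can use table and prop.table. The second argument (margin = 1) makes prop.table compute proportions within each row, which here means within each group.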

prop.table(table(df), 1) * 100

#    outcome
#group       no      yes
#    1 40.00000 60.00000
#    2 40.00000 60.00000
#    3 35.29412 64.70588
#    4 50.00000 50.00000
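
As a quick sanity check on the requirement that yes and no add to 100 in each group, the rows of the base R table can be summed. This is only a sketch on the question's example data; the object name pct is illustrative.

```r
# Same example data as in the question
df <- data.frame(group = rep(1:4, times = c(20, 10, 17, 8)),
                 outcome = rep(c("yes", "yes", "no", "yes", "no"), times = 11))

# Row-wise percentages: margin 1 normalises within each group
pct <- prop.table(table(df), 1) * 100

# Each group's percentages should sum to 100 (up to floating point error)
stopifnot(isTRUE(all.equal(unname(rowSums(pct)), rep(100, 4))))

# Round to one decimal place for display
round(pct, 1)
```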