summarise and group_by using two different columns consecutively

Question

I have a dataframe df with three columns a,b,c.

df <- data.frame(a = c('a','b','c','d','e','f','g','e','f','g'), 
                 b = c('X','Y','Z','X','Y','Z','X','X','Y','Z'),
                 c = c('cat','dog','cat','dog','cat','cat','dog','cat','cat','dog'))
df

# output
   a b   c
1  a X cat
2  b Y dog
3  c Z cat
4  d X dog
5  e Y cat
6  f Z cat
7  g X dog
8  e X cat
9  f Y cat
10 g Z dog

I have to group_by using the column b followed by summarise using the column c with counts of available values in it.

df %>% group_by(b) %>% 
  summarise(nCat = sum(c == 'cat'), 
            nDog = sum(c == 'dog')) 
#output
# A tibble: 3 × 3
       b  nCat  nDog
  <fctr> <int> <int>
1      X     2     2
2      Y     2     1
3      Z     2     1

However, before doing the above task, I should remove the rows belonging to a value in a which has more than one value in b.

df %>% group_by(a) %>% summarise(count = n())
#output
# A tibble: 7 × 2
       a count
  <fctr> <int>
1      a     1
2      b     1
3      c     1
4      d     1
5      e     2
6      f     2
7      g     2

For example, in this dataframe, all the rows having value e(values: Y,X), f(values: Z,Y), g(values: X,Z) in column a.

# Expected output
# A tibble: 3 × 3
       b  nCat  nDog
  <fctr> <int> <int>
1      X     1     1
2      Y     0     1
3      Z     1     0

akrun akrun · Accepted Answer · 2017-04-26T12:37:06

We can use filter with n_distinct to filter the values in 'b' that have only one unique element for each 'a' group, then grouped by 'b', we do the summarise

df %>%
   group_by(a) %>% 
   filter(n_distinct(b)==1)  %>% 
   group_by(b) %>% 
   summarise(nCat =sum(c=='cat'), nDog = sum(c=='dog'), Total = n())
# A tibble: 3 × 4
#      b  nCat  nDog Total
#  <fctr> <int> <int> <int>
#1      X     1     1     2
#2      Y     0     1     1
#3      Z     1     0     1

summarise and group_by using two different columns consecutively

1 Answers