running a list through dplyr group_by and summarising & mutating each time

Question

Is it possible to loop through a list and replace the group_by variable when using dplyr? Let me illustrate:

Let's say I have a list of variables from the dataset myData, and each variable has the same groups, 1 through 10. Ideally, I'd like to loop through the list and, for each variable, summarise and mutate the data as shown below. Is this possible?

Here is a smaller, generalized example. I have only put the variable a in the group_by function, but ideally I'd like to loop through a list and get that output for each variable.

vars <- list("a", "b", "c")

> myData
   success a b c
1        0 2 1 3
2        1 1 3 1
3        1 1 3 1
4        0 1 1 3
5        1 2 2 1
6        1 2 3 2
7        0 2 2 3
8        0 1 1 3
9        0 2 3 2
10       1 1 1 2
11       1 1 2 2
12       0 1 1 1
13       0 3 1 1
14       1 3 2 1

> myData %>% group_by(a) %>% 
+     summarise(success = sum(success), n = n()) %>% 
+     mutate(success_prop = success / sum(n))
Source: local data frame [3 x 4]

  a success n success_prop
1 1       4 7   0.28571429
2 2       2 5   0.14285714
3 3       1 2   0.07142857

The final result might look something like this:

group  a.success  a.n  a.success_prop  b.success  b.n  b.success_prop  c.success  c.n  c.success_prop
1              4    7      0.28571429          1    6      0.07142857          4    6       0.2857143
2              2    5      0.14285714          3    4      0.21428571          3    4       0.2142857
3              1    2      0.07142857          3    4      0.21428571          0    4       0

Can you add dput(myData) to your post so that your example is reproducible? — davechilders
I changed the question to have a more general example so it is reproducible — moku
The first line of your desired output has a.success = 4, a.n = 7, and a.success_prop = 0.28. This seems inconsistent. — davechilders

davechilders · Accepted Answer · 2015-02-17T21:01:48

I would recommend converting your data into a tidy format as a first step:

library(tidyr)
library(dplyr)

tidy_data <- myData %>%
  gather(key, value, a:c)
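
As an aside, in tidyr 1.0.0 and later gather() is superseded by pivot_longer(). A minimal sketch of the same reshape, assuming a recent tidyr and using only the first four rows of the question's data (the names small and tidy_small are just for illustration):

```r
library(tidyr)
library(dplyr)

# First four rows of myData from the question; `small` is an illustrative name.
small <- data.frame(
  success = c(0, 1, 1, 0),
  a = c(2, 1, 1, 1),
  b = c(1, 3, 3, 1),
  c = c(3, 1, 1, 3)
)

# One row per (original row, variable) pair, like gather(key, value, a:c).
tidy_small <- small %>%
  pivot_longer(cols = a:c, names_to = "key", values_to = "value")
```

The rows come out in a different order than with gather() (row by row rather than column by column), which does not matter for the grouped summary.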

It is then straightforward to use group_by and summarise.

Edit

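tidy_data %>%
  # one row per (variable, group level) combination
  group_by(key, value) %>%
  summarise(
    success = sum(success),
    n = n()
  ) %>%
  # regroup by variable so that sum(n) is the total row count for that variable
  group_by(key) %>%
  mutate(
    success_prop = success / sum(n)
  )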
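
If you would rather loop over the variable names directly, as the question asks, one possible sketch is below. It assumes dplyr 1.0.0 or later (for the .data pronoun and rename_with()); the helper summarise_by and the objects results and wide are hypothetical names introduced here, not part of the original post.

```r
library(dplyr)

# Example data copied from the question.
myData <- data.frame(
  success = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1),
  a       = c(2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 3, 3),
  b       = c(1, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 1, 1, 2),
  c       = c(3, 1, 1, 3, 1, 2, 3, 3, 2, 2, 2, 1, 1, 1)
)

# Column names as strings, so they can be looped over.
vars <- c("a", "b", "c")

# Hypothetical helper: group by the column named in `var`, then
# summarise and mutate exactly as in the question.
summarise_by <- function(data, var) {
  data %>%
    group_by(group = .data[[var]]) %>%
    summarise(success = sum(success), n = n()) %>%
    mutate(success_prop = success / sum(n))
}

# One summary table per variable, stored in a named list.
results <- lapply(vars, function(v) summarise_by(myData, v))
names(results) <- vars

# Optional: prefix the columns with the variable name and join on `group`
# to get the wide layout sketched in the question.
wide <- Reduce(
  function(x, y) full_join(x, y, by = "group"),
  lapply(vars, function(v) {
    summarise_by(myData, v) %>%
      rename_with(~ paste0(v, ".", .x), -group)
  })
)
```

Here results$a holds the same table as the group_by(a) example in the question, and wide has one row per group level with a.*, b.* and c.* columns.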
