Mean across each element of a tibble list-column by group with purrr and dplyr

Question

I'm trying to get used to using tidyverse. I don't know if my data is well suited for using functions like map(). I like the organization of list-columns so I am wondering how to use a combination of group_by(), summarize(), map(), and other functions to get this to work. I know how to use these functions with vector-columns but do not know how to approach this in the case of list-columns.

Sample data:

library(tidyverse)

set.seed(3949)
myList <- replicate(12, sample(1:20, size = 10), simplify = FALSE)

tibble(
  group = rep(c("A", "B"), each = 6),
  data = myList
)

Each vector in the list-column has ten elements which are values for a given trial. What I would like to do is group the tibble by group and then find the "column" mean and se of the expanded lists. In other words, it's like I'm treating the list columns as a matrix with each row of the tibble bound together. The output will have columns for the group and trials as well so it is in the correct format for ggplot2.

        mean        se group trial
1   6.000000 1.6329932     A     1
2  12.666667 2.3333333     A     2
3  12.333333 2.8007935     A     3
4  13.833333 1.8150605     A     4
5   8.166667 3.1028661     A     5
6  11.500000 2.9410882     A     6
7  13.666667 2.3758040     A     7
8   6.833333 1.7779514     A     8
9  11.833333 2.3009660     A     9
10  8.666667 1.7061979     A    10
11  8.333333 1.6865481     B     1
12 12.166667 2.6002137     B     2
13 10.000000 2.7080128     B     3
14 11.833333 3.1242777     B     4
15  4.666667 1.2823589     B     5
16 12.500000 3.0413813     B     6
17  6.000000 1.5055453     B     7
18  8.166667 1.6616591     B     8
19 11.000000 2.6708301     B     9
20 13.166667 0.9457507     B    10

Here is how I would normally do something like this:

set.seed(3949)

data.frame(group = rep(c("A", "B"), each = 6)) %>%
  cbind(replicate(12, sample(1:20, size = 10)) %>% t()) %>%
  split(.$group) %>%
  lapply(function(x) data.frame(mean = colMeans(x[ ,2:11]),
                                se = apply(x[ ,2:11], 2, se))) %>%
  do.call(rbind,.) %>%
  mutate(group = substr(row.names(.), 1,1),
         trial = rep(1:10, 2)) %>% 

  ggplot(aes(x = trial, y = mean)) +
  geom_point() +
  geom_line() +
  facet_grid(~ group) +
  scale_x_continuous(limits = c(1,10), breaks = seq(1, 10, 1)) +
  geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") + 
  theme_bw()

Is there are cleaner way to do this with the tidyverse functions?

DJV DJV · Accepted Answer · 2018-06-01T21:52:29

I think that another way is to use nest() and map().

library(tidyverse)
library(plotrix) #For the std.error

# Your second sample dataset
set.seed(3949)
df <- data.frame(group = rep(c("A", "B"), each = 6)) %>%
  cbind(replicate(12, sample(1:20, size = 10)) %>% t()) 


df %>% 
  nest(-group) %>% 
  mutate(mean = map(data, ~rowMeans(.)), 
         se = map(data, ~ plotrix::std.error(t(.))), 
         trial = map(data, ~ seq(1, nrow(.)))) %>%
  unnest(mean, se, trial) %>% 
  ggplot(aes(x = trial, y = mean)) +
  geom_point() +
  geom_line() +
  facet_grid(~ group) +
  geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") + 
  theme_bw()

Mean across each element of a tibble list-column by group with purrr and dplyr

1 Answers