5
votes

Let's say I have the following (simplified) tibble containing a group and values in vectors:

library(tidyverse)

set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
             values = replicate(5, sample(3), simplify = FALSE)))
# A tibble: 5 x 2
  group values   
  <fct> <list>   
1 A     <int [3]>
2 A     <int [3]>
3 B     <int [3]>
4 B     <int [3]>
5 B     <int [3]>

tb_vec[[1,2]]
[1] 1 3 2

I would like to summarize the values vectors per group by summing them element-wise, and tried the following:

tb_vec %>% group_by(group) %>% 
  summarize(vec_sum = colSums(purrr::reduce(values, rbind)))

Error: Column vec_sum must be length 1 (a summary value), not 3

The error surprises me, because tibbles (the output format) can contain vectors as well.

My expected output would be the following summarized tibble:

# A tibble: 2 x 2
  group vec_sum  
  <fct> <list>   
1 A     <dbl [3]>
2 B     <dbl [3]>

Is there a tidyverse solution that accommodates the vector output of summarize? I want to avoid splitting the tibble, because then I lose the factor.

1
Try colSums(do.call(rbind, tb_vec$values)). - jay.sf
Does this work for you? tb_vec %>% group_by(group) %>% tidyr::unnest(values) %>% summarize(vec_sum = colSums(purrr::reduce(values, rbind))) - NelsonGon
I think you just need to use ... %>% summarize(vec_sum = list(colSums(purrr::reduce(values, rbind)))) - AntoniosK
What's the expected output? This provides the same output as @NelsonGon but mainly uses unlist(). tb_vec %>% group_by(group) %>% summarize(vec_sum = sum(unlist(values))) - Cole
@AntoniosK. That's it, thanks! If you write it as an answer, I will accept it. - MartijnVanAttekum

1 Answer

3
votes

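You just need to wrap the result in list() within summarise in your solution. summarise expects one value per group, and a list wrapper turns each group's vector into a single list element. The result is a list column with 2 elements (one per group), where each element is a vector of 3 values: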

library(tidyverse)

set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
                  values = replicate(5, sample(3), simplify = FALSE)))

# wrap the per-group vector in list() so each group yields one list element
res <- tb_vec %>%
  group_by(group) %>%
  summarize(vec_sum = list(colSums(purrr::reduce(values, rbind))))

res$vec_sum

# [[1]]
# [1] 2 4 6
# 
# [[2]]
# [1] 6 5 7
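
One caveat worth noting: if a group contains only a single row, purrr::reduce(values, rbind) returns a plain vector rather than a matrix, and colSums() then fails because it needs at least two dimensions. A minimal sketch of an alternative, assuming all vectors have the same length, is to add them element-wise with Reduce(`+`, ...). The tibble tb_demo and its values below are made up for illustration (they are not the sampled data from the question), and group "C" is added only to show the single-row case:

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with fixed values; group "C" has a single row.
tb_demo <- tibble(
  group  = factor(c("A", "A", "B", "B", "B", "C")),
  values = list(c(1, 2, 3), c(3, 2, 1),
                c(1, 1, 1), c(2, 2, 2), c(3, 3, 3),
                c(7, 8, 9))
)

# Reduce(`+`, values) adds the vectors of a group element-wise;
# list() wraps the result so each group contributes one list element.
res_demo <- tb_demo %>%
  group_by(group) %>%
  summarize(vec_sum = list(Reduce(`+`, values)))

res_demo$vec_sum
# [[1]]
# [1] 4 4 4
#
# [[2]]
# [1] 6 6 6
#
# [[3]]
# [1] 7 8 9
```

The list() wrapper plays the same role as in the answer above; only the inner summing step is swapped.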