Question
When using tibble %>% group_by() %>% summarise(... = reduce(...)) on a column containing tibbles, I would like the output to remain a column of tibbles. How can I do that most efficiently?
Minimal Example:
Setup
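library(dplyr)   # group_by(), summarise(), %>%
library(purrr)   # reduce()
library(tibble)  # tibble()

vec1 = rnorm(10)
vec2 = rnorm(10)
vec3 = rnorm(10)
vec4 = rnorm(10)
tib = tibble(grpvar = factor(c('a','a','b','b')))
tib$col2 = 1                 # placeholder column, overwritten element by element below
tib$col2[1] = tibble(vec1)
tib$col2[2] = tibble(vec2)
tib$col2[3] = tibble(vec3)
tib$col2[4] = tibble(vec4)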
This is what it looks like:
grpvar col2
<fct> <list>
1 a <dbl [10]>
2 a <dbl [10]>
3 b <dbl [10]>
4 b <dbl [10]>
A very minimal tibble with a variable that will be used for grouping, and another column containing tibbles which contain vectors of length 10.
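To check what the elements of col2 actually are after this kind of assignment, a small sketch like the following may help. It repeats the assignment pattern from the setup on a two-row tibble (the reduced size is only for brevity) and inspects the first element; the last line compares it with the original vector.

```r
library(tibble)

vec1 <- rnorm(10)

# Same assignment pattern as in the setup, reduced to two rows
tib <- tibble(grpvar = factor(c("a", "a")))
tib$col2 <- 1
tib$col2[1] <- tibble(vec1)

is.list(tib$col2)               # is col2 a list column?
class(tib$col2[[1]])            # class of the first element
identical(tib$col2[[1]], vec1)  # is the element the original vector?
```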
Problem
Using reduce within summarise simplifies the output...
tib %>% group_by(grpvar) %>% summarise(aggr=reduce(col2,`+`))
yields:
grpvar aggr
<fct> <dbl>
1 a -0.0206
...
10 a -0.101
...
20 b 0.520
Here, the tibble becomes very long: it has 10 rows per group. I don't want that; instead I want one row per group, with just one tibble containing the 10 values.
Desired output:
This is what it should look like
desired_output <- tibble(grpvar = c('a','b'), aggr = NA)
desired_output$aggr[1] = tibble(reduce(tib$col2[1:2], `+`))
desired_output$aggr[2] = tibble(reduce(tib$col2[3:4], `+`))
which looks like:
# A tibble: 2 x 2
grpvar aggr
<chr> <list>
1 a <dbl [10]>
2 b <dbl [10]>
i.e., it retains the column-of-tibbles structure (which, I believe, is internally a list of vectors).
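One possible way to get this shape, offered as a minimal sketch rather than a definitive answer, is to wrap the reduce() call in list() so that summarise() receives a single list element per group instead of a length-10 vector. The sketch rebuilds the example data with replicate() and a fixed seed (both are assumptions added here for self-containment and reproducibility) and stores the result in a hypothetical variable named out.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(42)  # assumption: seed added only so the sketch is reproducible

# Example data: a grouping factor and a list column of length-10 vectors
tib <- tibble(
  grpvar = factor(c("a", "a", "b", "b")),
  col2   = replicate(4, rnorm(10), simplify = FALSE)
)

# list() keeps each group's summed vector as one list element,
# so the result has one row per group
out <- tib %>%
  group_by(grpvar) %>%
  summarise(aggr = list(reduce(col2, `+`)))

out
```

Here out$aggr[[1]] should hold the element-wise sum of the two vectors in group a. Whether this is the most efficient option for large data is not something the sketch establishes.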