Question

When using tibble %>% group_by() %>% summarise(...=reduce(...)) on a column containing tibbles, I would like the output to remain a column of tibbles. How do I do that most efficiently?

Minimal Example:

Setup

library(dplyr)   # tibble(), group_by(), summarise(), %>%
library(purrr)   # reduce()

vec1 = rnorm(10)
vec2 = rnorm(10)
vec3 = rnorm(10)
vec4 = rnorm(10)

# col2 is a list-column: each cell holds one numeric vector of length 10
tib = tibble(grpvar = factor(c('a','a','b','b')),
             col2 = list(vec1, vec2, vec3, vec4))

This is what it looks like:

  grpvar col2      
  <fct>  <list>    
1 a      <dbl [10]>
2 a      <dbl [10]>
3 b      <dbl [10]>
4 b      <dbl [10]>

A minimal tibble with one variable used for grouping and another column whose cells each contain a vector of length 10.

Problem

Using reduce within summarise flattens the output, so every element of the reduced vector ends up in its own row:

tib %>% group_by(grpvar) %>% summarise(aggr=reduce(col2,`+`))

yields:

   grpvar    aggr
   <fct>    <dbl>
 1 a      -0.0206
...
10 a      -0.101 
...  
20 b       0.520 

Here the result is very long: 20 rows, 10 per group. I don't want 10 rows per group; I want one row per group whose aggr cell contains all 10 values.

Desired output:

This is what it should look like:

desired_output <- tibble(
  grpvar = c('a','b'),
  # one summed vector per group, stored as a list-column
  aggr = list(reduce(tib$col2[1:2], `+`),
              reduce(tib$col2[3:4], `+`))
)

which looks like:

# A tibble: 2 x 2
  grpvar aggr      
  <chr>  <list>    
1 a      <dbl [10]>
2 b      <dbl [10]>

i.e., it retains the column-of-tibbles structure (which, I believe, is internally a list of vectors).
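
The guess about the internal structure can be checked directly. The following is a minimal sketch, not the asker's exact setup: it builds the column with list() in one step, and the fixed seed is an assumption added only for reproducibility. It suggests that the column is a plain list whose cells are numeric vectors rather than tibbles, which is consistent with the <dbl [10]> entries in the printed output.

```r
library(tibble)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
tib <- tibble(
  grpvar = factor(c("a", "a", "b", "b")),
  col2   = replicate(4, rnorm(10), simplify = FALSE)  # list of four numeric vectors
)

is.list(tib$col2)                  # is the column itself a list?
is.numeric(tib$col2[[1]])          # is each cell a numeric vector?
inherits(tib$col2[[1]], "tbl_df")  # is each cell a tibble?
lengths(tib$col2)                  # length of every cell
```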

1 Answer

Wrap the reduce call in list(), so that each group returns a single list element instead of a vector of length 10:

tib %>% group_by(grpvar) %>% summarise(aggr=list(reduce(col2,`+`)))

Output:

# A tibble: 2 x 2
  grpvar aggr      
  <fct>  <list>    
1 a      <dbl [10]>
2 b      <dbl [10]>
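
A self-contained sketch of the same approach, with a check that the stored vectors are the expected element-wise sums. The fixed seed and the names vecs and out are assumptions introduced for illustration; they do not appear in the question.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
vecs <- replicate(4, rnorm(10), simplify = FALSE)
tib  <- tibble(grpvar = factor(c("a", "a", "b", "b")), col2 = vecs)

out <- tib %>%
  group_by(grpvar) %>%
  summarise(aggr = list(reduce(col2, `+`)))  # list() keeps one cell per group

nrow(out)                                        # number of rows, one per group
all.equal(out$aggr[[1]], vecs[[1]] + vecs[[2]])  # compare group "a" with the sum done by hand
out$aggr[[2]]                                    # pull out the summed vector for group "b"
```

Individual vectors are extracted with [[ ]], as in the last line, since each cell of aggr is one element of a list.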