1
votes

I'm finding this problem hard to search about because the terms summarize, groupby, rearrange, table are just so generic.

What I'd like to do is summarize a value after grouping by exactly two factors, and put the result in a table with rows/columns as factor1/factor2. This is a special case of groupby-exactly-two with one value per cell, but I find myself wanting this pattern a lot.

Here's a toy data set where factor "isx" has two levels and factor "grp" has three levels:

library(dplyr)

df <- tibble(
  isx = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
          FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  grp = c('a', 'b', 'c', 'a', 'b', 'c',
          'a', 'b', 'c', 'a', 'b', 'c'),
  val = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)
df$isx <- factor(df$isx)
df$grp <- factor(df$grp)

df %>%
  group_by(isx, grp) %>%
  summarize(mean(val))

dplyr is great, I can easily find the summary of the value for each unique combination. But the output is a 6 row table with one row per 2x3=6 combinations of the factors.

## # A tibble: 6 x 3
## # Groups:   isx [2]
##   isx   grp   `mean(val)`
##   <fct> <fct>       <dbl>
## 1 FALSE a             8.5
## 2 FALSE b             9.5
## 3 FALSE c            10.5
## 4 TRUE  a             2.5
## 5 TRUE  b             3.5
## 6 TRUE  c             4.5

In the special case of exactly two factors, where I've just grouped by both of them, I'd love to transform that result into a more readable tiny table. Something like this:

#            grp 
#          a    b    c
#  isX  F  8.5  9.5  10.5
#       T  2.5  3.5   4.5

Last thought, I'm sure I can with effort look up bits of code to literally rebuild the output tibble into this matrix that I want. But this feels like a situation where R can probably do it with one call, if I could figure out how to phrase specifically what I'm looking for.

1

1 Answers

4
votes

We can use pivot_wider with values_fn as mean to do the reshape as well as aggregation in a single step

library(dplyr)
library(tidyr)
df %>% 
   pivot_wider(names_from = grp, values_from =  val, values_fn = mean)

-output

# A tibble: 2 x 4
#  isx       a     b     c
#   <fct> <dbl> <dbl> <dbl>
#1 TRUE    2.5   3.5   4.5
#2 FALSE   8.5   9.5  10.5

which is similar to dcast functionality (data.table/reshape2)

library(data.table)
dcast(df, isx ~ grp, value.var = 'val', mean)

Or using xtabs in base R with aggregate

xtabs(val ~ isx + grp, aggregate(val ~ isx + grp, df, mean))
#       grp
#isx        a    b    c
#  FALSE  8.5  9.5 10.5
#  TRUE   2.5  3.5  4.5

Or with tapply from base R

with(df, tapply(val, list(isx, grp), mean))
#       a   b    c
#FALSE 8.5 9.5 10.5
#TRUE  2.5 3.5  4.5