combine tidyr::spread and dplyr::summarise in single step

Question

I often want to perform tidyr::spread and dplyr::summarise in a "single step" to aggregate data by group. What I want is shown in expected. I can get expected by performing summarise and spread separately and combine the results with a dplyr::full_join but I'm looking for alternative approaches that avoid full_join. Bona fide single-step approaches are not necessary.

df <- data.frame(
        id = rep(letters[1], 2),
        val1 = c(10, 20),
        val2 = c(100, 200),
        key = c("A", "B"),
        value = c(1, 2))

library(tidyverse)
result1 <- df %>%
              group_by(id) %>%
              summarise(
                val1 = min(val1),
                val2 = max(val2)
              )
# A tibble: 1 x 3
  # id      val1  val2
  # <fctr> <dbl> <dbl>
# 1 a       10.0   200

result2 <- df %>%
              select(id, key, value) %>%
              group_by(id) %>%
              spread(key, value)
# A tibble: 1 x 3
# Groups: id [1]
  # id         A     B
# * <fctr> <dbl> <dbl>
# 1 a       1.00  2.00

expected <- full_join(result1, result2, by="id")
# A tibble: 1 x 5
  # id      val1  val2     A     B
  # <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a       10.0   200  1.00  2.00

Calum You Calum You · Accepted Answer · 2018-02-16T23:10:23

I suspect your data may have more edge cases that require some modification, but why don't you simply spread then summarise? You can specify the summary function separately per variable, so for A and B where you don't actually need to calculate anything (I'm assuming) you can just remove all the NA:

df %>%
  spread("key", "value") %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    A = mean(A, na.rm = TRUE),
    B = mean(B, na.rm = TRUE)
    )
# A tibble: 1 x 5
  id     val1  val2     A     B
  <fct> <dbl> <dbl> <dbl> <dbl>
1 a      10.0   200  1.00  2.00

combine tidyr::spread and dplyr::summarise in single step

3 Answers