I need to apply a different summary function depending on a condition. For example, using iris, suppose I want the sum of the petal width if the species is setosa, and the mean of the petal width otherwise.
Naively, I wrote this using case_when, which does not work:
library(dplyr)

iris <- tibble::as_tibble(iris)
iris %>%
  group_by(Species) %>%
  summarise(pwz = case_when(
    Species == "setosa" ~ sum(Petal.Width, na.rm = TRUE),
    TRUE ~ mean(Petal.Width, na.rm = TRUE)))
Error in summarise_impl(.data, dots) :
  Column `pwz` must be length 1 (a summary value), not 50
I eventually found something like this, summarizing using each method, and then in a mutate picking which one I actually wanted:
iris %>%
  group_by(Species) %>%
  summarise(pws = sum(Petal.Width, na.rm = TRUE),
            pwm = mean(Petal.Width, na.rm = TRUE)) %>%
  mutate(pwz = case_when(
    Species == "setosa" ~ pws,
    TRUE ~ pwm)) %>%
  select(-pws, -pwm)
But that seems more than a bit awkward: I create all of these summarized values and only pick one at the end, and my real case_when is a lot more complicated. Can I not use case_when inside of summarise? Do I have my syntax wrong? Any help is appreciated!
Edit: I suppose I should have pointed out that I have multiple conditions/functions (just assume I've got, depending on the variable, some that need mean, sum, max, min, or other summary).
case_when does not work like that because it's a vectorized function, so it does not collapse (you'd only be taking the sum/mean of one value at a time). – MrFlick
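To illustrate the comment above, here is a minimal sketch of why a vectorized case_when returns one value per row, followed by one possible way to collapse each group to a single value by choosing the function from the group's label. The toy vectors and the helper summarise_width are hypothetical names made up for this example, and the versicolor = max arm is only an assumption to show how the "multiple functions" case from the edit might be added. This is not necessarily the best or only approach.

```r
library(dplyr)

# Toy vectors standing in for one group's columns (illustrative values).
species <- c("setosa", "setosa", "setosa")
width   <- c(0.2, 0.4, 0.3)

# case_when() checks the condition element by element, so the result has
# one entry per input row even though each right-hand side is a scalar.
vectorised <- case_when(
  species == "setosa" ~ sum(width),
  TRUE                ~ mean(width)
)
length(vectorised)  # same length as `species`, not 1

# Hypothetical helper: choose a summary from a single group label.
# switch() works on one scalar value and only evaluates the matching arm.
summarise_width <- function(label, x) {
  switch(as.character(label),
    setosa     = sum(x, na.rm = TRUE),
    versicolor = max(x, na.rm = TRUE),  # assumed extra case, for illustration
    mean(x, na.rm = TRUE)               # default for any other species
  )
}

# first(Species) is a single value within each group, so every group
# yields exactly one summary value.
result <- iris %>%
  group_by(Species) %>%
  summarise(pwz = summarise_width(first(Species), Petal.Width))

result
```

The design choice here is to test the group label once per group (a scalar) instead of once per row, which is what summarise expects. More conditions can be added as extra named arms in the switch call.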