I'm trying to summarize multiple columns using summarize_at()
with a custom function. The part I'm stuck on is that the function ssmd()
is meant to take one vector of values from the current group established by group_by()
and another vector of values from outside that group.
In the example below, x should be the vector of values for each Month (it varies according to the current group), and y should be a fixed set of values for Month == 5.
# custom function
ssmd <- function(x, y) {
  (mean(x, na.rm = TRUE) - mean(y, na.rm = TRUE)) /
    sqrt(var(x, na.rm = TRUE) + var(y, na.rm = TRUE))
}
# packages and dataset (airquality is a built-in dataset)
library(dplyr)
d <- airquality
# this isn't working: trying to compare the mean for each Month against
# the mean of Month 5 via ssmd(), for columns Ozone, Solar.R, Wind, and Temp
d %>%
  group_by(Month) %>%
  summarize_at(vars(Ozone:Temp), funs(ssmd, x = ., y = .[Month == 5])) %>%
  ungroup()
At the moment, this gives the following error:
Error in mean(y, na.rm = TRUE) : argument "y" is missing, with no default
So I think I have a syntax error, in addition to being stuck on how to access values from outside the current group.
The expected output is a data frame with one row for each Month and one column for each variable (Ozone, Solar.R, Wind, and Temp).
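To make the target concrete, a single cell of that output can be computed by hand with base R subsetting. This is only a sketch of the value I'm after for one column and one month (Ozone, Month 6 against Month 5; the choice of Month 6 is arbitrary and just for illustration), not the grouped solution I'm looking for:

```r
# custom function, repeated here so the snippet runs on its own
ssmd <- function(x, y) {
  (mean(x, na.rm = TRUE) - mean(y, na.rm = TRUE)) /
    sqrt(var(x, na.rm = TRUE) + var(y, na.rm = TRUE))
}

d <- airquality

# x: Ozone values for the "current group" (Month 6, picked for illustration)
x <- d$Ozone[d$Month == 6]
# y: the fixed reference values, Ozone for Month 5
y <- d$Ozone[d$Month == 5]

# the value expected in the Ozone column of the Month 6 row
ssmd(x, y)

# sanity check: comparing the reference month with itself gives 0
ssmd(y, y)
```

The grouped version should produce this same kind of value for every Month and for each of the four columns.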