I'd like to use dplyr's group_by() to group by multiple columns, which is simple enough on its own. The complication is that I want to write a function where one or more columns are always in the grouping and the user can pick one additional column to group by. So far I've tried passing the always-present columns as bare (unquoted) names and the user-selected column as a string, but nothing I've tried works. This combination seems to work fine in select(), but not in group_by(). Ideally, I'd rather not switch to all strings, because I want to keep the dplyr features that rely on bare names, such as selecting a range of columns. Below is an example.
To make a simple example, I started with the iris data set and added two more columns. Their exact meanings are not important.
library(dplyr)

test_tbl <- iris %>%
  mutate(extra_var1 = ifelse(Sepal.Length >= 5.0, "Yes", "No"),
         extra_var2 = "What")
Here's an example that uses the non-string specification for all variables, which works just fine:
test_tbl %>%
  select(Species, extra_var1, Sepal.Length, Petal.Width) %>%
  group_by(Species, extra_var1) %>%
  summarize(average.Sepal.Length = mean(Sepal.Length),
            average.Petal.Width = mean(Petal.Width))
However, I'd like to wrap this in a function that lets the user specify whether to group by extra_var1 or extra_var2. Here's my attempt, which doesn't work. Again, I believe the select() part works fine, but the group_by() part does not.
group_and_summarize <- function(var) {
  test_tbl %>%
    select(Species, var, Sepal.Length, Petal.Width) %>%
    group_by(Species, var) %>%
    summarize(average.Sepal.Length = mean(Sepal.Length),
              average.Petal.Width = mean(Petal.Width))
}
group_and_summarize("extra_var1")
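For reference, here is a minimal sketch of the behavior I'm after, written under the assumption that dplyr 1.0 or later is installed. It keeps Species as a bare name and looks up the user-supplied string with the `.data` pronoun inside group_by(), while select() wraps the string in all_of(). The function name `group_and_summarize_sketch` is hypothetical and used only for this illustration; I can't say whether this is the recommended way to do it.

```r
library(dplyr)

# Same example data as above: iris plus two extra columns.
test_tbl <- iris %>%
  mutate(extra_var1 = ifelse(Sepal.Length >= 5.0, "Yes", "No"),
         extra_var2 = "What")

# Hypothetical name, for illustration only.
# `var` is a single column name given as a string.
group_and_summarize_sketch <- function(var) {
  test_tbl %>%
    # all_of() tells select() that `var` holds a column name as a string
    select(Species, all_of(var), Sepal.Length, Petal.Width) %>%
    # .data[[var]] looks up the column whose name is stored in `var`
    group_by(Species, .data[[var]]) %>%
    summarize(average.Sepal.Length = mean(Sepal.Length),
              average.Petal.Width = mean(Petal.Width),
              .groups = "drop")
}

result <- group_and_summarize_sketch("extra_var1")
print(result)
```

With this shape, Species stays unquoted, so other bare-name features (for example a column range in select()) should remain available alongside the string-specified column.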