0
votes

I'd like to be able to use dplyr's group_by to group by multiple columns, simple enough. But, the complication is I want to create a function where one or more columns are always in the group by and the user can select an additional column to group by. What I've tried so far involves using the non-string specification of the columns that are always in the group by and using a string for the column the user selects, but nothing I've tried works. This combination seems to work fine in SELECT, but not GROUP_BY. Ideally, I'd rather not switch to all strings because I want to be able to take advantage of some of the functionality of dplyr that allows me to select a range of columns. Below is an example.

To make a simple example, I started with the iris data set and added a couple more columns, their exact meanings are not important.

test_tbl <- iris %>%
  mutate(extra_var1 = ifelse(Sepal.Length >= 5.0, "Yes", "No"),
         extra_var2 = "What")

Here's an example that uses the non-string specification for all variables, which works just fine:

test_tbl %>%
  select(Species, extra_var1, Sepal.Length, Petal.Width) %>%
  group_by(Species, extra_var1) %>%
  summarize(average.Sepal.Length = mean(Sepal.Length),
            average.Petal.Width = mean(Petal.Width))

But, I'd like to be able to, within a function, have the user specify whether they want to group by extra_var1 or extra_var2. Here's my attempt, which doesn't work. Again, I believe the select part works fine, but the group_by part does not.

group_and_summarize <- function(var) {
  test_tbl %>%
    select(Species, var, Sepal.Length, Petal.Width) %>%
    group_by(Species, var) %>%
    summarize(average.Sepal.Length = mean(Sepal.Length),
              average.Petal.Width = mean(Petal.Width))
}

group_and_summarize("extra_var1")
1

1 Answers

1
votes

This would be one way to do it:

library(dplyr)

group_and_summarize <- function(var) {
  test_tbl %>%
    select(Species, {{var}}, Sepal.Length, Petal.Width) %>%
    group_by(Species, {{var}}) %>%
    summarize(average.Sepal.Length = mean(Sepal.Length),
              average.Petal.Width = mean(Petal.Width))
}

group_and_summarize(extra_var1)
#> `summarise()` regrouping output by 'Species' (override with `.groups` argument)
#> # A tibble: 6 x 4
#> # Groups:   Species [3]
#>   Species    extra_var1 average.Sepal.Length average.Petal.Width
#>   <fct>      <chr>                     <dbl>               <dbl>
#> 1 setosa     No                         4.67               0.195
#> 2 setosa     Yes                        5.23               0.28 
#> 3 versicolor No                         4.9                1    
#> 4 versicolor Yes                        5.96               1.33 
#> 5 virginica  No                         4.9                1.7  
#> 6 virginica  Yes                        6.62               2.03

Created on 2021-05-11 by the reprex package (v0.3.0)

If you want the user to enter strings then we can use !!! syms():

group_and_summarize <- function(vars) {
  test_tbl %>%
    select(Species, !!! syms(vars), Sepal.Length, Petal.Width) %>%
    group_by(Species, !!! syms(vars)) %>%
    summarize(average.Sepal.Length = mean(Sepal.Length),
              average.Petal.Width = mean(Petal.Width))
}

group_and_summarize(c("extra_var1", "extra_var2"))

#> `summarise()` regrouping output by 'Species', 'extra_var1' (override with `.groups` argument)
#> # A tibble: 6 x 5
#> # Groups:   Species, extra_var1 [6]
#>   Species    extra_var1 extra_var2 average.Sepal.Length average.Petal.Width
#>   <fct>      <chr>      <chr>                     <dbl>               <dbl>
#> 1 setosa     No         What                       4.67               0.195
#> 2 setosa     Yes        What                       5.23               0.28 
#> 3 versicolor No         What                       4.9                1    
#> 4 versicolor Yes        What                       5.96               1.33 
#> 5 virginica  No         What                       4.9                1.7  
#> 6 virginica  Yes        What                       6.62               2.03

Created on 2021-05-11 by the reprex package (v0.3.0)