3
votes

I went through a lot of questions that are similar to mine but only addressed one part of my problem. I am using dplyr with standard evaluation to accommodate variable names. This works fine for filter_ and group_by_ in a pipe. However, for summarize I cannot have a variable name for the metric I'm summing. An example will make it clear.

library(dplyr)
library(lazyeval)

# create data
a <- data.frame(
  x = c(2010, 2010, 2011, 2011, 2011),
  y_zm = c(rep(10, 5)),
  y_r2 = c(rep(20, 5)))

# define variable names
tag <- "2011"
metric <- "y"
run1 <- "zm"
run2 <- "r2"

# working example for a pipe with fixed variable name
a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(variable_name = interp(~sum(var, na.rm = T), 
                                    var = as.name(paste0(metric,"_",run1))))

# non-working example of what I want to do
a %>%
  filter_(~x == tag) %>%
  group_by_(tag) %>%
  summarise_(as.name(paste0(metric,"_",run1)) = 
               interp(~sum(var, na.rm = T), 
                      var = as.name(paste0(metric,"_",run1))))

I tried a lot of different things involving as.name() or interp() but nothing seems to work.

1
Maybe just add a rename_ step a la this answer? It could look something like rename_(.dots = setNames("variable_name", paste0(metric,"_",run1))). - aosmith

1 Answers

4
votes

After poring over the NSE vignette for awhile and poking at things, I found you can use setNames within summarise_ if you use the .dots argument and put the interp work in a list.

a %>%
    filter_(~x == tag) %>%
    group_by_(tag) %>%
    summarise_(.dots = setNames(list(interp(~sum(var, na.rm = TRUE),
                                            var = as.name(paste0(metric,"_",run1)))), 
                                                            paste0(metric,"_",run1)))

Source: local data frame [1 x 2]

  2011 y_zm
1 2011   30

You could also add a rename_ step to do the same thing. I could see this being less ideal, as it relies on knowing the name you used in summarise_. But if you always use the same name, like variable_name, this does seem like a viable alternative for some situations.

a %>%
    filter_(~x == tag) %>%
    group_by_(tag) %>%
    summarise_(variable_name = interp(~sum(var, na.rm = T), 
                                         var = as.name(paste0(metric,"_",run1)))) %>%
    rename_(.dots = setNames("variable_name", paste0(metric,"_",run1)))

Source: local data frame [1 x 2]

  2011 y_zm
1 2011   30