6
votes

How do I use mutate with a function that accepts a list of variable names? My presumption is that I need standard evaluation in my case, and hence mutate_, but I am not entirely confident on this point. The function looks like this:

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

Expected output here is what would be returned by:

rowSums(dfTest[, as.character(weekDates)])
2
You define makeTable but then call makeDataFrame. Are these supposed to be the same function? It would be helpful to describe the output you expect for this sample input (set a seed so the data is reproducible). - MrFlick
@MrFlick Thanks. Changed the function name. Nothing fancy is expected, just the row-wise sum of all the variables whose names are passed to the function. Will update with expected output. - tchakravarty

2 Answers

5
votes

I think this is what you're after:

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

Here we just supply a character string rather than using interp, because you can't pass a list of names as a single parameter to a function. Plus, sum() would do some undesired collapsing: operations are not performed row-wise. Each column is passed in as a whole vector, so sum() would return a single grand total rather than one value per row.
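To make the difference between sum() and + concrete, here is a minimal base R sketch. The data frame df and its columns a and b are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data, not from the question
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() collapses everything it receives into one number
total <- sum(df$a, df$b)                      # 66

# `+` is vectorised, so it yields one value per row
per_row <- df$a + df$b                        # 11 22 33

# rowSums() gives the same row-wise result for any set of columns
per_row2 <- unname(rowSums(df[, c("a", "b")]))

# The string handed to mutate_ in the answer is built like this
expr_string <- paste(c("a", "b"), collapse = "+")   # "a+b"
```

This is why the answer joins the names with "+" instead of wrapping them in sum().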

The other problem with this example is that you set check.names=FALSE in your data.frame, which means you've created column names that are not syntactically valid symbols. A name such as 2014-01-01 is parsed as arithmetic (2014 minus 1 minus 1) unless it is quoted. You can explicitly wrap your variable names in back-ticks if you like

createSum(dfTest , paste0("`", weekDates,"`"))

but in general it would be better not to use invalid names.
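For readers on a newer dplyr, where the underscore verbs such as mutate_ are deprecated, a possible alternative is sketched below. It assumes dplyr 1.0.0 or later (for across() and all_of()); the small data frame df and the vector cols are invented for the example. Because the columns are selected by character vector, non-syntactic names need no back-ticks here.

```r
library(dplyr)

# Hypothetical toy data with non-syntactic, date-like column names
set.seed(1)
cols <- c("2014-01-01", "2014-01-08")
df <- data.frame(matrix(rnorm(6), ncol = 2))
names(df) <- cols

# Sketch of createSum without mutate_: select the columns by name
# with all_of() and sum across them row by row
createSum <- function(data, variableNames) {
  data %>%
    mutate(sumvar = rowSums(across(all_of(variableNames)), na.rm = TRUE))
}

result <- createSum(df, cols)
```

The result should match rowSums(df[, cols]), which is the expected output stated in the question.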

1
votes

I don't know if this is the "officially sanctioned" dplyr way, but this is a possibility:

weekDates = as.character(weekDates) # more convenient

dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))

This does carry potentially significant performance penalties, depending on your particular usage. In addition to dplyr's regular copying of the entire data, I think it also copies it a second time during that internal computation. You can look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy), and you'll get arguably better syntax.
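A minimal sketch of the data.table approach mentioned above, assuming the data.table package is installed. The table dt and the vector cols are made up for illustration.

```r
library(data.table)

# Hypothetical toy data with date-like column names
set.seed(1)
cols <- c("2014-01-01", "2014-01-08")
dt <- as.data.table(matrix(rnorm(6), ncol = 2))
setnames(dt, cols)

# := adds the column by reference (no copy of the whole table);
# .SDcols restricts .SD to the columns being summed
dt[, sumvar := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
```

Since .SDcols takes a character vector, the non-syntactic column names do not need back-ticks.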