I want to create several variables using a formula with R data.table. I have a list of variables, and for each one I want to perform a calculation and create a new variable whose name is the original column name with the same suffix pasted on. I can get it to work for one variable at a time, but not inside lapply or a for loop. I suspect I am missing something about how data.table treats variable names vs. strings. Do I need to use ".." or wrap with eval()? A dplyr (or any tidyverse) solution would solve the issue too.

Here is example code with mtcars:

library(data.table)
mtcars.dt <- as.data.table(mtcars)  # copy; setDT() on the built-in mtcars can fail because its binding is locked
myVars <- c("mpg", "hp", "qsec")

# Doesn't work:
for( myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}

# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])

# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]

# Doesn't work
for (myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              myVar - 
              lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}

# Doesn't work
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") := 
                                       x - 
                                       lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])

# Works
mtcars.dt[, mpg.disp.lm.adj := 
            mpg - 
            lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]

For the ratio calculation, I get the following error:

Error in myVar/disp : non-numeric argument to binary operator 

For the lm adjustment, I get the following error:

Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'disp')

1 Answer

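Inside the loop, myVar holds a character string such as "mpg", so myVar / disp tries to divide a string by a column, hence the "non-numeric argument" error. We can use get() to look up the column with that name: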

library(data.table)
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}

Or wrap it with eval() after converting the string to a symbol with as.name():

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := eval(as.name(myVar)) / disp]
}
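
As a quick sanity check, the following minimal sketch compares both forms against the hand-written column expression for a single variable. It assumes a fresh copy made with as.data.table(); the names dt, via_get, via_eval and direct are only illustrative and not part of the question's setup.

```r
library(data.table)

# Assumption: a fresh copy of mtcars, so the built-in dataset is not modified.
dt <- as.data.table(mtcars)
myVar <- "mpg"

# myVar is only a character string, which is why myVar / disp fails.
is.character(myVar)

# Both forms resolve the string to the column before dividing.
via_get  <- dt[, get(myVar) / disp]
via_eval <- dt[, eval(as.name(myVar)) / disp]
direct   <- dt[, mpg / disp]

# Each comparison should report that the results match.
identical(via_get, direct)
identical(via_eval, direct)
```

Either form should be fine for a handful of columns; which one reads better is mostly a matter of taste.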

Another option is to name the columns in .SDcols, loop over .SD (the Subset of Data.table), apply the transformation, and create the new variables by assignment (:=):

mtcars.dt[, paste0(myVars, ".disp.ratio") := lapply(.SD, `/`, disp), 
             .SDcols = myVars]

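For the second case, myVar ~ disp is read literally as a formula whose response is the one-element string myVar, which is why lm reports that the variable lengths differ. We can build the formula from the string with paste and use get(myVar) for the column itself: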

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              get(myVar) - 
              lm(data = .SD, formula = paste(myVar,  "~ disp"))$coefficients[2] *
               (disp - mean(disp))]
}
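
The .SDcols idea from the ratio case can probably be carried over to the regression adjustment as well, which avoids building a formula from strings. This is a sketch under assumptions rather than a tested recipe for every data.table version: dt is a fresh copy of mtcars, and the anonymous function with its slope variable is illustrative.

```r
library(data.table)

# Assumption: a fresh copy of mtcars; dt and myVars are illustrative names.
dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

# For each column y in .SD, subtract the fitted disp slope times centered disp.
# disp is referenced in j, so it stays available even though it is not in .SDcols.
dt[, paste0(myVars, ".disp.lm.adj") := lapply(.SD, function(y) {
  slope <- coef(lm(y ~ disp))[["disp"]]
  y - slope * (disp - mean(disp))
}), .SDcols = myVars]

# Compare against the hand-written single-column version from the question.
expected <- mtcars$mpg -
  coef(lm(mpg ~ disp, data = mtcars))[["disp"]] * (mtcars$disp - mean(mtcars$disp))
all.equal(dt$mpg.disp.lm.adj, expected)
```

Looking the coefficient up by name ("disp") instead of by position is a small design choice that keeps the code correct if the formula ever gains more terms.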