2
votes

I would like to apply an ifelse function looping through 2 vectors and then based on the condition update the columns of a data table. I'm looking for a solution which can work on large number of columns.

I'm demonstrating the problem through a toy dataset mtcars.

library(data.table)
mtcars <- data.table(mtcars)

Now, I would like to limit the values for some of the columns and replace the values of the corresponding columns with the defined limits. But the below code is giving me weird results.

limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- sapply(limitlist, "[[", 2)
for (i in length(limitlist)) mtcars[, c(cols) := lapply(.SD, function(x){ifelse(x[i] > lims[i], lims[i], x[i])}), .SDcols = cols]

My desired output:

    range(mtcars$hp)
[1]  52 300


    range(mtcars$disp)
[1]  71.1 450.0

I'm new to data.table syntax so might be a dumb error. Any help on this is highly appreciated.

3
Why not just do mydata$arr_time <- pmin(2000, mydata$arr_time)? - Andrew Gustar
@ Uwe - I have edited the reproducible example. Unfortunately, c(col) := doesn't work either. - Debbie
@ Andrew - I'm looking for a general solution which can work on a large number of columns. - Debbie
setDT(mtcars) throws an error: Can not convert 'mtcars' to data.table by reference because binding is locked. It is very likely that 'mtcars' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, ex: var <- copy(var) and then using setDT again. - Uwe
@ Uwe - Thanks for pointing this out. I have edited the question. - Debbie

3 Answers

2
votes

That is pretty advanced stuff for a newbie to data.table. However, here are three other variants:

  1. subsetting and updating
  2. update join
  3. using set()

These approaches update only the affected elements in the respective column vectors while the other solutions posted so far (Frank's, Dan Y's) replace the whole column. There might be a performance gain if only a few elements need to be replaced.

Note that we are using limitlist as provided by the OP.

1. subsetting and updating

# subsetting and updating
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  DT[get(cols[i]) > lims[i], (cols[i]) := lims[i]]
# check ressults
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

2. update join

# update join
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  DT[.(lims[i]), on = sprintf("%s>%s", cols[i], "V1"), (cols[i]) := lims[i]]
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

3. using set()

# using `set()`
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  set(DT, which(DT[[cols[i]]] > lims[i]), cols[i], lims[i])
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

IMHO, this variant is the most straightforward approach.

2
votes

Since Dan's answer doesn't use data.table syntax...

library(data.table)

# input
mylist = list(hp = 300, disp = 450)
DT = data.table(mtcars)

# update
DT[, names(mylist) := Map(pmin, .SD, mylist), .SDcols=names(mylist)]
0
votes

This should do it:

first, get your limlist to be numeric instead of character:

lims <- as.numeric(sapply(limitlist, "[[", 2))

then you can loop:

for (i in 1:length(limitlist)) {
    mtcars[[cols[i]]] <- pmin(mtcars[[cols[i]]], lims[i])
}