
Consider the following simple dataset ds:

ds <- data.frame("x"=c(1,2,3), "y"=c(5,5,5))

I apply a function on some columns of ds like x and y and create two new variables named xnew and ynew. It works well:

ds[,c("xnew","ynew")] <- lapply(ds[,c("x","y")], function(x) x^2)

But suppose there are some undefined column names, such as z. In this case I get the error "undefined columns selected" and neither xnew nor ynew is created. Is there any way to skip this error, create xnew and ynew, and get an error only for znew (something like tryCatch in a for loop)?

    ds[,c("xnew","ynew","znew")] <- lapply(ds[,c("x","y","z")], function(x) x^2)

    Error in `[.data.frame`(ds, , c("x", "y", "z")) : 
    undefined columns selected
Sounds like you're trying to solve an XY problem; more details on why you are applying this to unknown columns are needed to give proper advice - Tensibai
Try: lapply(ds[,colnames(ds) %in% c("x","y","z")], function(x) x^2) - GKi
@Tensibai I am writing an R script, and some variables have not been gathered yet and will come later, so I want to write the script to include those variables now instead of modifying it later :-) - Fateta
@GKi thanks for your suggestion, but this gives the value of xnew to znew! - Fateta
@Fateta No, it gives a list! Which you can bind to ds like: tt <- lapply(ds[,colnames(ds) %in% c("x","y","z")], function(x) x^2); ds[,paste0(names(tt), "_new")] <- tt - GKi

1 Answer


You can define the columns passed to lapply (oldvars) as the intersection between the column names of ds (x, y) and a vector that may include undefined column names (x, y, z). For the record, the data.table package lets you use lapply directly inside its `[` call, which is typically faster than base R on large datasets.

Code

library(data.table)

ds = data.table(ds)

oldvars = intersect(c('x', 'y', 'z'), colnames(ds))
newvars = paste0(oldvars, '_new')

ds[, (newvars) := lapply(.SD, function(x) x^2), .SDcols = oldvars]

The last line applies lapply to .SD, the subset of the data.table that contains only the columns named in the .SDcols argument (in this case, x and y), and assigns the results by reference to the new columns listed in newvars.

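Using base R instead of data.table (from OP's comment), with ds still a plain data.frame:

# ds[oldvars] (no comma) stays a data.frame even if only one column matches
ds[newvars] <- lapply(ds[oldvars], function(x) x^2)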

Result:

> ds
   x y x_new y_new
1: 1 5     1    25
2: 2 5     4    25
3: 3 5     9    25
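
The question also asked for a message about the column that could not be processed. A minimal base R sketch of that idea follows, assuming a warning is an acceptable substitute for the error. The helper name square_existing and its suffix argument are made up for illustration and do not come from any package:

```r
# Hypothetical helper: square whichever of the wanted columns exist
# and warn about the ones that are not in the data.frame.
square_existing <- function(df, wanted, suffix = "_new") {
  present <- intersect(wanted, colnames(df))  # columns that exist
  absent  <- setdiff(wanted, colnames(df))    # columns that do not exist yet

  if (length(absent) > 0) {
    warning("skipped undefined columns: ", paste(absent, collapse = ", "))
  }
  if (length(present) > 0) {
    # df[present] (list-style indexing) stays a data.frame even for one column
    df[paste0(present, suffix)] <- lapply(df[present], function(x) x^2)
  }
  df
}

ds <- data.frame(x = c(1, 2, 3), y = c(5, 5, 5))
ds <- square_existing(ds, c("x", "y", "z"))
```

Here x_new and y_new are created and a warning names z. Once z is actually collected, the same call should add z_new without any change to the script.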