5
votes

Sorry for a very basic question, solution must be very simple but I'm not able to find it.

When I use gsub to add a new column to a data.table, I get the warning "argument 'replacement' has length > 1 and only the first element will be used", and every row of the new column is built from the first row's replacement value.

Here is a simplified case:

library(data.table)
dt <- data.table(v1 = c(1, 2, 3), v2 = c("axb", "cxxd", "exfxgx"))
dt[, v3 := gsub("x", v1, v2)]

The new column v3 contains a string with "1" instead of "x" in all the rows.

Using other functions, e.g.

dt[ , v3:=paste(v1,v2)]  

works as expected.

I'm using RStudio v0.98.1103, R v3.1.2, data.table v1.9.4.

2
gsub is not vectorized in the replacement, so that's what the warning is telling you. – A5C1D2H2I1M1N2O1R2T1
Another option would be myFunc <- function(x, y) gsub("x", x, y); dt[, v3 := mapply(myFunc, v1, v2)]. Also, re my edit: the := operator updates the data in place, so there is no need to reassign it with dt <- dt. Take a look here for more information. – David Arenburg
@DavidArenburg: thank you for your comment. I'm used to reassigning the data table even though the update is in place, because during execution I find it very annoying to see the heads/tails of every updated data table in the console (it makes errors and warnings harder to notice). Maybe there are cleverer ways to avoid it... – mbranco
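
For reference, the mapply suggestion from the comments can be written out as a small runnable sketch. The helper name replace_x is made up for this illustration, and fixed = TRUE is an added assumption that "x" should be matched literally rather than as a regular expression.

```r
library(data.table)

dt <- data.table(v1 = c(1, 2, 3), v2 = c("axb", "cxxd", "exfxgx"))

# gsub() accepts only one replacement per call, so call it once per
# (v1, v2) pair. replace_x is a hypothetical helper, not a library function.
replace_x <- function(repl, s) gsub("x", repl, s, fixed = TRUE)

# mapply() walks v1 and v2 in parallel and returns a character vector.
dt[, v3 := mapply(replace_x, v1, v2, USE.NAMES = FALSE)]
```

This should leave "a1b", "c22d" and "e3f3g3" in v3, with no warning, because each gsub call sees a length-1 replacement.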

2 Answers

13
votes
dt[, v3 := gsub("x", v1, v2), by = v1]  
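
As far as I understand it, this works because a grouping column has length 1 inside each group, so gsub receives a single replacement value per call. A minimal sketch with a repeated v1 value (dt2 is illustrative data, not from the question) suggests the grouping does not need one row per group:

```r
library(data.table)

# Illustrative data: the value 2 appears twice in v1.
dt2 <- data.table(v1 = c(1, 2, 2), v2 = c("axb", "cxxd", "exfxgx"))

# Within each group, v1 is the single group value, so gsub() gets a
# length-1 replacement and applies it to all v2 values in that group.
dt2[, v3 := gsub("x", v1, v2), by = v1]
```

Here v3 should come out as "a1b", "c22d", "e2f2g2". If the pattern also varied by row, grouping by every column that varies (or by row number) would be the analogous approach.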
5
votes

The easiest approach would be to use a string processing package that has vectorized arguments, like stringi:

library(stringi)
dt[, v3 := stri_replace_all_fixed(v2, "x", v1)][]
#    v1     v2     v3
# 1:  1    axb    a1b
# 2:  2   cxxd   c22d
# 3:  3 exfxgx e3f3g3

Alternatively, you can make your own "vectorized" version of gsub by using the Vectorize function:

vGsub <- Vectorize(gsub, vectorize.args = c("replacement", "x"))
dt[, v3 := vGsub("x", v1, v2)][]
#    v1     v2     v3
# 1:  1    axb    a1b
# 2:  2   cxxd   c22d
# 3:  3 exfxgx e3f3g3