I am using a data.table to store data in string format. The strings hold information that I want to extract using a function. In my real script, this function does multiple calculations and parsing steps, and returns another data.table with many columns and many rows. It receives a whole row of my original data.table as its argument (all variables are used): myFun(dt[rowNumber, ])
While some columns of my original data.table will still be used later in my script, one of them is expendable after processing, so I want to replace it with the data.table returned by my function. This keeps a link between the remaining variables and the new data.table, so I can later pass them all together to other functions.
However, since I am working with many rows, I want to speed things up by using the data.table::set function to update each cell, but R gives me a warning when I use:
data.table::set(dt, i=rowNum, j=colNum, value = list(list(myFun(dt[rowNum, ]))))
unless I first do:
dt$someVar[1L] <- list(myFun(dt[1L, ]))
This is the warning I get when using only set:
In data.table::set(dt, i = rowNum, j = colNum, value = list(list(myFun(dt[rowNum, : Coerced 'list' RHS to 'character' to match the column's type. Either change the target column to 'list' first (by creating a new 'list' vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'character' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
I receive the same warning when using only := instead:
dt[rowNum, ((names(dt))[colNum]) := list(list(myFun(dt[rowNum, ])))]
Here is a clear illustrative example (not the real problem) of the issue I am facing:
col1 <- as.character(1:3)
col2 <- as.character(4:6)
col3 <- as.character(7:9)
dt <- data.table::data.table(var1 = col1, var2 = col2, var3 = col3)
myFun <- function(rowDt)
{
  v1 <- as.numeric(rowDt$var1[1])
  v2 <- as.numeric(rowDt$var2[1])
  v3 <- as.numeric(rowDt$var3[1])
  col1 <- c(v1*v2, v1*v3)
  col2 <- c(v2*v2, v2*v3)
  return(data.table::data.table(var1 = col1, var2 = col2))
}
colNum = 3L
for (rowNum in 1L:nrow(dt))
{
  data.table::set(dt, i=rowNum, j=colNum, value = list(list(myFun(dt[rowNum, ]))))
}
The above code yields the warning message shown earlier. However, this works:
colNum = 3L
dt$var3[1L] <- list(myFun(dt[1L, ]))
for (rowNum in 2L:nrow(dt))
{
  data.table::set(dt, i=rowNum, j=colNum, value = list(list(myFun(dt[rowNum, ]))))
}
Is this expected behavior? If it is, why does it happen, and how can I take advantage of the higher performance of data.table::set by using only set?
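For reference, the warning text itself hints at one possible route: change the target column to type list before the loop, so that set no longer has to coerce the list value back to character. Below is a minimal sketch of that idea on the toy data above. It is only one way to do it, not necessarily the recommended one. It assumes that replacing the whole column with `:=` (no `i`) is acceptable here, and it reads the cell with `[[1]]` instead of `[1]` so that myFun behaves the same whether var3 is still character or already a list (that change is mine, not part of the original function).

```r
library(data.table)

dt <- data.table(var1 = as.character(1:3),
                 var2 = as.character(4:6),
                 var3 = as.character(7:9))

myFun <- function(rowDt) {
  # [[1]] extracts the single cell from either a character or a list column
  v1 <- as.numeric(rowDt$var1[[1]])
  v2 <- as.numeric(rowDt$var2[[1]])
  v3 <- as.numeric(rowDt$var3[[1]])
  data.table(var1 = c(v1 * v2, v1 * v3),
             var2 = c(v2 * v2, v2 * v3))
}

# Replace the whole column up front with a list version of itself.
# The outer list() means "one column"; as.list() supplies one element per row.
dt[, var3 := list(as.list(var3))]

colNum <- 3L
for (rowNum in seq_len(nrow(dt))) {
  # Same call as in the question; the target column is now already a list
  set(dt, i = rowNum, j = colNum,
      value = list(list(myFun(dt[rowNum, ]))))
}
```

After the loop, each cell of var3 should hold the data.table returned by myFun for that row, for example dt$var3[[1]].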
dt[ , (names(dt)) := lapply(.SD, as.numeric)]; dt[ , c('new_var_1', 'new_var_2') := .(var1 + var2, var1 + var3)] ? – MichaelChirico
Item 1 of column numbers in j is 3 which is outside range [1,ncol=2]. Use column names instead in j to add new columns. The FAQ covers why it's designed to prefer column names over numbers. Ah, never mind, I see you do create the third column with dt$var3[1L] <- list(myFun(dt[1L, ])), which is very unidiomatic. You might want to take a detour to become familiar with the typical package syntax (which looks more like what Michael wrote). – Frank