I have a data.table that I would like to process. As an initial step, I'd like to set up a new data.table for a set of columns.
I loop over the columns of interest and attempt to assign NA/0 to each, which fails or misbehaves as explained below.
library(data.table)
input_allele <- data.table(FID= paste0("gid",1:10),IID=paste0("IID",11:20),PAT=c(1:10),MAT=c(rep(0,10)),SEX=c(rep(1,10)),PHENOTYPE =c(rep(1,10)),
SNP1=(c(rep(1,5), rep(0,5))),SNP2=(c(rep(1,6),rep(0,3),NA)),SNP3=(c(rep(NA,6),rep(1,4))),SNP4=(c(rep(NA,6),rep(0,4))),SNP5=(c(rep(1,6),rep(0,4))) )
multiplied_value<-input_allele[,c(1:6)]
for(temp_snp in (colnames(input_allele[,.SD,.SDcols=c(7:11)]))){
temp_snpquote<-quote(temp_snp)
multiplied_value[,(temp_snpquote):=0]
}
I get an error:
Error in `[.data.table`(multiplied_value, , `:=`((temp_snpquote), 0)) : LHS of := must be a symbol, or an atomic vector (column names or positions).
If I use eval, I run into weird behavior: after the loop completes, I have to type multiplied_value twice before the data.table is printed on the console. This is startling to me.
for(temp_snp in (colnames(input_allele[,.SD,.SDcols=c(7:11)]))){
temp_snpquote<-quote(temp_snp)
multiplied_value[,eval(temp_snpquote):=0]
}
I would like to understand: 1) how to set a new column to NA or 0, and 2) why, when using eval, I have to type multiplied_value twice before the data.table is printed.
R version 4.0.0 (2020-04-24), data.table_1.13.4
Unix debian distribution
Use set for this rather than :=. Something like: for (i in colnames(input_allele[,.SD,.SDcols=c(7:11)])) set(multiplied_value, j = i, value = 0); multiplied_value[]. – A5C1D2H2I1M1N2O1R2T1
for(temp_snp in (colnames(input_allele[,.SD,.SDcols=c(7:11)]))) multiplied_value[, (temp_snp):= 0]; multiplied_value[]. – A5C1D2H2I1M1N2O1R2T1
[] where I have to type the variable name twice. for(temp_snp in (colnames(input_allele[,.SD,.SDcols=c(7:11)]))) multiplied_value[, (temp_snp):= 0][] I couldn't understand your first code snippet because of the variables (j and i). What is being set there? – Death Metal
?set (where := is also demonstrated), towards the end you'll see timings for different ways of adding multiple columns to a data.table. – A5C1D2H2I1M1N2O1R2T1
[] is to print after any in-place modification is used. – A5C1D2H2I1M1N2O1R2T1
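To make the two suggestions from the comments concrete, here is a minimal sketch on a trimmed-down version of the question's data. The names snp_cols, by_set and by_assign are made up for illustration and do not appear in the original post. It assumes the goal is simply to add the SNP columns filled with 0 or NA; the second variant drops the loop by passing the whole character vector to :=, which may or may not suit the later processing.

```r
library(data.table)

# Trimmed-down version of the question's input (only two SNP columns).
input_allele <- data.table(
  FID  = paste0("gid", 1:10),
  IID  = paste0("IID", 11:20),
  SNP1 = c(rep(1, 5), rep(0, 5)),
  SNP2 = c(rep(1, 6), rep(0, 3), NA)
)

# Hypothetical helper: the names of the columns to create.
snp_cols <- c("SNP1", "SNP2")

# Option 1: set() in a loop. j is the column name, value is recycled to all rows.
by_set <- input_allele[, .(FID, IID)]
for (col in snp_cols) set(by_set, j = col, value = 0)

# Option 2: := with the character vector wrapped in parentheses, so data.table
# treats it as column names rather than as a single column called "snp_cols".
by_assign <- input_allele[, .(FID, IID)]
by_assign[, (snp_cols) := NA_real_]

# A trailing [] makes the result print right after an in-place modification.
by_assign[]
```

Note that the loop variable itself already holds the column name as a string, so wrapping it in quote() is not needed; (temp_snp) := 0 is enough, as in the second comment.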