1
votes

How to generate a new column in data.table based on multiple conditions?

With a data.frame, I can use the code below.

df<-data.frame(a=c(1,2,3,4,5,6,7,8,9,10),b=c(10,20,30,40,50,60,70,80,90,100))
df$c<-ifelse(df$b<=30,"G1",
             ifelse(df$b>30 & df$b<=60, "G2",
                    ifelse(df$b>60 & df$b<=80, "G3",
                           ifelse(df$b>80 & df$b<=90, "G4","G5"))))

With a data.table, I know I can use

library(data.table)
dt<-data.table(a=c(1,2,3,4,5,6,7,8,9,10),b=c(10,20,30,40,50,60,70,80,90,100))
dt[,d:=...]

to create a new column. But how do I reproduce df$c in dt using dt[,d:=...]?


Silly me, I hadn't actually tried it. The following works:

library(data.table)
dt<-data.table(a=c(1,2,3,4,5,6,7,8,9,10),b=c(10,20,30,40,50,60,70,80,90,100))
dt[,d:=ifelse(b<=30,"G1",
              ifelse(b>30 & b<=60, "G2",
                     ifelse(b>60 & b<=80, "G3",
                            ifelse(b>80 & b<=90, "G4","G5"))))]

Thanks

1
try dt[, "c" := ...], feel free to self answer and accept - jangorecki
if you mean all those ifelse calls - did you try just removing dt$ and putting them on the RHS of :=? - jangorecki
Those multiple ifelse statements should be replaced with "cut" - Pierre L
@PierreLafortune I do not understand what you said. Could you please show me some code? Thanks. - kzhang12
@jangorecki I didn't use dt$ in data.table - kzhang12

1 Answer

5
votes

For recoding a numeric column into ranges, the base function cut helps tremendously:

dt[, d := cut(b, breaks = c(-Inf, 30, 60, 80, 90, Inf), labels = paste0("G", 1:5))]
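
As a quick sanity check, here is a minimal sketch that builds both versions side by side and compares them. The column names d_ifelse and d_cut are made up for illustration and are not from the question. Two things worth knowing: cut uses right-closed intervals by default, so the breaks above mean (-Inf,30], (30,60], and so on, which matches the <= comparisons in the question; and cut returns a factor, whereas the nested ifelse returns a character vector.

```r
library(data.table)

# Same data as in the question
dt <- data.table(a = 1:10, b = seq(10, 100, by = 10))

# Nested ifelse() version (character result); d_ifelse is an illustrative name
dt[, d_ifelse := ifelse(b <= 30, "G1",
                 ifelse(b <= 60, "G2",
                 ifelse(b <= 80, "G3",
                 ifelse(b <= 90, "G4", "G5"))))]

# cut() version (factor result); intervals are right-closed by default
dt[, d_cut := cut(b, breaks = c(-Inf, 30, 60, 80, 90, Inf),
                  labels = paste0("G", 1:5))]

# Convert the factor to character before comparing the two columns
same <- identical(dt$d_ifelse, as.character(dt$d_cut))
print(same)
```

If you need a character column rather than a factor, wrapping the cut call in as.character() inside the := assignment should do it.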