I have two data.frames, which I also convert to data.tables:
library(data.table)
k1 <- mtcars[1:4, 1:6]
k11 <- as.data.table(k1)
k2 <- iris[1:3, 1:2]
k22 <- as.data.table(k2)
I am trying to perform some column operations on the first data.frame while iterating over the rows of the second data.frame:
k3 <- lapply(1:nrow(k2), function(j) {
  mpg <- k1[, "mpg"] * k2[j, "Sepal.Width"]   # new mpg: old mpg times row j's Sepal.Width
  cyl <- k1[, "cyl"] * k2[j, "Sepal.Length"]  # new cyl: old cyl times row j's Sepal.Length
  a3 <- k1[, 3:6]                             # remaining columns, on which no operations are done
  cbind(mpg, cyl, a3)  # cbind into a new data.frame for each row of k2; k2 has three rows, so three data.frames
})
# rbind all these datasets to get the final dataset
k4<-do.call(rbind,k3)
head(k4)
mpg cyl disp hp drat wt
Mazda RX4 73.50 30.6 160 110 3.90 2.620
Mazda RX4 Wag 73.50 30.6 160 110 3.90 2.875
Datsun 710 79.80 20.4 108 93 3.85 2.320
Hornet 4 Drive 74.90 30.6 258 110 3.08 3.215
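For reference, the same result as the lapply above can be built without an explicit loop, by stacking repeated rows of k1 and multiplying vectorized. This is just a sketch for comparison; `idx` and `k4b` are names I am introducing:

```r
k1 <- mtcars[1:4, 1:6]
k2 <- iris[1:3, 1:2]

# which row of k2 applies to each output row: 1,1,1,1, 2,2,2,2, 3,3,3,3
idx <- rep(seq_len(nrow(k2)), each = nrow(k1))

# stack nrow(k2) copies of k1, in the same block order as do.call(rbind, k3)
k4b <- k1[rep(seq_len(nrow(k1)), times = nrow(k2)), ]
k4b$mpg <- k4b$mpg * k2$Sepal.Width[idx]
k4b$cyl <- k4b$cyl * k2$Sepal.Length[idx]
```

The first row comes out as mpg = 21 * 3.5 = 73.5 and cyl = 6 * 5.1 = 30.6, matching head(k4) above.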
While the above solution works perfectly, I was wondering:

- whether there is an efficiency gain from using data.table here (since there is no group_by operation), with a first data frame of 30000 x 10 and a second data frame of 60 x 4 (the final dataset will have 30000 x 60 = 1.8 million rows), and
- if there is an efficiency gain, how one would obtain it using data.table.

The following is my data.table attempt (similar in structure to the data.frame solution):
k5 <- rbindlist(lapply(1:nrow(k2), function(j) {
  k11[, `:=`(mpg = mpg * k22[j, Sepal.Width], cyl = cyl * k22[j, Sepal.Length])]
}))
head(k5)
mpg cyl disp hp drat wt
1: 705.60 704.718 160 110 3.90 2.620
2: 705.60 704.718 160 110 3.90 2.875
3: 766.08 469.812 108 93 3.85 2.320
4: 719.04 704.718 258 110 3.08 3.215
5: 705.60 704.718 160 110 3.90 2.620
6: 705.60 704.718 160 110 3.90 2.875
As you can see, the answers are different (I guess because data.table's `:=` assigns by reference instead of copying, so each iteration modifies k11 itself).
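Consistent with that guess, a minimal fix would be to take a copy() inside the loop, so that `:=` works on a throwaway table each iteration and k11 is left intact. A sketch (`k6` is a name I am introducing), assuming the setup from the top of the question:

```r
library(data.table)

k11 <- as.data.table(mtcars[1:4, 1:6])
k22 <- as.data.table(iris[1:3, 1:2])

k6 <- rbindlist(lapply(seq_len(nrow(k22)), function(j) {
  # copy() so := modifies a fresh table each iteration and k11 is untouched
  copy(k11)[, `:=`(mpg = mpg * k22[j, Sepal.Width],
                   cyl = cyl * k22[j, Sepal.Length])]
}))
```

This reproduces the data.frame answer (73.50 / 30.6 in the first row) and leaves k11 unchanged, though each iteration now pays for a full copy of the table.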