
I want to sum column values in a table DT2 based on row values of a different table DT1 and store the result in a third table DT3.

library(data.table)
DT1 <- data.table(x = seq(1, 100, by = 1),
                  y = round(runif(100, 50, 60)),
                  z = round(runif(100, 61, 70)))

DT2 <- data.table(a = seq(1, 1000, by = 1),
                  b = round(rnorm(1000, 10, 1)),
                  c = round(rnorm(1000, 20, 1)),
                  d = round(rnorm(1000, 30, 1)))

Initialize DT3:

DT3 <- DT1[,.(x)]
DT3[,':='(e = 0,
          f = 0,
          g = 0)]

Using a for loop to sum the values:

for(i in 1:nrow(DT1)){
  DT3[i, 2:4] <- DT2[DT1[i,y]:DT1[i,z], 
                     lapply(.SD, sum),
                     .SDcols = c("b", "c", "d")]
}

This works fine. The only issue is that it takes a lot of time as the number of rows and columns in DT1 and DT2 increases. My real DT1 has more than 100,000 rows, and the loop takes around 30 minutes. Is there a faster way to do this operation?


2 Answers


You can try this:

library(data.table)
DT1 <- data.table(x = seq(1, 100, by = 1),
                  y = round(runif(100, 50, 60)),
                  z = round(runif(100, 61, 70)))

DT2 <- data.table(a = seq(1, 1000, by = 1),
                  b = round(rnorm(1000, 10, 1)),
                  c = round(rnorm(1000, 20, 1)),
                  d = round(rnorm(1000, 30, 1)))

matrix(DT1[, colSums(DT2[, -"a"][as.numeric(y:z)]), 1:nrow(DT1)]$V1,
       ncol = 3, byrow = TRUE)

  #      [,1] [,2] [,3]
  # [1,]  114  223  324
  # [2,]   73  141  208
  # [3,]  112  222  323
  # [4,]  151  301  443
  # [5,]  133  263  382
  # ...

This iterates over the rows of DT1 (grouping by 1:nrow(DT1)), builds the row-index vector y:z (as.numeric(y:z)), subsets the non-a columns of DT2 with it, and takes colSums. The stacked sums in V1 are then reshaped into a matrix with one row per row of DT1.


In terms of a single iteration, your code is already efficient: you are using data.table instead of a data.frame, and the lapply(.SD, sum) statement is efficient as well.

Thus, I believe the major optimization should be focused on how you apply that sum statement.

1. Avoid duplicate calculations:

Check the result of table(duplicated(DT1[, c("y", "z")])) on your data. For the dummy data above, about 30% of the summing operations are repeated. If your data has a similarly high duplication ratio, I would suggest calculating the results for the unique (y, z) pairs first and then joining the results back onto the main data.table, as sketched below.
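A minimal sketch of that idea, assuming the DT1/DT2 from the question (key_sums is a hypothetical name for the intermediate table):

# Sum DT2's b, c, d over rows y..z once per unique (y, z) pair
key_sums <- unique(DT1[, .(y, z)])
key_sums[, c("e", "f", "g") := DT2[y:z, lapply(.SD, sum), .SDcols = c("b", "c", "d")],
         by = .(y, z)]

# Join the precomputed sums back onto DT1, preserving DT1's row order
DT3 <- key_sums[DT1, on = .(y, z)][, .(x, e, f, g)]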

2. Parallelize your loop: Your iterations are independent, which makes the loop parallelizable. Check the foreach package. Pay attention to the .combine parameter, as a well-chosen combine function can tremendously reduce the overhead of combining the results. A sketch follows below.
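A minimal sketch of the parallel version, assuming the foreach and doParallel packages and the DT1/DT2 from the question:

library(foreach)
library(doParallel)

cl <- makeCluster(parallel::detectCores() - 1L)
registerDoParallel(cl)

# Leaving .combine at its default returns a list, so the results can be
# bound once with rbindlist() instead of combined pairwise with rbind()
results <- foreach(i = seq_len(nrow(DT1)), .packages = "data.table") %dopar% {
  DT2[DT1[i, y]:DT1[i, z], lapply(.SD, sum), .SDcols = c("b", "c", "d")]
}

stopCluster(cl)

DT3 <- cbind(DT1[, .(x)], rbindlist(results))
setnames(DT3, c("x", "e", "f", "g"))

Note that with very many rows the per-iteration dispatch overhead can dominate, so chunking the row indices across workers may be needed to see a real speedup.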