data.table version of double for loop with vectors

Question

Sometimes I want to use a double for loop that indexes the columns of a matrix, computes some value for each pair of columns, and assigns the result to a cell of another matrix. A correlation table is an example of this. I was wondering if and how this can be done in data.table syntax. Here's the example as a for loop. How can I do the same thing in data.table? This is more a question of whether it can be done at all, even if it is slower, though faster would be nice. Note that we can't assume the computed value gives a symmetric matrix (i.e., y[i, j] != y[j, i] in general).

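# Cosine similarity between two numeric vectors
cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

x <- mtcars
# Square result matrix, one row and one column per column of x
y <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(x))

# Fill every (i, j) cell; both orders are computed because symmetry is not assumed
for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(x))) {
        y[i, j] <- cos_sim(x[, i], x[, j])
    }
}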

library(data.table)
x <- as.data.frame(x)
setDT(x)

akrun · Accepted Answer · 2015-09-10T02:29:11

Another base R approach is outer(), with cos_sim wrapped in Vectorize() so that it is applied to each pair of columns.

outer(x, x, FUN=Vectorize(cos_sim))
#          mpg       cyl      disp        hp      drat        wt      qsec
#mpg  1.0000000 0.8566168 0.7356738 0.7794276 0.9768897 0.8483280 0.9660715
#cyl  0.8566168 1.0000000 0.9656088 0.9689702 0.9241079 0.9828563 0.9414552
#disp 0.7356738 0.9656088 1.0000000 0.9576400 0.8266655 0.9659344 0.8599014
#hp   0.7794276 0.9689702 0.9576400 1.0000000 0.8717482 0.9492708 0.8750691
#drat 0.9768897 0.9241079 0.8266655 0.8717482 1.0000000 0.9183274 0.9859895
#wt   0.8483280 0.9828563 0.9659344 0.9492708 0.9183274 1.0000000 0.9484697
#qsec 0.9660715 0.9414552 0.8599014 0.8750691 0.9859895 0.9484697 1.0000000
#vs   0.7753943 0.4700802 0.3356976 0.3742408 0.7022767 0.5143092 0.7130090
#am   0.7421732 0.5030698 0.3505303 0.5007184 0.7101727 0.4575882 0.6169362
#gear 0.9672733 0.9177938 0.8172070 0.8812034 0.9903890 0.9076279 0.9723964
#carb 0.7581483 0.9082799 0.8604485 0.9450793 0.8549106 0.8943285 0.8346877
#            vs        am      gear      carb
#mpg  0.7753943 0.7421732 0.9672733 0.7581483
#cyl  0.4700802 0.5030698 0.9177938 0.9082799
#disp 0.3356976 0.3505303 0.8172070 0.8604485
#hp   0.3742408 0.5007184 0.8812034 0.9450793
#drat 0.7022767 0.7101727 0.9903890 0.8549106
#wt   0.5143092 0.4575882 0.9076279 0.8943285
#qsec 0.7130090 0.6169362 0.9723964 0.8346877
#vs   1.0000000 0.5188745 0.6788292 0.3655971
#am   0.5188745 1.0000000 0.7435907 0.5766850
#gear 0.6788292 0.7435907 1.0000000 0.8802046
#carb 0.3655971 0.5766850 0.8802046 1.0000000
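
As a hedged illustration of why Vectorize is needed here: outer calls FUN once with the two expanded argument lists and expects one value per pair, whereas cos_sim returns a single value for one pair of vectors. Vectorize wraps it so it is applied pair by pair. The sketch below checks that this reproduces the double loop from the question; the names y_loop and y_outer are introduced only for this example.

```r
# cos_sim as defined in the question
cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

x <- mtcars
p <- ncol(x)

# Reference result: the double for loop from the question
y_loop <- matrix(NA_real_, nrow = p, ncol = p)
for (i in seq_len(p)) {
    for (j in seq_len(p)) {
        y_loop[i, j] <- cos_sim(x[, i], x[, j])
    }
}

# outer() over the columns; Vectorize() makes cos_sim work pair by pair
y_outer <- outer(x, x, FUN = Vectorize(cos_sim))

# Compare values only, since y_outer carries dimnames and y_loop does not
same <- isTRUE(all.equal(y_loop, y_outer, check.attributes = FALSE))
```

Because every ordered pair (i, j) is evaluated separately, this also works for functions that are not symmetric in their arguments.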

The same call can also be written in data.table syntax, but the output is still a matrix, so I wouldn't expect any improvement in efficiency.

setDT(x)[,outer(.SD, .SD, FUN=Vectorize(cos_sim))]
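
If a long, data.table-shaped result is preferred over a matrix, one possible sketch (not part of the original answer) is to build every ordered pair of column names with CJ and compute the value per pair. The names dt, pairs, vi, vj and value are made up for this example, and no speed advantage over outer is claimed.

```r
library(data.table)

# cos_sim as defined in the question
cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

dt <- as.data.table(mtcars)
cols <- names(dt)

# One row per ordered pair (vi, vj); sorted = FALSE keeps the original column order
pairs <- CJ(vi = cols, vj = cols, sorted = FALSE)

# Compute the value for each pair; as.numeric() drops the 1x1 matrix cos_sim returns
pairs[, value := mapply(
    function(a, b) as.numeric(cos_sim(dt[[a]], dt[[b]])),
    vi, vj, USE.NAMES = FALSE
)]
```

Each row of pairs then holds one cell of the matrix, and both (i, j) and (j, i) are present, so asymmetric functions are handled. If a square layout is needed afterwards, something like dcast(pairs, vi ~ vj, value.var = "value") should reshape it, though the row and column order may differ from the original.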
