
Sometimes I want to use a double for loop that indexes the columns of a matrix, computes some value for each pair of columns, and assigns it to a cell in a result matrix. A correlation table is an example of this. I was wondering if/how this can be done in data.table syntax. Here's the example as a for loop. How can I do the same thing in **data.table**? Even if it is slower, this is more a question of whether it can be done, though faster would be nice. Note that we can't assume the computed value will give a symmetric matrix (i.e., y[i, j] != y[j, i] in general).

cos_sim <- function(x, y) x %*% y / sqrt(x%*%x * y%*%y)

x <- mtcars
y <- matrix(, nrow = ncol(x), ncol = ncol(x))

for (i in 1:ncol(x)) {
    for (j in 1:ncol(x)) {
        y[i, j] <- cos_sim(x[, i], x[, j])
    }
}

x <- as.data.frame(x)

Another base R approach would be outer.

outer(x, x, FUN=Vectorize(cos_sim))
#          mpg       cyl      disp        hp      drat        wt      qsec
#mpg  1.0000000 0.8566168 0.7356738 0.7794276 0.9768897 0.8483280 0.9660715
#cyl  0.8566168 1.0000000 0.9656088 0.9689702 0.9241079 0.9828563 0.9414552
#disp 0.7356738 0.9656088 1.0000000 0.9576400 0.8266655 0.9659344 0.8599014
#hp   0.7794276 0.9689702 0.9576400 1.0000000 0.8717482 0.9492708 0.8750691
#drat 0.9768897 0.9241079 0.8266655 0.8717482 1.0000000 0.9183274 0.9859895
#wt   0.8483280 0.9828563 0.9659344 0.9492708 0.9183274 1.0000000 0.9484697
#qsec 0.9660715 0.9414552 0.8599014 0.8750691 0.9859895 0.9484697 1.0000000
#vs   0.7753943 0.4700802 0.3356976 0.3742408 0.7022767 0.5143092 0.7130090
#am   0.7421732 0.5030698 0.3505303 0.5007184 0.7101727 0.4575882 0.6169362
#gear 0.9672733 0.9177938 0.8172070 0.8812034 0.9903890 0.9076279 0.9723964
#carb 0.7581483 0.9082799 0.8604485 0.9450793 0.8549106 0.8943285 0.8346877
#            vs        am      gear      carb
#mpg  0.7753943 0.7421732 0.9672733 0.7581483
#cyl  0.4700802 0.5030698 0.9177938 0.9082799
#disp 0.3356976 0.3505303 0.8172070 0.8604485
#hp   0.3742408 0.5007184 0.8812034 0.9450793
#drat 0.7022767 0.7101727 0.9903890 0.8549106
#wt   0.5143092 0.4575882 0.9076279 0.8943285
#qsec 0.7130090 0.6169362 0.9723964 0.8346877
#vs   1.0000000 0.5188745 0.6788292 0.3655971
#am   0.5188745 1.0000000 0.7435907 0.5766850
#gear 0.6788292 0.7435907 1.0000000 0.8802046
#carb 0.3655971 0.5766850 0.8802046 1.0000000

This can also be written in data.table syntax, but the output is still a matrix, so I wouldn't expect any improvement in efficiency.

library(data.table)
setDT(x)[, outer(.SD, .SD, FUN = Vectorize(cos_sim))]
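If a data.table is wanted rather than a matrix, one option is to convert the matrix afterwards. This is only a minimal sketch, assuming the row labels should end up in a regular column; the names m, dt and "variable" are illustrative and not part of the original code.

```r
library(data.table)

cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

# outer() over the columns returns a plain numeric matrix with dimnames
m <- outer(mtcars, mtcars, FUN = Vectorize(cos_sim))

# keep.rownames stores the matrix row labels in a column (here called "variable")
dt <- as.data.table(m, keep.rownames = "variable")
```

The conversion happens once at the end, so it should not change the cost of the pairwise computation itself.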

Here's one way:

x <- as.data.table(mtcars)  # .SD needs a data.table, not a plain data.frame

x[, lapply(.SD, function(xx) {
  lapply(x, function(yy) cos_sim(xx, yy))
})]

The biggest difference between this and your original is really the use of lapply in place of the for loops. It's data.table-ish in that it makes use of .SD, but one can also just do the following in base R:

sapply(x, function(xx) {
  sapply(x, function(yy) cos_sim(xx, yy))
})

I think it's more svelte than, and preferable to, nested for loops, but I'm not sure it's really taking special advantage of data.table per se.


Matrix algebra

As far as efficiency goes, yeah, matrix operations are your best bet:

mx  <- as.matrix(x)
sx  <- 1 / sqrt( colSums(mx^2) )
res <- (t(mx) %*% mx) * (sx %*% t(sx))

This also gives you nice row and col labels, unlike the OP's for loop.
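The reason this works: entry (i, j) of t(mx) %*% mx is the dot product of columns i and j, and entry (i, j) of sx %*% t(sx) is 1 / (|x_i| * |x_j|), so their elementwise product is the cosine similarity of every column pair. It relies on cos_sim being expressible through dot products; for an arbitrary, possibly asymmetric function the loop or outer approaches would still be needed. A small sketch that checks the matrix result against the nested loop (the variable p is introduced here only for illustration):

```r
cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

mx  <- as.matrix(mtcars)
sx  <- 1 / sqrt(colSums(mx^2))            # reciprocal column norms
res <- (t(mx) %*% mx) * (sx %*% t(sx))    # all pairwise cosine similarities

# Reference result from the nested for loop
p <- ncol(mx)
y <- matrix(NA_real_, nrow = p, ncol = p)
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    y[i, j] <- cos_sim(mx[, i], mx[, j])
  }
}

# Compare values only; res carries dimnames while y does not
stopifnot(isTRUE(all.equal(unname(res), y)))
```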

data.table

This isn't really natural here, but...

meltx <- melt(x[,id:=.I], id.var="id"); x[,id:=NULL]
cartx <- meltx[meltx, on="id", allow.cartesian=TRUE]
res2  <- dcast(cartx[, cos_sim(value, i.value), by=.(v1=variable,v2=i.variable)], v1~v2)

You get a data.table back out in this case, if that's a plus.
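The three steps are: melt to long format with a row id, self-join on that id so every pair of columns is lined up row by row, then compute one value per pair and cast back to wide. A hedged variant of the same idea, assuming x starts as a data.table built from mtcars; the names long and sim are illustrative, copy() is used so that x is not modified, and value.var is given explicitly so dcast does not have to guess:

```r
library(data.table)

cos_sim <- function(x, y) x %*% y / sqrt(x %*% x * y %*% y)

x <- as.data.table(mtcars)

# Long format: one row per (row id, column name, value)
meltx <- melt(copy(x)[, id := .I], id.vars = "id")

# Self-join on id pairs every column with every other column, row by row
cartx <- meltx[meltx, on = "id", allow.cartesian = TRUE]

# One similarity per column pair; as.numeric drops the 1x1 matrix wrapper
long <- cartx[, .(sim = as.numeric(cos_sim(value, i.value))),
              by = .(v1 = variable, v2 = i.variable)]

# Back to a wide table with one row per v1 and one column per v2
res2 <- dcast(long, v1 ~ v2, value.var = "sim")
```

The intermediate long table may be useful on its own, for example to filter or sort pairs by similarity before casting.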