Dist function between matrix objects in R

Question

I have a very simple problem.

Suppose I have an N-dimensional point x (a vector in which each element is one coordinate) and an M×N matrix y (that is, a group of M points with N dimensions each):

set.seed(999)
data <- matrix(runif(110), nrow = 11, ncol = 10)  # 11 x 10 = 110 values

x <- data[1, ]
y <- data[2:nrow(data), ]

I want to calculate the distance between x and every point of y. I know a simple way to do that is:

distances <- dist(rbind(x, y))

However, I believe this is not very efficient for this specific case, for the following reasons:

1. I need to use rbind, which copies the data and is costly in memory.
2. dist calculates the distance between every pair of points, but I'm only interested in 10 of those distances: the ones between x and each point of y. I'm not interested in the distances among the points of y.
3. Because of (2), I need to manually select the entries of the dist result that involve x (its first column, since x is the first row passed to rbind) to get the distances I actually need.
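For reference, the extraction step described in the last point can be sketched as follows. This is a minimal illustration, assuming x is the first row passed to rbind; the names d_full, d_xy and d_manual are only illustrative.

```r
set.seed(999)
data <- matrix(runif(110), nrow = 11, ncol = 10)
x <- data[1, ]
y <- data[2:nrow(data), ]

# Full symmetric 11 x 11 distance matrix; row 1 corresponds to x
d_full <- as.matrix(dist(rbind(x, y)))

# Drop the zero self-distance and keep the 10 distances from x to each row of y
d_xy <- unname(d_full[1, -1])

# Compare with a direct row-wise Euclidean distance
d_manual <- apply(y, 1, function(a) sqrt(sum((a - x)^2)))
all.equal(d_xy, d_manual)
```

Note that this still computes all 55 pairwise distances and only then discards 45 of them, which is the inefficiency the question is about.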

One alternative I thought of was to compute the distance manually, looping over the rows of y:

distances <- apply(y, MARGIN = 1, function(a, b = x) {
   sqrt(sum((a - b)^2))
})

However, when timing both approaches, I get:

func1 <- function(x, y) {
  apply(y, MARGIN = 1, function(a, b = x) {
    sqrt(sum((a - b)^2))
  })
}

func2 <- function(x, y) {
  dist(rbind(x, y))
}

microbenchmark::microbenchmark(
  func1(x, y),
  func2(x, y)
)

Unit: microseconds
        expr    min     lq     mean median      uq      max neval
 func1(x, y) 29.602 30.450 61.21791 31.301 32.3510 2916.101   100
 func2(x, y) 15.101 15.801 28.55304 17.201 17.7015 1143.001   100

So my question here is: is there a way to solve this problem faster than using dist?

akrun · Accepted Answer · 2021-04-04T17:18:54

One option is dapply from the collapse package:

library(collapse)

func3 <- function(x, y) {
  # MARGIN = 1 applies the function to each row of y
  dapply(y, function(a, b = x) {
    sqrt(sum((a - b)^2))
  }, MARGIN = 1)
}

Alternatively, use vapply:

func4 <- function(x, y) {
  vapply(seq_len(nrow(y)), function(i, b = x) sqrt(sum((y[i, ] - b)^2)), numeric(1))
}

Or replicate the vector to match the shape of y and use rowSums after subtracting:

func7 <- function(x, y) sqrt(rowSums((y - x[col(y)])^2))

microbenchmark::microbenchmark(func1(x, y), func3(x, y), func4(x, y), func7(x, y))
#Unit: microseconds
#        expr    min      lq     mean  median      uq      max neval cld
# func1(x, y) 37.605 39.7475 61.17471 40.7595 42.1865 1955.888   100   a
# func3(x, y) 22.212 23.5945 68.63660 24.8320 25.8670 4333.933   100   a
# func4(x, y) 21.089 22.7930 24.11542 23.5945 24.2315   58.050   100   a
# func7(x, y)  7.731  8.9135 44.45935 10.0615 10.9500 3415.959   100   a
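
The x[col(y)] indexing in func7 works because col(y) returns a matrix of column indices with the same shape as y, so x[col(y)] picks element j of x for every cell in column j, in the same column-major order in which y is stored. A small sketch of this, with a check that the vectorised version agrees with the apply version (func1 and func7 are repeated so the snippet runs on its own; x_small and y_small are illustrative toy values):

```r
func1 <- function(x, y) apply(y, 1, function(a, b = x) sqrt(sum((a - b)^2)))
func7 <- function(x, y) sqrt(rowSums((y - x[col(y)])^2))

# Toy input: 2 points in 3 dimensions
y_small <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
x_small <- c(10, 20, 30)

col(y_small)                        # column index of every cell of y_small
x_rep <- x_small[col(y_small)]      # x repeated to line up with y column by column
x_rep

# Same data as in the question
set.seed(999)
data <- matrix(runif(110), nrow = 11, ncol = 10)
x <- data[1, ]
y <- data[2:nrow(data), ]
all.equal(func1(x, y), func7(x, y))
```

Because the subtraction and rowSums operate on the whole matrix at once, no R-level function is called per row, which is consistent with func7 having the lowest median time in the benchmark above.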
