Calculate euclidean distance between multiple vectors in R

Question

I'm trying to calculate the Euclidean distance between one vector and each of several other vectors in R.

So far, I've been following the neighbr documentation (https://cran.r-project.org/web/packages/neighbr/neighbr.pdf) and using distance(x, y, "euclidean"). This works well when I calculate the distance between two vectors, i.e. when x and y each have one row. However, in my actual dataset y has multiple rows, and I'd like to calculate the distance between each of those rows and the single row in x.

How is it possible to do this?

x = structure(list(`Feature I` = 0.85649790378586, `Feature II` = 0.851856356221207, `Feature III` = 0.799580263077569, `Feature IV` = 0.895081402129565, `Feature V` = 0.920173237422567), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

y = structure(list(`Feature I` = c(0.0444280626160322, 0.00326398594129033, 0.0218000692329814), `Feature II` = c(0.0481646509894741, 0.00509786237104908, 0.0276902769176258), `Feature III` = c(0.0456380620204004, 0.00422956673025977, 0.0347273727088683), `Feature IV` = c(0.0365954415011219, 0.00422974884164406, 0.0328151120410415), `Feature V` = c(0.0384331094111439, 0.00362614754925969, 0.0260414956219995)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

Something like this - which generalizes whether you have one or more rows in x. — Gregor Thomas
If you need more help, please make your question reproducible by sharing a few rows of data in a copy/pasteable way (either built-in data, code to simulate data, or dput() for a copy/pasteable version of data you have, e.g., dput(your_data[1:4, ])) — Gregor Thomas
Though with only a single row in x, writing the formula manually should be easy, y$dist_from_x = sqrt((y$x - x$x)^2 + (y$y - x$y)^2). The single value from the x data frame will be "recycled" for every row of y. — Gregor Thomas
@GregorThomas Thanks a lot for your quick response! I've only recently started using R, and looking for solutions without knowing exactly what to search for, or where, can be tedious... I could have solved this in Excel within a couple of minutes, and I did so to check whether my intended "strategy" works. However, as I need to calculate the distance for many instances, having reusable R code seems to be the smarter approach in the long term. I'll take some screenshots for you so that you get a better impression of what I'm trying to do... — MSC
Please, share data, attempts, and desired results as text, not screenshots. I can't demonstrate a solution on screenshots. But if you use dput() I can copy/paste your data into my R session, develop a solution, and show you the result. — Gregor Thomas

Gregor Thomas · Accepted Answer · 2020-10-28T16:55:25

Adapting this answer to your data:

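# Distance between every row of x and every row of y.
# Vectorize() lets outer() call the function on one (xi, yi) index pair at a time.
y$dist_from_x = t(outer(
  seq_len(nrow(x)),
  seq_len(nrow(y)),
  FUN = Vectorize(function(xi, yi) dist(rbind(x[xi, ], y[yi, ])))
))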

y
#     Feature I  Feature II Feature III  Feature IV   Feature V dist_from_x
# 1 0.044428063 0.048164651 0.045638062 0.036595442 0.038433109    1.840726
# 2 0.003263986 0.005097862 0.004229567 0.004229749 0.003626148    1.926465
# 3 0.021800069 0.027690277 0.034727373 0.032815112 0.026041496    1.871883
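
For reference, each value in dist_from_x is the ordinary Euclidean distance between one row of x and one row of y, i.e. sqrt(sum((x_row - y_row)^2)). Here is a minimal check of that equivalence on toy data; x_toy and y_toy are made-up names and values used only for illustration, not the data from the question:

```r
# Toy data (hypothetical): one row in x_toy, two rows in y_toy
x_toy = data.frame(a = 0, b = 0)
y_toy = data.frame(a = c(3, 6), b = c(4, 8))

# dist() on a two-row stack returns the single pairwise distance
d_pair = as.numeric(dist(rbind(x_toy[1, ], y_toy[1, ])))

# The same value computed directly from the formula
d_formula = sqrt(sum((unlist(x_toy[1, ]) - unlist(y_toy[1, ]))^2))

all.equal(d_pair, d_formula)
```

Both expressions describe the 3-4-5 right triangle here, so the comparison should hold.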

Since x has one row, this would be a little more efficient:

# reset definition of y (or remove the dist_from_x column)
x_expanded = x[rep(1, nrow(y)), ]
y$dist_from_x = sqrt(rowSums((x_expanded - y)^2))
# same result as above
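
Two other base R ways to write the single-row case, sketched on toy data (x_toy and y_toy are made-up names and values for illustration). This assumes every column is numeric and that x and y share the same column names; it is a sketch rather than a tested drop-in for your data:

```r
# Toy data (hypothetical): one row in x_toy, two rows in y_toy
x_toy = data.frame(a = 0, b = 0)
y_toy = data.frame(a = c(3, 6), b = c(4, 8))

# Option 1: sweep() subtracts the x values column by column from a matrix of y
x_vec = unlist(x_toy[1, ])               # single row as a named numeric vector
y_mat = as.matrix(y_toy[names(x_toy)])   # same column order as x_toy
dist_sweep = sqrt(rowSums(sweep(y_mat, 2, x_vec)^2))

# Option 2: stack x on top of y and read the first column of the full
# distance matrix (this also computes y-to-y distances, so it does extra work)
dist_stack = as.matrix(dist(rbind(x_toy, y_toy)))[-1, 1]
```

Option 1 avoids building the expanded copy of x. Option 2 is the shortest to write but scales with the square of the number of rows, so it is probably only reasonable for small y.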
