Nearest Neighbor (distance between coordinate pairs)

Question

I have two data sets of cells. Each set has one row per cell, with the x and y coordinates as columns. I want to find the smallest distance from every cell in data set A to any cell in data set B.

Example: DSA = (0,0), (0,1), (1,0); DSB = (2,2).

To find the distance (D) from the cells in A to B, I did this:

ax <- DS1$X
ay <- DS1$Y
bx <- DS2$X
by <- DS2$Y
D <- c(sqrt((ax-bx)^2 + (ay-by)^2))
D
[1] 2.828427 2.236068 2.236068

This gave me what I needed, but I run into problems when DSB has multiple points.

Do I need to add a loop so that it tries every DSA value against every DSB value?

As it stands, it pairs the first point in DSA with only the first point in DSB, then the second point in DSA with only the second point in DSB. I want it to compare the first point in DSA with all points in DSB, return only the smallest of those distances, and repeat for every point in DSA.

jrd · Accepted Answer · 2016-06-28T19:04:48

An easy way to do this is with the dist function. If you combine the data.frames using the rbind function, dist computes all pairwise distances, and as.matrix turns the result into a full square matrix. Here's a toy example that I created.

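set.seed(10101)  # make the random toy data reproducible
df1 <- data.frame(x=rnorm(9, 1), y=rnorm(9, -1))
df2 <- data.frame(x=rnorm(10, 1), y=rnorm(10, 1))
# rows/columns 1-9 belong to df1, rows/columns 10-19 to df2
distances <- as.matrix(dist(rbind(df1, df2)))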
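
Before slicing the matrix, it may help to confirm its layout. The following is only a quick sanity check that rebuilds the same toy data; it assumes nothing beyond base R.

```r
set.seed(10101)
df1 <- data.frame(x = rnorm(9, 1), y = rnorm(9, -1))
df2 <- data.frame(x = rnorm(10, 1), y = rnorm(10, 1))
distances <- as.matrix(dist(rbind(df1, df2)))

dim(distances)             # 19 19: one row and one column per combined point
isSymmetric(distances)     # TRUE: distance i -> j equals distance j -> i
all(diag(distances) == 0)  # TRUE: every point is at distance 0 from itself
```

Because the matrix also holds the df1-to-df1 and df2-to-df2 distances, only the off-diagonal block that pairs one set with the other is of interest here.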

I'll find the nearest point in df1 to each point in df2. We only want to consider the lower-left block of the matrix (rows for df2, columns for df1), so we need to calculate the proper row and column indices to search over.

row.start <- nrow(df1)+1
row.end <- nrow(df1) + nrow(df2)
col.start <- 1
col.end <- nrow(df1)

We can now use the apply function to find the position of the smallest distance in each row. Here which.min returns the index of the nearest df1 point for each df2 point; use min instead of which.min to get the distance itself.

apply(distances[row.start:row.end, col.start:col.end], 1, which.min)
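
The same idea can be turned around to match the direction asked in the question (each point in A against all points in B). This is a minimal sketch, not a definitive implementation: the data frame names DSA and DSB and the columns X and Y are assumed from the question, and the second point in DSB, (0,3), is made up here so that B has more than one row.

```r
# Example data: DSA is taken from the question; (0,3) in DSB is a hypothetical extra point
DSA <- data.frame(X = c(0, 0, 1), Y = c(0, 1, 0))
DSB <- data.frame(X = c(2, 0), Y = c(2, 3))

distances <- as.matrix(dist(rbind(DSA, DSB)))

a.rows <- seq_len(nrow(DSA))              # rows belonging to DSA
b.cols <- nrow(DSA) + seq_len(nrow(DSB))  # columns belonging to DSB

# drop = FALSE keeps the block a matrix even when DSB has a single row;
# without it, apply() fails because the subset collapses to a plain vector
block <- distances[a.rows, b.cols, drop = FALSE]

nearest.dist <- apply(block, 1, min)        # smallest distance from each A point to B
nearest.idx  <- apply(block, 1, which.min)  # row of DSB that achieves that distance
```

Note that dist also computes the distances within each set, which are then discarded, so this approach is convenient for small data but does extra work on large data sets.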
