0
votes

I have 2 data sets of cells. Each set has multiple rows (individual cells) with x,y coordinates as columns. I want to find the smallest distance from every cell in data set A to any cell in data set B.

Example: DSA = (0,0), (0,1), (1,0) and DSB = (2,2).

To find the distance (D) from the cells in A to B I did this:

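# Coordinates of the cells in each data set
ax <- DSA$X
ay <- DSA$Y
bx <- DSB$X
by <- DSB$Y
# Element-wise Euclidean distance; the single DSB point is recycled across DSA
D <- sqrt((ax - bx)^2 + (ay - by)^2)
D
[1] 2.828427 2.236068 2.236068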

So it did give me what I needed; however, I run into problems when DSB has multiple points.

Do I need to add a loop so that it tries every DSA value against every DSB value?

As it stands, it pairs the first point in DSA with only the first point in DSB, then the second point in DSA with only the second point in DSB. I want it to compare the first point in DSA against all points in DSB, return only the smallest of those distances, and repeat this for every point in DSA.
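The "every DSA point against every DSB point, keep the smallest" idea described above can be sketched without an explicit for loop. This is only a minimal illustration: the data frames are built inline from the example, and the second DSB point (7,7) is an assumption added so that DSB has more than one row.

```r
# Example data built inline; the second DSB point is hypothetical
DSA <- data.frame(X = c(0, 0, 1), Y = c(0, 1, 0))
DSB <- data.frame(X = c(2, 7), Y = c(2, 7))

# For each row i of DSA, compute the distance to every DSB point
# (DSB$X and DSB$Y are whole vectors) and keep only the smallest one
D <- sapply(seq_len(nrow(DSA)), function(i) {
  min(sqrt((DSA$X[i] - DSB$X)^2 + (DSA$Y[i] - DSB$Y)^2))
})
D
```

Here sapply plays the role of the loop over DSA, while the subtraction inside it is vectorised over all DSB points at once.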


2 Answers

0
votes

An easy way to do this is with the dist function. If you combine the two data.frames with rbind, dist computes the pairwise distances between all rows, and as.matrix turns the result into a full square matrix. Here's a toy example that I created.

set.seed(10101)
df1 <- data.frame(x = rnorm(9, 1), y = rnorm(9, -1))
df2 <- data.frame(x = rnorm(10, 1), y = rnorm(10, 1))
distances <- as.matrix(dist(rbind(df1, df2)))

I'll find the nearest point in df1 to each point in df2. Only the lower-left block of the matrix is needed (rows for the df2 points, columns for the df1 points), so we first calculate the row and column indices to search over.

row.start <- nrow(df1)+1
row.end <- nrow(df1) + nrow(df2)
col.start <- 1
col.end <- nrow(df1)

We can now use the apply function to search each row of that block. Note that which.min returns the position of the smallest distance in each row, that is, which df1 point is nearest; replace it with min to get the distance itself.

apply(distances[row.start:row.end, col.start:col.end], 1, which.min)
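As a rough sketch of the difference between the two, the block can be stored once and passed to both which.min and min. The names block, nearest_idx and nearest_dist are made up for this illustration; the data is the same toy data as above.

```r
set.seed(10101)
df1 <- data.frame(x = rnorm(9, 1), y = rnorm(9, -1))
df2 <- data.frame(x = rnorm(10, 1), y = rnorm(10, 1))
distances <- as.matrix(dist(rbind(df1, df2)))

# Lower-left block: one row per df2 point, one column per df1 point
rows  <- nrow(df1) + seq_len(nrow(df2))
cols  <- seq_len(nrow(df1))
block <- distances[rows, cols, drop = FALSE]

nearest_idx  <- apply(block, 1, which.min)  # index of the closest df1 point
nearest_dist <- apply(block, 1, min)        # distance to that closest point
```

drop = FALSE keeps the block a matrix even if one of the data sets has a single row, which apply requires.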
0
votes

@jrd Sorry for the formatting issues, I am new to Stack Overflow. I am importing a CSV file for each of DSA and DSB (the headers are X and Y in both files).

DSA
  X Y
1 0 0
2 0 1
3 1 0

DSB
  X Y
1 2 2
2 7 7

df1 <- data.frame(DSA[, 1:2])
df2 <- data.frame(DSB[, 1:2])
distances <- as.matrix(dist(rbind(df1, df2)))

The distance matrix gives me this:

> distances
         1        2        3        4        5
1 0.000000 1.000000 1.000000 2.828427 9.899495
2 1.000000 0.000000 1.414214 2.236068 9.219544
3 1.000000 1.414214 0.000000 2.236068 9.219544
4 2.828427 2.236068 2.236068 0.000000 7.071068
5 9.899495 9.219544 9.219544 7.071068 0.000000

From what I gather, this is a matrix of distances from every point to every other point (even points within the same data set). I just want the distances from DSA to DSB, so it should look like:

distances
         1        2
1 2.828427 9.899495
2 2.236068 9.219544
3 2.236068 9.219544

I am not sure how to grab just this portion of the matrix. The rest of the code I used was:

row.start<-nrow(df1)+1
row.end<-nrow(df1)+nrow(df2)
col.start<-1
col.end<-nrow(df1)
apply(distances[row.start:row.end, col.start:col.end],1,which.min)

This yields the value 2 for both row 4 and row 5. From my hand calculations I should get DSA row 1 = 2.828, row 2 = 2.236 and row 3 = 2.236, with (2,2) being the closest DSB point in every case. In the end it should hopefully yield something like this, where D is the shortest distance from each DSA point to the nearest DSB point:

      X Y D
    1 0 0 2.828
    2 0 1 2.236
    3 1 0 2.236
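One possible way to get there, sketched under the assumption that DSA and DSB hold exactly the values shown above (they are rebuilt inline here instead of being read from the CSV files): take the rows belonging to DSA and the columns belonging to DSB, then use min instead of which.min. The names a_rows, b_cols, a_to_b and result are made up for this sketch.

```r
DSA <- data.frame(X = c(0, 0, 1), Y = c(0, 1, 0))
DSB <- data.frame(X = c(2, 7), Y = c(2, 7))
distances <- as.matrix(dist(rbind(DSA, DSB)))

# Rows 1..nrow(DSA) are the DSA points; the remaining columns are the DSB points
a_rows <- seq_len(nrow(DSA))
b_cols <- nrow(DSA) + seq_len(nrow(DSB))
a_to_b <- distances[a_rows, b_cols, drop = FALSE]

# Smallest distance in each row = distance to the nearest DSB point
result <- cbind(DSA, D = unname(apply(a_to_b, 1, min)))
result
```

The earlier call searched rows 4:5 and columns 1:3, which is the DSB-to-DSA direction, and which.min reports a position, so the 2 it returned is the index of the nearest DSA point, not a distance.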