Find nearest neighbour of points with the same value when comparing 2 different data sets in R

Question

I have 2 data frames (df1 and df2), each with three columns: x co-ordinate, y co-ordinate and category (with 5 levels, A-E). So I essentially have 2 sets of point data, with each point assigned to a category,

e.g.

X    Y    Cat
1    1.5  A
2    1.5  B
3.3  1.9  C

etc... (although both of my data frames have 100s of points in them)

I would like to find the nearest neighbour of the same category for each point in my first data frame (df1) from the second data frame (df2).

I've used nncross from the spatstat package to find the nearest neighbour in df2 for each point in df1, and to list each of these distances, as follows:

library(spatstat)

# Convert the data frames to ppp objects

df1.ppp <- ppp(df1$X,df1$Y,c(0,10),c(0,10),marks=df1$Cat)
df2.ppp <- ppp(df2$X,df2$Y,c(0,10),c(0,10),marks=df2$Cat)

# Produce an output that lists the distance from each point in df1 to its nearest neighbour in df2

out<-nncross(X=df1.ppp,Y=df2.ppp,what=c("dist","which"))

But I am struggling to work out how to use the category labels stored in the ppp objects (as defined by marks) to find the nearest neighbour from the same category. I am sure it should be fairly straightforward, but if anyone has any suggestions or alternative methods to achieve the same result I would be really grateful.

Ege Rubak · Accepted Answer · 2016-02-04T22:13:39

First some artificial data to work with:

library(spatstat)

# Artificial data similar to the question
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

Then a simple solution (but it loses id info: the "which" values index into the per-type sub-pattern of X2, not into X2 itself):

# Separate patterns for each type:
X1list <- split(X1)
X2list <- split(X2)

# For each point in X1 find nearest neighbour of same type in X2:
out <- list()
for(i in 1:5){
  out[[i]] <- nncross(X1list[[i]], X2list[[i]], what=c("dist","which"))
}
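
If the original point numbers are needed with this simpler approach, one option is to split the index vectors in the same way as the patterns. This is only a minimal sketch, assuming that split() on a ppp keeps the points of each type in their original order (as far as I know it just subsets by mark). The names id1, id2 and res are made up for illustration:

```r
library(spatstat)

# Same artificial data as above
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

# Original point numbers of X1 and X2, grouped by type
# (assumption: same within-type order as split(X1) and split(X2))
id1 <- split(seq_len(npoints(X1)), marks(X1))
id2 <- split(seq_len(npoints(X2)), marks(X2))

X1list <- split(X1)
X2list <- split(X2)

# Same-type nearest neighbour per type, translated back to original ids
res <- do.call(rbind, lapply(names(X1list), function(k) {
  nn <- nncross(X1list[[k]], X2list[[k]], what = c("dist", "which"))
  data.frame(id1  = id1[[k]],            # point number in X1
             type = k,
             dist = nn$dist,
             id2  = id2[[k]][nn$which])  # point number in X2
}))

# Back to the original ordering of X1
res <- res[order(res$id1), ]
```

Here res$id2 should index directly into X2, so marks(X2)[res$id2] can be compared against marks(X1) as a sanity check.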

Finally, an ugly solution which recovers the id of the neighbour:

# Make separate marks for pattern 1 and 2 and collect into one pattern
marks(X1) <- factor(paste0(marks(X1), "1"))
marks(X2) <- factor(paste0(marks(X2), "2"))
X <- superimpose(X1, X2)

# For each point get the nearest neighbour of each type from both X1 and X2
# (both dist and index)
nnd <- nndist(X, by = marks(X))
nnw <- nnwhich(X, by = marks(X))

# Type to look for. I.e. the mark with 1 and 2 swapped
# (with 0 as intermediate step)
type <- marks(X)
type <- gsub("1", "0", type)
type <- gsub("2", "1", type)
type <- gsub("0", "2", type)

# Result
rslt <- cbind(as.data.frame(X), dist = 0, which = 0)
for(i in 1:nrow(rslt)){
  rslt$dist[i] <- nnd[i, type[i]]
  rslt$which[i] <- nnw[i, type[i]]
}

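# Separate results
rslt1 <- rslt[1:npoints(X1),]
rslt2 <- rslt[npoints(X1) + 1:npoints(X2),]
# In rslt1, "which" indexes the superimposed pattern X, where the points of X2
# come after those of X1, so shift it to index X2 directly.
# In rslt2, "which" already indexes X1.
rslt1$which <- rslt1$which - npoints(X1)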
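
As a cross-check that does not depend on spatstat, the same result can be computed by brute force in base R. This is a different technique from the ones above (a full distance matrix with the cross-category pairs masked out), and it is only a rough sketch that should be fine for a few hundred points. It assumes the column names X, Y and Cat from the question. The function name nn_same_cat and the small example data frames are made up for illustration:

```r
# Hypothetical helper: for each row of df1, find the nearest row of df2
# that has the same Cat
nn_same_cat <- function(df1, df2) {
  # Pairwise Euclidean distances: rows index df1, columns index df2
  d <- sqrt(outer(df1$X, df2$X, "-")^2 + outer(df1$Y, df2$Y, "-")^2)
  # Rule out pairs whose categories differ
  d[outer(as.character(df1$Cat), as.character(df2$Cat), "!=")] <- Inf
  # Column (row of df2) with the smallest remaining distance
  which <- apply(d, 1, which.min)
  dist <- d[cbind(seq_len(nrow(d)), which)]
  # No point of the same category in df2: report NA
  none <- is.infinite(dist)
  which[none] <- NA
  dist[none] <- NA
  data.frame(df1, dist = dist, which = which)
}

# Tiny made-up example
df1 <- data.frame(X = c(1, 2, 3.3), Y = c(1.5, 1.5, 1.9),
                  Cat = c("A", "B", "C"))
df2 <- data.frame(X = c(1, 2, 5, 2), Y = c(2, 1.5, 5, 3.5),
                  Cat = c("A", "A", "B", "B"))
res <- nn_same_cat(df1, df2)
```

Here "which" is the row number in df2, so no index bookkeeping is needed afterwards. The price is a matrix with nrow(df1) * nrow(df2) entries.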
