I have two data frames (df1 and df2), each with three columns: x co-ordinate, y co-ordinate and category (with 5 levels, A-E). So I essentially have two sets of point data, with each point assigned to a category.

e.g.

X    Y    Cat
1    1.5  A
2    1.5  B
3.3  1.9  C

etc... (although both of my data frames have 100s of points in them)

I would like to find the nearest neighbour of the same category for each point in my first data frame (df1) from the second data frame (df2).

I've used nncross from the package spatstat to find the nearest neighbour in df2 for each point in df1, and then to list each of these distances, as follows:

library(spatstat)

# Convert the data frames to ppp objects on a [0, 10] x [0, 10] window
df1.ppp <- ppp(df1$X,df1$Y,c(0,10),c(0,10),marks=df1$Cat)
df2.ppp <- ppp(df2$X,df2$Y,c(0,10),c(0,10),marks=df2$Cat)

# Produce an output that lists the distance from each point in df1 to its nearest neighbour in df2, and that neighbour's index

out<-nncross(X=df1.ppp,Y=df2.ppp,what=c("dist","which"))

But I am struggling to work out how to use the category labels stored in the ppp objects (as defined by marks) to find the nearest neighbour from the same category. I am sure it should be fairly straightforward, but if anyone has any suggestions or alternative methods to achieve the same result I would be really grateful.

2 Answers

0
votes

First some artificial data to work with:

library(spatstat)

# Artificial data similar to the question
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

Then a simple solution (but it loses id info, since the which index counts within each per-type sub-pattern of X2 rather than within X2 itself):

# Separate patterns for each type:
X1list <- split(X1)
X2list <- split(X2)

# For each point in X1 find nearest neighbour of same type in X2:
out <- list()
for(i in seq_along(X1list)){
  out[[i]] <- nncross(X1list[[i]], X2list[[i]], what=c("dist","which"))
}
names(out) <- names(X1list)
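
The id information lost by the split can probably be recovered by splitting the original row numbers in the same way as the patterns. The following is a minimal sketch of that idea, not part of the original answer; it regenerates the artificial data so that it runs on its own, and the names id1, id2 and res are illustrative only. It assumes that split() on a marked ppp keeps the points of each type in their original order.

```r
library(spatstat)

# Same artificial data as above
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

X1list <- split(X1)
X2list <- split(X2)

# Original point numbers that ended up in each per-type sub-pattern
id1 <- split(seq_len(npoints(X1)), marks(X1))
id2 <- split(seq_len(npoints(X2)), marks(X2))

# One row per point of X1, filled in type by type
res <- data.frame(id = seq_len(npoints(X1)),
                  dist = NA_real_,
                  which = NA_integer_)
for (lev in names(X1list)) {
  nn <- nncross(X1list[[lev]], X2list[[lev]], what = c("dist", "which"))
  res$dist[id1[[lev]]]  <- nn$dist
  # Translate the within-sub-pattern index back to an index into X2
  res$which[id1[[lev]]] <- id2[[lev]][nn$which]
}
```

Here res$which[i] is the index in X2 of the nearest same-type neighbour of point i in X1. The sketch assumes every type occurs in both patterns.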

Finally, an ugly solution which recovers the id of the neighbour:

# Make separate marks for pattern 1 and 2 and collect into one pattern
marks(X1) <- factor(paste0(marks(X1), "1"))
marks(X2) <- factor(paste0(marks(X2), "2"))
X <- superimpose(X1, X2)

# For each point get the nearest neighbour of each type from both X1 and X2
# (both dist and index)
nnd <- nndist(X, by = marks(X))
nnw <- nnwhich(X, by = marks(X))

# Type to look for. I.e. the mark with 1 and 2 swapped
# (with 0 as intermediate step)
type <- marks(X)
type <- gsub("1", "0", type)
type <- gsub("2", "1", type)
type <- gsub("0", "2", type)

# Result
rslt <- cbind(as.data.frame(X), dist = 0, which = 0)
for(i in 1:nrow(rslt)){
  rslt$dist[i] <- nnd[i, type[i]]
  rslt$which[i] <- nnw[i, type[i]]
}

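# Separate results: rows for X1 come first in X, followed by rows for X2
rslt1 <- rslt[1:npoints(X1),]
rslt2 <- rslt[npoints(X1) + 1:npoints(X2),]
# Neighbours of X1 points lie in X2, so shift their indices to refer to X2 itself
rslt1$which <- rslt1$which - npoints(X1)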
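
The row-by-row loop that fills dist and which can also be written as a single lookup with a two-column (row, column) index matrix. The sketch below shows the indexing idea on a made-up toy matrix; nnd_toy, type_toy, idx and dist_toy are hypothetical names standing in for nnd, type and the dist column.

```r
# Toy stand-in for the matrix returned by nndist(X, by = marks(X)):
# one row per point, one named column per mark level (values are made up)
nnd_toy <- matrix(c(0.5, 1.2,
                    0.9, 0.3,
                    2.0, 0.7),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, c("A1", "A2")))

# Column to read for each row (hypothetical target types)
type_toy <- c("A2", "A1", "A2")

# Two-column index matrix: row number and matching column number
idx <- cbind(seq_len(nrow(nnd_toy)), match(type_toy, colnames(nnd_toy)))

# Picks exactly one cell per row
dist_toy <- nnd_toy[idx]
```

Applied to the real objects, the same pattern would read one cell per point from nnd and nnw without an explicit loop.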
0
votes

I also had another go at tackling this, using the package geosphere to create a distance matrix from my original data frames, and found quite a simple way to solve it.

# load geosphere library 
library("geosphere")

#create a distance matrix between all points in the 2 dataframes
dist<-distm(df1[,c('X','Y')],df2[,c('X','Y')])

# find the distance from each point in df1 to its nearest neighbour in df2 (any category)
df1$nearestneighbor <- apply(dist,1,min)

# create a distance matrix where only the distances between points of the same category are kept
# (pairs with different categories are set to Inf so they can never be the minimum)
diffCat <- outer(df1$Cat, df2$Cat, "!=")
dist2 <- dist + ifelse(diffCat, Inf, 0)

# find the distance to the nearest neighbour of the same category
df1$closestmatch <- apply(dist2,1,min)
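
One caveat: distm interprets the two columns as longitude and latitude in degrees and returns geodesic distances in metres, so it suits geographic coordinates. If X and Y are planar coordinates (as the 0 to 10 window in the question suggests), a Euclidean distance matrix may be more appropriate. Below is a minimal base-R sketch of the same masking idea; the toy data frames and the names d and closestindex are illustrative only.

```r
# Toy data standing in for df1 and df2 (values are made up)
df1 <- data.frame(X = c(1, 2, 3.3), Y = c(1.5, 1.5, 1.9),
                  Cat = c("A", "B", "A"))
df2 <- data.frame(X = c(1, 4, 2, 3), Y = c(1, 2, 2.5, 2),
                  Cat = c("B", "A", "B", "A"))

# Euclidean distance matrix: rows = points of df1, columns = points of df2
d <- sqrt(outer(df1$X, df2$X, "-")^2 + outer(df1$Y, df2$Y, "-")^2)

# Mask pairs whose categories differ so they can never be the minimum
d[outer(df1$Cat, df2$Cat, "!=")] <- Inf

# Distance to, and row number in df2 of, the nearest same-category point
df1$closestmatch <- apply(d, 1, min)
df1$closestindex <- apply(d, 1, which.min)
```

If a category in df1 has no counterpart in df2, closestmatch is Inf for those rows and closestindex is not meaningful.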