I'm just beginning to work with R and spatial analysis, so there is a lot of basic knowledge I'm missing!

My data contains addresses of schools. For every school, I want to calculate the distance to the nearest school. My data also contains information about "special features" of the schools. This variable (sps) is coded 1 ("with special features") or 0 ("without special features"). I want to calculate distances from schools "with special features" to schools "without special features". This is what my data looks like:

head(data01)
 id     lon      lat         sps
1 11725  6.932546 50.38269    0
2 11739  6.975160 50.48649    1
3 26883  6.987575 50.50857    0

So far, I have managed to calculate the distance to the nearest school using the st_distance function and the following code. Unfortunately, with this code the nearest school sometimes has the same sps value, for example both schools have "special features". I only want the distance from 1 -> 0 or 0 -> 1, not 1 -> 1!

library(sf) # provides st_as_sf() and st_distance()

my_sf <- st_as_sf(data01,
                  coords = c("lon", "lat"), # x, y (order matters)
                  crs = 4326)

dist.mat <- st_distance(my_sf) # Great Circle distance since in lat/lon
# Number within 1.5km: Subtract 1 to exclude the point itself
num.1500 <- apply(dist.mat, 1, function(x) {
  sum(x < 1500) - 1
})

# Calculate nearest distance
nn.dist <- apply(dist.mat, 1, function(x) {
  return(sort(x, partial = 2)[2])
})
# Get index for nearest distance
nn.index <- apply(dist.mat, 1, function(x) { order(x, decreasing = FALSE)[2] })

n.data <- data01
colnames(n.data)[1] <- "neighbor"
colnames(n.data)[2:ncol(n.data)] <- 
  paste0("n.", colnames(n.data)[2:ncol(n.data)])
mydata2 <- data.frame(data01,
                      n.data[nn.index, ],
                      n.distance = nn.dist,
                      radius1500 = num.1500)
rownames(mydata2) <- seq(nrow(mydata2))

Thanks for helping!!

cheers, k

edit:

My final dataset should look like this:

head(data01)
 id     lon      lat        sps     dist
 11725  6.932546 50.38269    0    xxxx
 11739  6.975160 50.48649    1    xxxx
 26883  6.987575 50.50857    0    xxxx

dist would be the distance to the nearest school with the opposite sps value (0 -> 1 or 1 -> 0).

1 Answer

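Formula for the distance between two points in 2D Euclidean space (Pythagoras). Note that when it is applied directly to lon/lat values, the result is in degrees, not meters:

sqrt((x1 - x2)^2 + (y1 - y2)^2)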

We can create a function to compute it (call library(magrittr) first for the %>% operator used further down):

# Note: defining dist() here masks stats::dist() for the session
dist = function(x1, x2, y1, y2) {
  sqrt((x1 - x2)^2 + (y1 - y2)^2)
}
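If distances in meters are needed rather than degrees, the function above could be swapped for a great-circle formula. The following is a minimal sketch, not part of the original answer: the name haversine and the default radius r = 6371000 (a spherical Earth in meters) are assumptions. It keeps the same argument order as dist, so it could be dropped into the lapply call further down.

```r
# Haversine great-circle distance in meters (assumes a spherical Earth).
# x1, x2 are longitudes and y1, y2 are latitudes, all in degrees.
haversine = function(x1, x2, y1, y2, r = 6371000) {
  to_rad = pi / 180
  dlon = (x2 - x1) * to_rad
  dlat = (y2 - y1) * to_rad
  a = sin(dlat / 2)^2 +
    cos(y1 * to_rad) * cos(y2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Distance from the first sample school to the other two
haversine(c(6.975160, 6.987575), 6.932546, c(50.48649, 50.50857), 50.38269)
```

Like dist, it is vectorised, so one school can be compared against a whole vector of others in a single call.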

Create a condition vector (not mandatory, but it makes the code more readable; you can also inline the condition if you prefer):

condition = data01$sps %>% as.logical()

Now we can use lapply:

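lapply(1:nrow(data01), function(x) {
  if (condition[x]) {
    # school x has sps == 1: distances to all sps == 0 schools
    dist(data01$lon[!condition],
         data01$lon[x],
         data01$lat[!condition],
         data01$lat[x])
  } else {
    # school x has sps == 0: distances to all sps == 1 schools
    dist(data01$lon[condition],
         data01$lon[x],
         data01$lat[condition],
         data01$lat[x])
  }
})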

The code is untested, but it should produce a list of vectors, one per school. Each vector holds the distances from that school to all schools with the opposite sps value: the first vector belongs to the first school, the second to the second school, and so on.

This solution also preserves the order of the neighbors, so the first distance in each vector always corresponds to the first school from the opposite sps group.
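To get the single dist column asked for in the question, each vector can be reduced to its minimum. The following is a minimal, self-contained sketch using the three sample rows from the question; it uses the same Euclidean formula, so the values are in degrees, and the helper variable other is an illustrative name.

```r
# Sample rows copied from the question
data01 = data.frame(
  id  = c(11725, 11739, 26883),
  lon = c(6.932546, 6.975160, 6.987575),
  lat = c(50.38269, 50.48649, 50.50857),
  sps = c(0, 1, 0)
)
condition = as.logical(data01$sps)

# For each school, keep only the smallest distance to a school
# whose sps value differs from its own
data01$dist = sapply(seq_len(nrow(data01)), function(i) {
  other = condition != condition[i]
  min(sqrt((data01$lon[other] - data01$lon[i])^2 +
           (data01$lat[other] - data01$lat[i])^2))
})

data01
```

If one of the two groups is empty, min() returns Inf with a warning, so real data may need a check for that case.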

As for what to do with the list of vectors, you can of course save it to a variable called distList:

distList = lapply(....)

You can use it to easily create a distance matrix.

We want the row and column names to be the ids, so let's first separate the ids into two vectors, one for sps == 1 and one for sps == 0. Luckily, the condition vector we created earlier makes this easy:

Sps = data01$id[condition] %>% as.character
Nsps = data01$id[!condition] %>% as.character

Now for the matrix. It needs one row per sps == 1 school and one column per sps == 0 school, i.e. these dimensions:

matrix(nrow = sum(condition), ncol = sum(!condition))

We don't need everything in distList, because each pair of schools appears twice. We keep only the vectors for the schools where sps == 0 (each of them holds the distances to all sps == 1 schools) and overwrite the list by binding those vectors into a matrix, one column per sps == 0 school:

distList = distList[!condition] %>%
  do.call(what = "cbind")

Now we name the rows and columns with the school ids (note that keeping the order in our initial list helps us a lot here):

rownames(distList) = Sps
colnames(distList) = Nsps

And that's it. You should now be able to query any pair of ids, with the sps == 1 school as the row and the sps == 0 school as the column. For example, distList["11739", "11725"] should give you the distance between school 11739 and school 11725.
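The same matrix can also be built in one step with outer(). The following is a minimal, self-contained sketch on the three sample rows from the question; the names a, b and distMat are illustrative and not part of the original answer, and the distances are again Euclidean values in degrees.

```r
# Sample rows copied from the question
data01 = data.frame(
  id  = c(11725, 11739, 26883),
  lon = c(6.932546, 6.975160, 6.987575),
  lat = c(50.38269, 50.48649, 50.50857),
  sps = c(0, 1, 0)
)
condition = as.logical(data01$sps)

a = data01[condition, ]   # sps == 1 schools become the rows
b = data01[!condition, ]  # sps == 0 schools become the columns

# outer() forms every pairwise difference between the two groups
distMat = sqrt(outer(a$lon, b$lon, "-")^2 + outer(a$lat, b$lat, "-")^2)
dimnames(distMat) = list(as.character(a$id), as.character(b$id))

# Look up a pair: row is the sps == 1 id, column is the sps == 0 id
distMat["11739", "11725"]
```

Because the ids are stored as character dimnames, they have to be quoted when indexing.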

edit #2: to find the distance to the closest sps == 0 school for every sps == 1 school, you can append the row-wise minimum as an extra column:

distList = cbind(distList,
                 nearest = apply(distList, 1, min))
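Since the question already uses sf, the same idea can be sketched with st_distance(), which accepts two sets of points and returns the distances between them in meters for lon/lat data. This is an alternative to the answer above, not its method; it assumes the sf package is installed, and the names with_sps, without_sps, d_m and nearest_m are illustrative.

```r
library(sf)

# Sample rows copied from the question
data01 = data.frame(
  id  = c(11725, 11739, 26883),
  lon = c(6.932546, 6.975160, 6.987575),
  lat = c(50.38269, 50.48649, 50.50857),
  sps = c(0, 1, 0)
)
my_sf = st_as_sf(data01, coords = c("lon", "lat"), crs = 4326)

with_sps    = my_sf[my_sf$sps == 1, ]
without_sps = my_sf[my_sf$sps == 0, ]

# Rows: sps == 1 schools, columns: sps == 0 schools
d = st_distance(with_sps, without_sps)

# Drop the units class so plain matrix functions can be used
d_m = matrix(as.numeric(d), nrow = nrow(d),
             dimnames = list(as.character(with_sps$id),
                             as.character(without_sps$id)))

# Nearest sps == 0 school for each sps == 1 school, in meters
nearest_m = apply(d_m, 1, min)
nearest_m
```

Applying min over the columns instead (apply(d_m, 2, min)) gives the nearest sps == 1 school for each sps == 0 school.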