Keeping identifiers when calculating pointDistance between two data sets?

Question

G'day All,

I have two sets of point data, one with 24 locations and the other with ~16,000. I want to calculate the distance from each of the 24 points to the 16,000 points. This is easy to do with pointDistance() in R raster package, but I can't tell which pairs are being compared. I would like to create a data.frame which has the location names for each comparison so I can combine it with the distances.

a <- data.frame('lon' = c(1,5,55,31), 'lat' = c(3,7,20,22), 'loc' = c('a', 'b', 'c', 'd'))
b <- data.frame('lon' = c(4,2,8,65), 'lat' = c(50,-90,20,32), 'loc' = c('e', 'f', 'g', 'h'))   
dist <- function(x, y){
 for( i in 1:length(a$lon)){
    my_vector <- vector(mode = "numeric", length = 0)
    d <- pointDistance(cbind(x[i,'lon'], x[i,'lat']), cbind(y$lon, y$lat), lonlat=TRUE)
    my_vector <- c(my_vector, d)
    }
    my_vector
 }

I am having a very slow morning and can't figure out why my function above is not outputting each combination. It isn't adding each iteration of d to my_vector.

At the same time I would like to include the pairwise combination of locations that produced each distance measure, eg:

  loc1   loc2   dist
     a      e     10
     a      f     16
     a      g     12
     a      h     19
     b      e     15
     b      f     17
     b      g     14
     b      h     13
     c      e     11
   etc    etc    etc

I am sorry to bother you with this, any help would be really appreciated.

Thanks in advance.

Adam

IRTFM IRTFM · Accepted Answer · 2014-08-21T00:08:44

You are over-writing the result with a zero-length vector with every iteration. Declare your variable outside the loop:

dist <- function(x, y){my_vector <- vector(mode = "numeric", length = 0)
 for( i in 1:length(a$lon)){
    d <- pointDistance(cbind(x[i,'lon'], x[i,'lat']), 
                       cbind(y$lon, y$lat), lonlat=TRUE)
    my_vector <- rbind(my_vector, d)
    }
    my_vector
 }
> dist(a,b)
     [,1]     [,2]    [,3]    [,4]
d 5239684 10352713 2039490 7401111
d 4787647 10797991 1482960 6785859
d 5571477 12245144 4899364 1666956
d 3909092 12467783 2398381 3534050

I think rbind is a better "binder" in this case than is c()

Keeping identifiers when calculating pointDistance between two data sets?

2 Answers